| 1 |
Phi-4-reasoning-vision-15B Technical Report |
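Introduces Phi-4-reasoning-vision-15B, a compact open-source multimodal reasoning model that excels at vision-language tasks and at scientific and mathematical reasoning. |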
multimodal chain-of-thought |
|
|
| 2 |
FeedAIde: Guiding App Users to Submit Rich Feedback Reports by Asking Context-Aware Follow-Up Questions |
Proposes FeedAIde to address the problem of incomplete user feedback reports |
large language model multimodal |
|
|
| 3 |
RAGNav: A Retrieval-Augmented Topological Reasoning Framework for Multi-Goal Visual-Language Navigation |
RAGNav: a retrieval-augmented topological reasoning framework for multi-goal vision-language navigation |
VLN |
|
|
| 4 |
Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows |
Agentics 2.0: proposes a logical transduction algebra for building reliable, scalable, and observable agentic data workflows |
large language model |
|
|
| 5 |
CodeTaste: Can LLMs Generate Human-Level Code Refactorings? |
CodeTaste: evaluates whether LLMs can reach human-level performance in code refactoring and proposes methods for improvement |
large language model |
|
|
| 6 |
In-Context Environments Induce Evaluation-Awareness in Language Models |
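Uses in-context environments to induce evaluation awareness in language models, revealing their potential for strategic underperformance |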
instruction following |
|
|
| 7 |
A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development |
Proposes a dual-helix governance framework to improve the reliability of agentic AI in WebGIS development |
large language model |
|
|
| 8 |
Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions |
Proposes the RealPref benchmark to evaluate how well LLMs follow long-horizon preferences in personalized user interactions |
large language model |
✅ |
|
| 9 |
CAM-LDS: Cyber Attack Manifestations for Automatic Interpretation of System Logs and Security Alerts |
Proposes the CAM-LDS dataset to improve LLMs' automatic interpretation of cyber attacks in system logs and security alerts. |
large language model |
|
|
| 10 |
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration |
Proposes the SWE-CI benchmark to evaluate LLM agents' ability to maintain codebases through continuous integration |
large language model |
|
|
| 11 |
MACC: Multi-Agent Collaborative Competition for Scientific Exploration |
Proposes the MACC framework for studying collaboration and competition among multiple agents in scientific exploration. |
large language model |
|
|
| 12 |
AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment |
AI4S-SDS: a neuro-symbolic solvent design system based on sparse MCTS and differentiable physics alignment |
large language model |
|
|