| 1 |
See and Remember: A Multimodal Agent for Web Traversal |
Proposes V-GEMS to address spatial disorientation and looping in LLM agents' web navigation. |
large language model multimodal visual grounding |
✅ |
|
| 2 |
ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization |
Proposes ShipTraj-R1, which uses large language models and reinforcement learning to improve ship trajectory prediction. |
large language model chain-of-thought |
|
|
| 3 |
LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model |
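Proposes LLM-MLFFN, which uses a large language model to fuse multi-level features and improve the accuracy of autonomous driving behavior classification. |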
large language model |
|
|
| 4 |
Detecting Structural Heart Disease from Electrocardiograms via a Generalized Additive Model of Interpretable Foundation-Model Predictors |
Proposes a generalized additive model built on interpretable ECG foundation-model predictors for detecting structural heart disease. |
foundation model |
|
|
| 5 |
NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect |
Proposes NeuroProlog to address logical inconsistency in mathematical reasoning. |
large language model symbolic grounding |
|
|
| 6 |
SorryDB: Can AI Provers Complete Real-World Lean Theorems? |
Proposes SorryDB, a continuously updated Lean theorem-proving benchmark for evaluating the capabilities of AI provers. |
large language model |
|
|
| 7 |
AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework |
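Proposes a low-code AI for Science platform based on a Bayesian adversarial multi-agent framework to improve the reliability of scientific code generation. |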
large language model |
|
|
| 8 |
Type-Aware Retrieval-Augmented Generation with Dependency Closure for Solver-Executable Industrial Optimization Modeling |
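Proposes a type-aware retrieval-augmented generation method that addresses the executability of generated models in industrial optimization modeling. |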
large language model |
|
|
| 9 |
Saarthi for AGI: Towards Domain-Specific General Intelligence for Formal Verification |
The Saarthi framework uses rules and RAG augmentation to advance domain-specific general intelligence for formal verification. |
large language model |
|
|
| 10 |
Agentic AI-based Coverage Closure for Formal Verification |
Proposes an agentic-AI-based coverage closure method to improve the efficiency of formal verification. |
large language model |
|
|
| 11 |
Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation |
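Proposes the Procedure-Aware Evaluation (PAE) framework to reveal covert errors hidden behind apparently successful task completion by LLM agents. |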
large language model |
|
|
| 12 |
REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry |
REGAL: a registry-driven architecture for the deterministic grounding of agentic AI in enterprise telemetry. |
large language model |
|
|
| 13 |
OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents |
OrchMAS: proposes a collaborative multi-agent framework for tackling complex reasoning problems in scientific domains. |
large language model |
|
|
| 14 |
Architecting Trust in Artificial Epistemic Agents |
Builds trustworthy epistemic AI agents to address challenges in the knowledge ecosystem. |
large language model |
|
|
| 15 |
SEALing the Gap: A Reference Framework for LLM Inference Carbon Estimation via Multi-Benchmark Driven Embodiment |
Proposes a framework for estimating the carbon emissions of LLM inference to address sustainability challenges. |
large language model |
|
|
| 16 |
LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for Reasoning about Debates |
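Proposes a unified framework that combines argument mining, argumentation logic, and description logic for reasoning about debates. |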
large language model |
|
|
| 17 |
Agentified Assessment of Logical Reasoning Agents |
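Proposes an agent-based framework for evaluating logical reasoning that improves the reproducibility, auditability, and robustness of evaluation. |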
chain-of-thought |
|
|
| 18 |
Rethinking Code Similarity for Automated Algorithm Design with LLMs |
Proposes BehaveSim, which improves LLM-driven automated algorithm design through a behavioral similarity measure. |
large language model |
✅ |
|
| 19 |
EvoSkill: Automated Skill Discovery for Multi-Agent Systems |
Proposes EvoSkill for automatically discovering skills in multi-agent systems. |
zero-shot transfer |
|
|
| 20 |
A Natural Language Agentic Approach to Study Affective Polarization |
Proposes a natural-language, agent-based framework for studying affective polarization on social media. |
large language model |
|
|
| 21 |
LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges |
LiveAgentBench: a comprehensive benchmark of agentic systems spanning 104 real-world challenges. |
large language model |
|
|
| 22 |
A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities |
Proposes the NeuroCognition benchmark, which evaluates LLM cognitive abilities from a neuropsychological perspective. |
large language model |
|
|
| 23 |
Human-Certified Module Repositories for the AI Age |
Proposes Human-Certified Module Repositories (HCMRs) to safeguard software trustworthiness in the era of AI-assisted development. |
large language model |
|
|