| 1 |
How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace |
Builds an urban-airspace navigation benchmark to evaluate the embodied spatial action capabilities of large multimodal models |
vision-language-action multimodal |
✅ |
|
| 2 |
MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems |
MONETA: multimodal industry classification using geographic information and multi-agent systems |
large language model multimodal |
|
|
| 3 |
CIAO - Code In Architecture Out - Automated Software Architecture Documentation with Large Language Models |
CIAO: automatically generates software architecture documentation with large language models to improve system comprehensibility. |
large language model |
|
|
| 4 |
ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models |
Proposes ImplicitMemBench, a benchmark for evaluating unconscious behavioral adaptation in large language models. |
large language model |
|
|
| 5 |
MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models |
MIMIC-Py: an extensible LLM-based tool for personality-driven automated game testing |
large language model |
✅ |
|
| 6 |
Emotion Concepts and their Function in a Large Language Model |
Identifies functional emotions in a large language model: emotion concepts influence model behavior and alignment |
large language model |
|
|
| 7 |
Learning Who Disagrees: Demographic Importance Weighting for Modeling Annotator Distributions with DiADEM |
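Proposes DiADEM, which models annotator distributions through demographic importance weighting to improve understanding of subjective content. |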
large language model chain-of-thought |
|
|
| 8 |
Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs |
Builds a unified taxonomy of abductive reasoning and provides a comprehensive survey of abductive reasoning in LLMs. |
large language model |
|
|
| 9 |
Are we still able to recognize pearls? Machine-driven peer review and the risk to creativity: An explainable RAG-XAI detection framework with markers extraction |
Proposes a RAG-XAI framework for detecting machine-driven peer review, safeguarding scientific creativity. |
large language model |
|
|
| 10 |
Visual Perceptual to Conceptual First-Order Rule Learning Networks |
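Proposes the γILP framework, which addresses the problem of learning first-order rules from image data and automatically generating predicates. |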
large language model |
|
|
| 11 |
From Safety Risk to Design Principle: Peer-Preservation in Multi-Agent LLM Systems and Its Implications for Orchestrated Democratic Discourse Analysis |
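Addresses peer-preservation in multi-agent LLM systems by proposing an architecture for democratic discourse analysis based on identity anonymization |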
large language model |
|
|
| 12 |
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing |
Proposes the SAVeR framework, which uses self-auditing to ensure the faithfulness of LLM agents' reasoning. |
large language model |
|
|
| 13 |
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver |
SkillClaw: collective skill evolution via an Agentic Evolver, improving the performance of multi-user agent ecosystems |
large language model |
|
|
| 14 |
Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents |
TrACE: an adaptive-compute controller for LLM agents based on action agreement |
large language model |
|
|
| 15 |
Neural-Symbolic Knowledge Tracing: Injecting Educational Knowledge into Deep Learning for Responsible Learner Modelling |
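Proposes Responsible-DKT, a neural-symbolic knowledge tracing method that incorporates educational knowledge to make learner modelling more responsible. |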
large language model |
|
|
| 16 |
IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling |
IoT-Brain: connects LLMs to the physical world through a spatial trajectory graph (STG) for semantic-spatial sensor scheduling |
large language model |
|
|
| 17 |
DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues |
DialBGM: proposes a benchmark dataset for background music recommendation from everyday multi-turn dialogues. |
multimodal |
|
|
| 18 |
An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks |
Proposes an agentic evaluation architecture for detecting historical bias in educational textbooks. |
multimodal |
|
|
| 19 |
PyVRP$^+$: LLM-Driven Metacognitive Heuristic Evolution for Hybrid Genetic Search in Vehicle Routing Problems |
PyVRP$^+$: LLM-driven metacognitive heuristic evolution for hybrid genetic search in vehicle routing problems |
large language model |
|
|
| 20 |
SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility |
SPARD: a self-paced curriculum for RL alignment that integrates reward dynamics and data utility |
large language model |
|
|
| 21 |
Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders |
Proposes KnowSA_CKP, which improves the performance and efficiency of LLM recommenders through selective knowledge augmentation |
large language model |
|
|
| 22 |
More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration |
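Reveals why LLMs fail at zero-cost collaboration, stressing that scaling intelligence is not the only route to solving multi-agent collaboration |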
large language model |
|
|
| 23 |
Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution |
Squeeze Evolve: a unified multi-model orchestration framework for verifier-free evolution |
multimodal |
|
|
| 24 |
Towards Knowledgeable Deep Research: Framework and Benchmark |
Proposes the hybrid knowledge analysis framework HKA to address the fusion of structured and unstructured knowledge in deep research. |
multimodal |
|
|
| 25 |
Multi-Agent Orchestration for High-Throughput Materials Screening on a Leadership-Class System |
Proposes a high-throughput materials screening framework based on multi-agent orchestration to improve HPC system utilization. |
large language model |
|
|
| 27 |
Demystifying the Silence of Correctness Bugs in PyTorch Compiler |
Targets correctness bugs in the PyTorch compiler, proposing AlignGuard, a detection method based on LLM mutation |
large language model |
|
|
| 28 |
Model Space Reasoning as Search in Feedback Space for Planning Domain Generation |
Proposes a model-space reasoning method based on search in feedback space for automatic planning domain generation. |
large language model |
|
|