| 1 |
Do Audio-Visual Large Language Models Really See and Hear? |
A study of modality bias in AVLLMs: revealing the vision-dominated fusion mechanism in audio-visual large language models |
large language model multimodal |
|
|
| 2 |
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? |
Agentic-MME: a process-verified benchmark for evaluating multimodal agentic capabilities |
large language model multimodal |
|
|
| 3 |
Improving MPI Error Detection and Repair with Large Language Models and Bug References |
Uses LLMs and bug references to improve MPI error detection and repair |
large language model chain-of-thought |
|
|
| 4 |
Understanding the Effects of Safety Unalignment on Large Language Models |
Studies the effects of safety unalignment on large language models and reveals potential risks of weight orthogonalization methods. |
large language model |
|
|
| 5 |
AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models |
AutoVerifier: an agent framework that uses large language models to automatically verify science and technology intelligence |
large language model |
|
|
| 6 |
Analysis of Optimality of Large Language Models on Planning Problems |
Analyzes the optimality of large language models on planning problems |
large language model |
|
|
| 7 |
When simulations look right but causal effects go wrong: Large language models as behavioral simulators |
Large language models as behavioral simulators: good descriptive fit but inaccurate causal-effect predictions |
large language model |
|
|
| 8 |
Automated Malware Family Classification using Weighted Hierarchical Ensembles of Large Language Models |
Proposes a zero-label malware family classification framework based on weighted hierarchical ensembles of large language models |
large language model |
|
|
| 9 |
Learn to Relax with Large Language Models: Solving Constraint Optimization Problems via Bidirectional Coevolution |
AutoCO: solving constraint optimization problems with large language models and bidirectional coevolution |
large language model |
|
|
| 10 |
Chain-of-Authorization: Embedding authorization into large language models |
Proposes the Chain-of-Authorization framework, which embeds access control into the reasoning process of large language models to improve security. |
large language model |
|
|
| 11 |
Competency Questions as Executable Plans: a Controlled RAG Architecture for Cultural Heritage Storytelling |
Proposes a controlled RAG architecture based on knowledge graphs and competency questions for cultural heritage story generation. |
large language model multimodal |
|
|
| 12 |
Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection |
Proposes ForenAgent, which uses agentic tools for image forgery detection, enabling more flexible and interpretable analysis. |
large language model multimodal |
|
|
| 13 |
Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference |
Proposes a data movement forecasting method to optimize system efficiency for large-scale MoE LLM inference. |
large language model |
|
|
| 14 |
ProdCodeBench: A Production-Derived Benchmark for Evaluating AI Coding Agents |
Proposes ProdCodeBench, a benchmark derived from real production environments for evaluating AI coding agents. |
foundation model |
|
|
| 15 |
Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web |
Holos: a web-scale LLM-based multi-agent system aimed at building the Agentic Web. |
large language model |
|
|
| 16 |
I must delete the evidence: AI Agents Explicitly Cover up Fraud and Violent Crime |
AI agents tend to cover up evidence of fraud and violent crime to serve company interests |
large language model |
|
|
| 17 |
Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity |
Proposes quantitative role clarity to address role consistency in multi-agent collaboration |
large language model |
|
|
| 18 |
InfoSeeker: A Scalable Hierarchical Parallel Agent Framework for Web Information Seeking |
Proposes InfoSeeker to address the challenge of aggregating large-scale heterogeneous data in web information seeking. |
large language model |
|
|
| 19 |
Beyond Message Passing: Toward Semantically Aligned Agent Communication |
Analyzes LLM agent communication protocols, reveals insufficient semantic alignment, and proposes future research directions. |
large language model |
|
|
| 20 |
Ambig-IaC: Multi-level Disambiguation for Interactive Cloud Infrastructure-as-Code Synthesis |
Proposes Ambig-IaC to resolve ambiguity in cloud infrastructure-as-code generation |
large language model |
|
|
| 21 |
Audio Spatially-Guided Fusion for Audio-Visual Navigation |
Proposes an audio spatially-guided fusion method that improves the generalization of audio-visual navigation in unseen environments |
multimodal |
|
|
| 22 |
From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks |
Uses LLMs to generate code for the CAPEC and CWE frameworks, improving vulnerability understanding and detection |
large language model |
|
|
| 23 |
High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination |
Reveals high volatility and action bias in LLMs during group coordination, in marked contrast to humans |
large language model |
|
|
| 24 |
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers |
Proposes GBQA: a game benchmark for evaluating LLMs as quality assurance engineers |
large language model |
|
|
| 25 |
Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems |
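Reveals power laws in the emergence of collective cognition in LLM multi-agent systems and proposes DTI to address the integration bottleneck. |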
large language model |
|
|
| 26 |
ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs |
ChatSVA: bridging SVA generation for hardware verification via task-specific LLMs |
large language model |
|
|
| 27 |
LLM+Graph@VLDB'2025 Workshop Summary |
The LLM+Graph workshop focuses on integrating LLMs with graph data, advancing algorithmic and systems innovation |
large language model |
|
|
| 28 |
AlertStar: Path-Aware Alert Prediction on Hyper-Relational Knowledge Graphs |
AlertStar: path-aware alert prediction on hyper-relational knowledge graphs |
TAMP |
|
|
| 29 |
An Independent Safety Evaluation of Kimi K2.5 |
Safety evaluation of Kimi K2.5: reveals potential risks of an open-source large model in CBRNE, cybersecurity, bias, and other areas |
multimodal |
|
|
| 30 |
A Systematic Security Evaluation of OpenClaw and Its Variants |
Systematically evaluates security vulnerabilities in OpenClaw and its variants, revealing potential risks of tool-augmented AI agents. |
large language model |
|
|
| 31 |
Glia: A Human-Inspired AI for Automated Systems Design and Optimization |
Glia: a human-inspired AI for automated systems design and optimization |
large language model |
|
|
| 32 |
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics |
The ContextMATH benchmark reveals that large language models fall short at problem formulation in contextual mathematical reasoning |
large language model |
|
|
| 33 |
Therefore I am. I Think |
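Shows that large language models have already made a preliminary decision before reasoning, and that the reasoning process may serve that decision |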
chain-of-thought |
|
|
| 34 |
Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models |
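Proposes a user-turn generation probe to evaluate interaction awareness in language models, finding that task accuracy and interaction awareness are decoupled. |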
instruction following |
|
|
| 35 |
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS |
Proposes a two-stage prompt selection strategy that improves emotion intensity and speaker consistency in zero-shot TTS |
large language model |
|
|
| 36 |
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs |
StructEval: builds a benchmark for LLMs' structured-output capabilities, revealing performance gaps across multiple formats. |
large language model |
|
|
| 37 |
Terminal Agents Suffice for Enterprise Automation |
Proposes terminal-based agents for enterprise automation tasks that outperform complex agent systems |
foundation model |
|
|