| 1 |
A Multimodal Foundation Model of Spatial Transcriptomics and Histology for Biological Discovery and Clinical Prediction |
Proposes STORM, a multimodal foundation model of spatial transcriptomics and histology for biological discovery and clinical prediction |
foundation model multimodal |
|
|
| 2 |
FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning |
FeynmanBench: a benchmark for evaluating multimodal LLMs on Feynman diagram reasoning |
large language model multimodal |
|
|
| 3 |
Don't Blink: Evidence Collapse during Multimodal Reasoning |
Reveals the evidence collapse phenomenon in multimodal reasoning and proposes a task-aware visual veto strategy |
multimodal visual grounding |
|
|
| 4 |
Large Language Models Align with the Human Brain during Creative Thinking |
Explores the alignment between large language models and the human brain during creative thinking |
large language model chain-of-thought |
|
|
| 5 |
Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior |
Proposes an LLM-based zero-shot framework for multimodal classroom behavior analysis that does not require storing raw video. |
multimodal |
|
|
| 6 |
BAAI Cardiac Agent: An intelligent multimodal agent for automated reasoning and diagnosis of cardiovascular diseases from cardiac magnetic resonance imaging |
BAAI Cardiac Agent: a multimodal agent for automated reasoning and diagnosis of cardiovascular diseases |
multimodal |
|
|
| 7 |
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models |
Cite Pretrain: a retrieval-free knowledge attribution method for large language models |
large language model |
|
|
| 8 |
Toward Full Autonomous Laboratory Instrumentation Control with Large Language Models |
Uses large language models to achieve fully autonomous control of laboratory instruments |
large language model |
|
|
| 9 |
TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering |
TABQAWORLD: optimizes multimodal reasoning to improve multi-turn table question answering performance |
multimodal |
|
|
| 10 |
Structured Multi-Criteria Evaluation of Large Language Models with Fuzzy Analytic Hierarchy Process and DualJudge |
Proposes a structured multi-criteria evaluation method for LLMs based on the fuzzy analytic hierarchy process and DualJudge |
large language model |
|
|
| 11 |
PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage |
PolySwarm: a multi-agent LLM framework for prediction market trading and latency arbitrage |
large language model |
|
|
| 12 |
The Topology of Multimodal Fusion: Why Current Architectures Fail at Creative Cognition |
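Reveals the topological limitations of multimodal fusion architectures and proposes a neural-ODE-based topological regularization method to improve creative cognition. |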
multimodal |
|
|
| 13 |
CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering |
CREBench: evaluates large language models on cryptographic binary reverse engineering |
large language model |
|
|
| 14 |
AutoReSpec: A Framework for Generating Specification using Large Language Models |
AutoReSpec: a collaborative framework that uses large language models to automatically generate verifiable specifications |
large language model |
|
|
| 15 |
Enhancing behavioral nudges with large language model-based iterative personalization: A field experiment on electricity and hot-water conservation |
Uses large language models to improve the personalization of behavioral interventions |
large language model |
|
|
| 16 |
Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework |
Proposes a structured prompt framework to strengthen the integrity of human-like CoT reasoning in LLMs for security analysis |
chain-of-thought |
|
|
| 17 |
Large Language Models for Combinatorial Optimization of Design Structure Matrix |
Proposes an LLM-based method for DSM reordering optimization to improve the modularization efficiency of complex engineering systems |
large language model |
|
|
| 18 |
TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables |
TableVision: a large-scale table benchmark for spatial reasoning over complex hierarchical tables. |
large language model multimodal |
|
|
| 19 |
InsTraj: Instructing Diffusion Models with Travel Intentions to Generate Real-world Trajectories |
Proposes InsTraj to address semantic understanding and constraint issues in GPS trajectory generation |
large language model multimodal |
|
|
| 20 |
MolDA: Molecular Understanding and Generation via Large Language Diffusion Model |
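MolDA: proposes a new molecular understanding and generation framework based on a diffusion language model, addressing the limitations of autoregressive models. |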
large language model multimodal |
|
|
| 21 |
From Paper to Program: A Multi-Stage LLM-Assisted Workflow for Accelerating Quantum Many-Body Algorithm Development |
Proposes a multi-stage LLM-assisted workflow to accelerate quantum many-body algorithm development |
large language model foundation model |
|
|
| 22 |
Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction |
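Proposes five prompt engineering strategies that improve the stability and reliability of LLM output in industrial settings and reduce hallucinations. |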
large language model foundation model |
|
|
| 23 |
Beyond Predefined Schemas: TRACE-KG for Context-Enriched Knowledge Graphs from Complex Documents |
Proposes TRACE-KG to address schema dependence in knowledge graph construction |
multimodal |
|
|
| 24 |
Combee: Scaling Prompt Learning for Self-Improving Language Model Agents |
Combee: scales prompt learning to enable self-improving language model agents |
large language model |
|
|
| 25 |
Soft Tournament Equilibrium |
Proposes the Soft Tournament Equilibrium (STE) framework to address cyclic dependencies in the evaluation of general-purpose agents. |
large language model |
|
|
| 26 |
ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference |
ShadowNPU: system and algorithm co-design for NPU-centric on-device LLM inference |
large language model |
|
|
| 27 |
A Firefly Algorithm for Mixed-Variable Optimization Based on Hybrid Distance Modeling |
Proposes a firefly algorithm based on hybrid distance modeling (FAmv) to solve mixed-variable optimization problems |
multimodal |
|
|
| 28 |
Evaluating Artificial Intelligence Through a Christian Understanding of Human Flourishing |
Proposes an AI evaluation framework based on a Christian understanding of human flourishing |
large language model |
|
|
| 29 |
Hume's Representational Conditions for Causal Judgment: What Bayesian Formalization Abstracted Away |
Analyzes Hume's theory of causal judgment and reveals the representational conditions that Bayesian formalization overlooked |
large language model |
|
|
| 30 |
Automated Analysis of Global AI Safety Initiatives: A Taxonomy-Driven LLM Approach |
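Proposes an LLM-based automated analysis framework driven by an activity taxonomy for evaluating the policy documents of global AI safety initiatives. |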
large language model |
|
|
| 31 |
When Do Hallucinations Arise? A Graph Perspective on the Evolution of Path Reuse and Path Compression |
Analyzes, from a graph perspective, how hallucinations arise in LLM reasoning: path reuse and path compression |
large language model |
|
|
| 32 |
Single-agent vs. Multi-agents for Automated Video Analysis of On-Screen Collaborative Learning Behaviors |
Proposes a VLM-based multi-agent system for automated analysis of on-screen collaborative learning behaviors |
multimodal |
|
|
| 33 |
Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization |
Proposes a deterministic agentic platform to address RAG hallucination and zero-click problems in generative engine optimization |
large language model |
|
|
| 34 |
Affording Process Auditability with QualAnalyzer: An Atomistic LLM Analysis Tool for Qualitative Research |
QualAnalyzer: makes the qualitative research process auditable through atomistic LLM analysis |
large language model |
|
|
| 35 |
Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents |
Proposes the Profile-Then-Reason framework to improve the efficiency and reliability of tool-augmented language agents |
large language model |
|
|
| 36 |
Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents |
LLM poker agents exhibit emergent theory-of-mind-like behavior in dynamic interactions |
large language model |
|
|
| 37 |
InferenceEvolve: Towards Automated Causal Effect Estimators through Self-Evolving AI |
InferenceEvolve: uses self-evolving AI to automatically discover and optimize causal effect estimators |
large language model |
|
|
| 38 |
AI Trust OS -- A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments |
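Proposes AI Trust OS, which provides continuous governance for autonomous AI observability and zero-trust compliance in enterprise environments. |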
large language model |
|
|
| 39 |
MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents |
MemMachine: a ground-truth-preserving memory system for personalized AI agents |
large language model |
|
|
| 40 |
From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification |
UVM^2: an automated LLM-based UVM machine for RTL verification that significantly improves verification efficiency. |
large language model |
|
|
| 41 |
The Persuasion Paradox: When LLM Explanations Fail to Improve Human-AI Team Performance |
Reveals the paradox of LLM explanations in improving human-AI team performance |
large language model |
|
|
| 42 |
FVRuleLearner: Operator-Level Reasoning Tree (OP-Tree)-Based Rules Learning for Formal Verification |
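FVRuleLearner: proposes a rule-learning framework based on operator-level reasoning trees to improve the correctness of SVA generation in formal verification. |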
large language model |
|
|
| 43 |
Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems |
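Proposes the LLMA-Mem framework, which uses memory augmentation to improve the performance and efficiency of LLM multi-agent systems on long-horizon tasks. |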
large language model |
|
|
| 44 |
Toward Executable Repository-Level Code Generation via Environment Alignment |
Proposes the EnvGraph framework, which achieves executable repository-level code generation through environment alignment. |
large language model |
|
|
| 45 |
Persistent Cross-Attempt State Optimization for Repository-Level Code Generation |
LiveCoder: improves repository-level code generation through cross-attempt state optimization |
large language model |
|
|
| 46 |
Can Humans Tell? A Dual-Axis Study of Human Perception of LLM-Generated News |
Shows that humans struggle to distinguish LLM-generated news from human-written news, making user-side detection defenses infeasible. |
large language model |
|
|
| 47 |
CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks |
CoopGuard: a stateful defense framework based on cooperative agents that protects LLMs against multi-round adversarial attacks |
large language model |
|
|
| 48 |
Commercial Persuasion in AI-Mediated Conversations |
Reveals that LLM-driven conversational AI risks covertly steering user choices in commercial promotion |
large language model |
|
|
| 49 |
Similarity Field Theory: A Mathematical Framework for Intelligence |
Proposes Similarity Field Theory, a mathematical framework for understanding intelligent systems |
large language model |
|
|
| 50 |
Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics |
LLM-based autonomous agents accelerate scientific discovery by orchestrating scientists, language, code, and physics. |
large language model |
|
|
| 51 |
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges |
Addresses security threats to agentic AI, covering defenses, evaluation methods, and open challenges |
large language model |
|
|
| 52 |
An Agent-Based Framework for the Automatic Validation of Mathematical Optimization Models |
Proposes an agent-based framework for automatically validating mathematical optimization models. |
large language model |
|
|
| 53 |
The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making |
Large language models show robustness to affective framing in rule-constrained decision-making, revealing a "paradox of robustness". |
large language model |
|
|
| 54 |
IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery |
Proposes IV Co-Scientist, which uses a multi-agent LLM framework to discover causal instrumental variables. |
large language model |
|
|
| 55 |
Collective AI can amplify tiny perturbations into divergent decisions |
Collective AI decision-making is susceptible to tiny perturbations, leading to divergent outcomes |
large language model |
|
|
| 56 |
An Onto-Relational-Sophic Framework for Governing Synthetic Minds |
Proposes the Onto-Relational-Sophic framework for governing artificial general intelligence. |
foundation model |
|
|
| 57 |
ATLAS: A Layered Constraint-Guided Framework for Structured Artifact Generation in LLM-Assisted MDE |
ATLAS: a layered constraint-guided framework for structured artifact generation in LLM-assisted MDE |
large language model |
|
|
| 58 |
Beyond Message Passing: A Semantic View of Agent Communication Protocols |
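Proposes a three-layer semantic view of agent communication protocols and reveals the semantic-level shortcomings of existing protocols. |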
large language model |
|
|
| 59 |
HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference |
HybridKV: a hybrid KV cache compression framework for efficient multimodal large language model inference |
large language model multimodal |
|
|
| 60 |
OGA-AID: Clinician-in-the-loop AI Report Drafting Assistant for Multimodal Observational Gait Analysis in Post-Stroke Rehabilitation |
OGA-AID: a clinician-assisting AI report drafting system for multimodal gait analysis in post-stroke rehabilitation |
large language model multimodal |
|
|
| 61 |
ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning |
Proposes Entropy Trend Reward (ETR) to improve the efficiency and accuracy of chain-of-thought reasoning |
large language model chain-of-thought |
✅ |
|
| 62 |
CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments |
Proposes the CritBench framework to evaluate cybersecurity capabilities in IEC 61850 digital substation environments |
large language model |
✅ |
|
| 63 |
Context-Value-Action Architecture for Value-Driven Large Language Model Agents |
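Proposes the CVA architecture, which decouples cognitive reasoning from behavior generation to improve value alignment and behavioral interpretability in LLM agents. |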
large language model |
|
|
| 64 |
Joint Knowledge Base Completion and Question Answering by Combining Large Language Models and Small Language Models |
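Proposes the JCQL framework, which combines large and small language models to jointly perform knowledge base completion and question answering |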
large language model |
|
|
| 65 |
JTON: A Token-Efficient JSON Superset with Zen Grid Tabular Encoding for Large Language Models |
Proposes JTON, a token-efficient JSON superset with Zen Grid tabular encoding designed for large language models. |
large language model |
|
|
| 66 |
CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models |
CAKE: a benchmark for evaluating the cloud architecture knowledge of large language models |
large language model |
|
|
| 67 |
QA-MoE: Towards a Continuous Reliability Spectrum with Quality-Aware Mixture of Experts for Robust Multimodal Sentiment Analysis |
Proposes QA-MoE, which achieves robust multimodal sentiment analysis through a quality-aware mixture of experts |
multimodal |
|
|
| 68 |
From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems |
ASTRAL: uses multimodal LLMs for architecture-driven security risk assessment of cyber-physical systems |
multimodal |
|
|
| 69 |
From Large Language Model Predicates to Logic Tensor Networks: Neurosymbolic Offer Validation in Regulated Procurement |
Proposes a neurosymbolic approach for validating bid documents in regulated procurement. |
large language model |
|
|
| 70 |
Experience Transfer for Multimodal LLM Agents in Minecraft Game |
Proposes the Echo framework to improve the efficiency of experience transfer for multimodal LLM agents in Minecraft. |
multimodal |
|
|
| 71 |
Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition |
Market-Bench: builds an economic and trade competition benchmark to evaluate large language models in economic activities. |
large language model |
|
|
| 72 |
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents |
Claw-Eval: proposes a trustworthy evaluation benchmark for autonomous agents that addresses the limitations of existing evaluation methods. |
large language model multimodal |
|
|
| 73 |
LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering |
LLM4CodeRE: a bidirectional generative AI framework for code reverse engineering |
large language model |
|
|
| 74 |
How LLMs Follow Instructions: Skillful Coordination, Not a Universal Mechanism |
Reveals how large language models follow instructions: skill coordination rather than a universal mechanism |
instruction following |
|
|
| 75 |
Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains |
Flowr: scales up retail supply chain operations in large supermarket chains through agentic AI |
large language model |
|
|
| 76 |
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms |
Proposes the MCPSHIELD framework to systematically address security threats to MCP-based AI agents |
large language model |
|
|
| 77 |
Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring |
Proposes Deep Researcher Agent, a fully autonomous framework for deep learning experimentation with zero-cost monitoring |
large language model |
✅ |
|
| 78 |
Vision-Guided Iterative Refinement for Frontend Code Generation |
Proposes a frontend code generation framework that iteratively refines code using visual feedback to improve code quality. |
large language model |
|
|
| 79 |
SemLink: A Semantic-Aware Automated Test Oracle for Hyperlink Verification using Siamese Sentence-BERT |
Proposes SemLink, which uses Siamese Sentence-BERT for efficient, semantic-aware automated hyperlink testing. |
large language model |
|
|
| 80 |
Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge |
Reveals label effects in trust assessment by humans and LLMs and warns of bias propagation |
large language model |
|
|
| 81 |
Foundations for Agentic AI Investigations from the Forensic Analysis of OpenClaw |
Proposes a forensic analysis framework for agentic AI to address the challenges of agent system forensics. |
large language model |
|
|
| 82 |
On the Role of Fault Localization Context for LLM-Based Program Repair |
Studies how fault localization context affects LLM-based program repair and identifies the best context strategies. |
large language model |
|
|
| 83 |
LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency |
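Frames LLM evaluation as a tensor completion problem and proposes analysis methods based on low-rank structure and semiparametric efficiency |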
large language model |
|
|
| 84 |
MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library |
Proposes MA-IDS, an IoT intrusion detection system built on a multi-agent RAG framework with an experience library. |
large language model |
|
|
| 85 |
Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use |
Proposes Back-Reveal to address the problem of data leakage in LLM agents |
large language model |
|
|
| 86 |
Towards Effective In-context Cross-domain Knowledge Transfer via Domain-invariant-neurons-based Retrieval |
Proposes DIN-Retrieval, which achieves cross-domain knowledge transfer through domain-invariant-neuron-based retrieval and improves LLM reasoning. |
large language model |
✅ |
|
| 87 |
TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems |
TFRBench: a new benchmark for evaluating the reasoning capabilities of forecasting systems |
foundation model |
|
|
| 88 |
TRACE: Capability-Targeted Agentic Training |
TRACE: capability-targeted agent training that improves agents' task-solving ability in complex environments |
large language model |
|
|
| 89 |
Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition |
Proposes a new method based on reward decomposition to reduce sycophancy in language models |
large language model |
|
|