cs.CL (2025-09-26)

📊 76 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (59 🔗9) · Pillar 2: RL Algorithms and Architecture (14 🔗1) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction and Matching (1)

🔬 Pillar 9: Embodied Foundation Models (59 papers)

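# | Title | One-sentence summary | Tags
1 | CHRONOBERG: Capturing Language Evolution and Temporal Awareness in Foundation Models | CHRONOBERG: builds a temporal corpus to improve large language models' understanding of language evolution and temporal awareness. | large language model, foundation model
2 | Thinking in Many Modes: How Composite Reasoning Elevates Large Language Model Performance with Limited Data | Proposes Composite Reasoning (CR) to improve LLMs' complex problem solving with limited data. | large language model, chain-of-thought
3 | R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning | Proposes R-Capsule to improve LLM reasoning efficiency. | large language model, chain-of-thought
4 | Why Chain of Thought Fails in Clinical Text Understanding | Systematically studies the limitations of chain-of-thought in clinical text understanding. | large language model, chain-of-thought
5 | Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning | Evaluates the limitations of LLMs in multilingual legal reasoning. | large language model
6 | Large language models management of medications: three performance analyses | Evaluates LLM performance on medication management tasks and reveals their limitations. | large language model
7 | Evaluating Uncertainty Quantification Methods in Argumentative Large Language Models | Evaluates the effectiveness of uncertainty quantification methods in argumentative LLMs. | large language model
8 | Detecting (Un)answerability in Large Language Models with Linear Directions | Uses linear directions to detect unanswerability in LLMs for extractive question answering. | large language model
9 | Exploratory Semantic Reliability Analysis of Wind Turbine Maintenance Logs using Large Language Models | Uses LLMs for exploratory semantic reliability analysis of wind turbine maintenance logs. | large language model
10 | The Outputs of Large Language Models are Meaningless | Argues that LLM outputs are meaningless and examines the source of their apparent meaning. | large language model
11 | From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement | Proposes the MACC framework, which adaptively compresses CoT through multi-round refinement to improve reasoning efficiency and accuracy. | chain-of-thought
12 | FoodSEM: Large Language Model Specialized in Food Named-Entity Linking | FoodSEM: an LLM fine-tuned for food named-entity linking. | large language model
13 | Evaluating Open-Source Large Language Models for Technical Telecom Question Answering | Evaluates open-source LLMs on technical telecom question answering. | large language model
14 | Debiasing Large Language Models in Thai Political Stance Detection via Counterfactual Calibration | Proposes the ThaiFACTUAL framework to address LLM bias in Thai political stance detection. | large language model
15 | SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models | Proposes SBFA, a single-bit-flip attack that breaks LLMs, exposing a serious security risk. | large language model
16 | Navigating the Impact of Structured Output Format on Large Language Models through the Compass of Causal Inference | Uses causal inference to analyze the impact of structured output formats on LLMs. | large language model
17 | ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning | Proposes the ADAM framework for evaluating and improving LLMs in biographical reasoning. | large language model, multimodal
18 | VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing | VoiceAssistant-Eval: a comprehensive benchmark for AI assistants covering listening, speaking, and viewing. | large language model, multimodal
19 | RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media | Proposes the RedNote-Vibe dataset to analyze the dynamics of AI-generated text on social media. | large language model, TAMP
20 | Human Mobility Datasets Enriched With Contextual and Social Dimensions | Proposes a method for building urban human mobility datasets that combine contextual and social dimensions with LLM-generated data. | large language model, multimodal
21 | AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering | Proposes AMANDA, which uses LLM agents for medical knowledge augmentation to address the reasoning bottleneck of Med-VQA in low-resource settings. | large language model, multimodal
22 | Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs | Uses prompts to simulate knowledge cutoffs and evaluates LLMs' temporal awareness and forgetting. | large language model
23 | Representing LLMs in Prompt Semantic Task Space | Proposes a training-free method that represents LLMs as linear operators in a prompt semantic task space, for model selection. | large language model
24 | Transformers Can Learn Connectivity in Some Graphs but Not Others | Shows that Transformers learn connectivity on grid-like graphs but struggle on more complex graphs. | large language model
25 | AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts | Presents AI Brown and AI Koditex, English and Czech corpora for comparing human-written and LLM-generated text. | large language model
26 | InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models | InfiR2: a comprehensive FP8 training recipe for reasoning-enhanced language models. | large language model
27 | FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory | FormalML: a benchmark for evaluating formal subgoal completion in machine learning theory. | large language model
28 | What Is The Political Content in LLMs' Pre- and Post-Training Data? | Analyzes political leanings in LLM training data and reveals a correlation between model bias and data bias. | large language model
29 | The Bias is in the Details: An Assessment of Cognitive Bias in LLMs | Evaluates cognitive bias in LLMs, revealing systematic biases in model decision-making. | large language model
30 | Towards Generalizable Implicit In-Context Learning with Attention Routing | Proposes In-Context Routing (ICR), which achieves generalizable implicit in-context learning through attention routing. | large language model
31 | ArabJobs: A Multinational Corpus of Arabic Job Ads | ArabJobs: a multinational corpus of Arabic job ads for fairness-aware Arabic NLP and labor market research. | large language model
32 | We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong | Proposes adaptive multi-branch steering to address LLM alignment. | large language model
33 | The InviTE Corpus: Annotating Invectives in Tudor English Texts for Computational Modeling | Builds the InviTE corpus for computational modeling of religious invective in Tudor English texts. | large language model
34 | Beyond Textual Context: Structural Graph Encoding with Adaptive Space Alignment to alleviate the hallucination of LLMs | Proposes SSKG-LLM, which mitigates LLM hallucination through structural graph encoding and adaptive space alignment. | large language model
35 | Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance | Proposes the Safety Compliance framework, which improves LLM safety from a legal compliance perspective. | large language model
36 | FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction | FLEXI: the first benchmark for full-duplex human-LLM speech interaction, focusing on model interruption in emergency scenarios. | large language model
37 | FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding | FeatBench: a new benchmark for evaluating coding agents' feature implementation ability in vibe coding. | large language model
38 | Question-Driven Analysis and Synthesis: Building Interpretable Thematic Trees with LLMs for Text Clustering and Controllable Generation | Proposes Recursive Thematic Partitioning (RTP), which uses LLMs to build interpretable thematic trees for text clustering and controllable generation. | large language model
39 | Library Hallucinations in LLMs: Risk Analysis Grounded in Developer Queries | Systematically analyzes how developer queries affect library hallucinations in LLMs, revealing potential security risks. | large language model
40 | Mixture of Detectors: A Compact View of Machine-Generated Text Detection | Proposes a mixture-of-detectors approach for comprehensively evaluating and improving machine-generated text detection. | large language model
41 | Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM | Proposes the Uni-LAP framework, which achieves universal legal article prediction through tight collaboration between a supervised classification model and an LLM. | large language model
42 | Think Right, Not More: Test-Time Scaling for Numerical Claim Verification | Proposes the VERIFIERFC model, which uses test-time scaling to improve LLM performance on numerical claim verification. | large language model
43 | COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning | Proposes COSPADI, which compresses LLMs through calibration-guided sparse dictionary learning and improves compression performance. | large language model
44 | Fine-tuning Done Right in Model Editing | Re-establishes the role of fine-tuning in model editing: proposes LocFT-BF, which substantially outperforms existing methods. | large language model
45 | Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias | Proposes the speech continuation task as a probe of bias in speech models. | foundation model
46 | Black-Box Hallucination Detection via Consistency Under the Uncertain Expression | Proposes a black-box hallucination detection method to address language models generating false information. | large language model
47 | MotivGraph-SoIQ: Integrating Motivational Knowledge Graphs and Socratic Dialogue for Enhanced LLM Ideation | MotivGraph-SoIQ: combines motivational knowledge graphs with Socratic dialogue to enhance LLM academic ideation. | large language model
48 | SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation | SimulSense: efficient simultaneous speech translation through sense-driven interpreting. | large language model
49 | A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs | Presents a Turkish citation intent classification dataset and framework, reaching 91.3% accuracy with LLMs and DSPy. | large language model
50 | AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans | AgentPack: a dataset of code changes co-authored by agents and humans, for improving code-editing models. | large language model
51 | LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals | LUMINA: detects hallucinations in RAG systems using context-knowledge signals. | large language model
52 | Enhancing Low-Rank Adaptation with Structured Nonlinear Transformations | Proposes LoRAN, which enhances low-rank adaptation with structured nonlinear transformations. | large language model
53 | What Makes LLM Agent Simulations Useful for Policy? Insights From an Iterative Design Engagement in Emergency Preparedness | Uses LLM agent simulations to improve the effectiveness of emergency preparedness plans, based on an iterative design engagement. | large language model
54 | Following the TRACE: A Structured Path to Empathetic Response Generation with Multi-Agent Models | Proposes the TRACE framework, which generates empathetic responses along a structured path with multi-agent models, improving analytical depth and generation fluency. | large language model
55 | Collaborative and Proactive Management of Task-Oriented Conversations | Proposes an information-state-based collaborative model for managing task-oriented dialogue, improving dialogue success rate. | large language model
56 | Can LLMs Solve and Generate Linguistic Olympiad Puzzles? | Uses LLMs to solve and generate Linguistics Olympiad puzzles. | large language model
57 | Self-Speculative Biased Decoding for Faster Live Translation | Proposes self-speculative biased decoding to speed up low-latency live translation without an additional model. | large language model
58 | ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation | Proposes the ProPerSim framework for developing proactive, personalized AI assistants through user-assistant simulation. | large language model
59 | Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval | Think-on-Graph 3.0: efficient and adaptive LLM reasoning on heterogeneous graphs via multi-agent dual-evolving context retrieval. | large language model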

🔬 Pillar 2: RL Algorithms and Architecture (14 papers)

# | Title | One-sentence summary | Tags
60 | Exploring Solution Divergence and Its Effect on Large Language Model Problem Solving | Explores solution divergence in LLMs and its effect on problem-solving ability. | reinforcement learning, large language model
61 | EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation | EditGRPO: reinforcement learning with post-rollout edits for clinically accurate chest X-ray report generation. | reinforcement learning, large language model, multimodal
62 | AutoSCORE: Enhancing Automated Scoring with Multi-Agent Large Language Models via Structured Component Recognition | Proposes AutoSCORE to address accuracy and interpretability in automated scoring. | MAE, large language model
63 | QoNext: Towards Next-generation QoE for Foundation Models | QoNext: a next-generation QoE evaluation framework for the interaction experience of foundation models. | predictive model, foundation model
64 | ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models | ResT: reshapes token-level policy gradients to improve LLM tool use. | reinforcement learning, large language model
65 | WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning | WebGen-Agent: enhances interactive website generation with multi-level feedback and step-level reinforcement learning. | reinforcement learning, large language model
66 | No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping | Proposes the RL-ZVP algorithm, which exploits zero-variance prompts in LLM reinforcement learning to improve mathematical reasoning. | reinforcement learning, large language model
67 | Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning | Proposes Critique-Coder, which improves code generation models through critique reinforcement learning. | reinforcement learning, distillation
68 | ML2B: Multi-Lingual ML Benchmark For AutoML | ML2B: the first multilingual machine learning benchmark for AutoML, filling the gap in evaluating non-English ML code generation. | representation learning, large language model
69 | When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance | Studies how reasoning affects LLM performance, revealing its effectiveness across tasks and model scales. | distillation, large language model
70 | Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding | Proposes Group Tree Optimization, which addresses draft policy misalignment in speculative decoding and speeds up LLM inference. | PPO, large language model
71 | S2J: Bridging the Gap Between Solving and Judging Ability in Generative Reward Models | Proposes S2J, which bridges the gap between solving and judging ability in generative reward models. | distillation, large language model
72 | Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment | Proposes the CARB benchmark and improves reward models to strengthen cultural awareness in LLM alignment. | reinforcement learning, large language model
73 | StateX: Enhancing RNN Recall via Post-training State Expansion | StateX: enhances RNN recall through post-training state expansion. | state space model, linear attention

🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-sentence summary | Tags
74 | Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models | Proposes the Thinking-with-Sound framework, which strengthens LALMs' multimodal reasoning in complex acoustic scenarios. | manipulation, multimodal, chain-of-thought
75 | ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents | ChatInject: prompt injection attacks on LLM agents that abuse chat templates. | manipulation, large language model, instruction following

🔬 Pillar 6: Video Extraction and Matching (1 paper)

# | Title | One-sentence summary | Tags
76 | Towards Minimal Causal Representations for Human Multimodal Language Understanding | Proposes the Causal Multimodal Information Bottleneck to address bias in multimodal language understanding. | HuMoR, multimodal
