cs.CL(2026-04-07)

📊 104 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (85 🔗4) · Pillar 2: RL & Architecture (15 🔗1) · Pillar 6: Video Extraction (2) · Pillar 1: Robot Control (2)

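🔬 Pillar 9: Embodied Foundation Models (85 papers)

# | Title | One-sentence summary | Tags | 🔗
1 | Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models | Reveals a structural bias toward American English in large language models and proposes the DiAlign method to quantify it. | large language model, foundation model
2 | In-Context Watermarks for Large Language Models | Proposes In-Context Watermarking, which uses prompt engineering to trace the provenance of LLM-generated text, addressing watermarking when the model itself is inaccessible. | large language model, instruction following
3 | Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation | Proposes the MoRE framework, which uses a mixture of retrieval experts to address hallucination in multimodal large language models. | large language model, multimodal
4 | Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity | Evaluates the reliability of multimodal LLMs for emotion analysis of political videos, revealing a performance gap between laboratory and real-world settings as well as gender bias. | large language model, multimodal
5 | Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity | Proposes the AMuFC framework, which adaptively judges whether visual evidence is necessary, improving multimodal fact-checking accuracy. | multimodal
6 | Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models | Proposes a causal graph attention network (GCAN) framework to improve the factual reliability of large language models. | large language model
7 | SPRIG: Improving Large Language Model Performance by System Prompt Optimization | SPRIG: improves large language model performance through system prompt optimization. | large language model
8 | The Thiomi Dataset: A Large-Scale Multimodal Corpus for Low-Resource African Languages | Proposes the Thiomi dataset for multimodal learning in low-resource African languages. | multimodal
9 | LLMs-Healthcare : Current Applications and Challenges of Large Language Models in various Medical Specialties | Surveys the applications and challenges of LLMs in healthcare, focusing on diagnostic and treatment functions. | large language model
10 | SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression | SoLA: uses soft activation sparsity and low-rank decomposition for efficient large language model compression. | large language model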
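11 | Conversational Control with Ontologies for Large Language Models: A Lightweight Framework for Constrained Generation | Proposes a lightweight ontology-based framework for conversational control of large language models, enabling controllable generation. | large language model
12 | Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not | Shows that large language models are weaker than humans at commonsense reasoning in Turkish ambiguity resolution. | large language model
13 | MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models | MegaFake: a theory-driven dataset of LLM-generated fake news, supporting fake news detection and governance. | large language model
14 | UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization | UtilityMax Prompting: proposes a formal-language-based framework for multi-objective large language model optimization. | large language model
15 | MultiPress: A Multi-Agent Framework for Interpretable Multimodal News Classification | Proposes the MultiPress multi-agent framework for interpretable multimodal news classification. | multimodal
16 | RUQuant: Towards Refining Uniform Quantization for Large Language Models | RUQuant: improves large language model compression performance by refining the uniform quantization scheme. | large language model
17 | Evaluating Digital Inclusiveness of Digital Agri-Food Tools Using Large Language Models: A Comparative Analysis Between Human and AI-Based Evaluations | Uses large language models to evaluate the digital inclusiveness of digital agri-food tools, speeding up and scaling the evaluation process. | large language model
18 | Large Language Models are Algorithmically Blind | Reveals a fundamental deficiency of large language models in algorithmic reasoning: algorithmic blindness. | large language model
19 | On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning | Reveals how reasoning patterns affect generalization in long chain-of-thought fine-tuning and proposes a branch filtering method. | chain-of-thought
20 | Informatics for Food Processing | Proposes FoodProX and multimodal AI models to make food processing assessment more objective and scalable. | large language model, multimodal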
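21 | Self-Improving Pretraining: using post-trained models to pretrain better models | Proposes a self-improving pretraining method that uses post-trained models to improve the pretraining stage, enhancing model safety, factuality, and reasoning. | large language model, instruction following
22 | POEMetric: The Last Stanza of Humanity | POEMetric: the first poetry evaluation framework, revealing the gap between LLMs and humans in poetry writing. | large language model, instruction following
23 | PDF Retrieval Augmented Question Answering | Proposes a RAG-based question answering system for PDF documents with enhanced multimodal information extraction. | large language model, multimodal
24 | A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification | Proposes a simple method that enhances pre-trained language models with speech tokens for classification tasks. | large language model, multimodal
25 | TriAttention: Efficient Long Reasoning with Trigonometric KV Compression | TriAttention: uses trigonometric KV compression for efficient long reasoning. | large language model
26 | LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering | LangFIR: discovers sparse language-specific features from monolingual data for language steering. | large language model
27 | LightThinker++: From Reasoning Compression to Memory Management | LightThinker++: improves LLM efficiency and performance on complex reasoning and agent tasks through explicit adaptive memory management. | large language model
28 | SkillX: Automatically Constructing Skill Knowledge Bases for Agents | SkillX: automatically constructs skill knowledge bases for agents, improving generalization and efficiency. | large language model
29 | Early Stopping for Large Reasoning Models via Confidence Dynamics | Proposes CoDE-Stop, which uses confidence dynamics to stop large reasoning models early, improving efficiency. | chain-of-thought
30 | Gaussian mixture models as a proxy for interacting language models | Proposes interacting Gaussian mixture models as a computationally efficient proxy for interacting language models. | large language model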
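31 | EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing | EvoEdit: achieves robust and efficient knowledge editing through evolving null-space alignment. | large language model
32 | Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation | Proposes a robust LLM performance certification method based on constrained maximum likelihood estimation. | large language model
33 | CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge | CresOWLve: proposes a benchmark for creative problem-solving over real-world knowledge. | large language model
34 | Evolutionary Search for Automated Design of Uncertainty Quantification Methods | Uses LLM-driven evolutionary search to automatically design uncertainty quantification methods. | large language model
35 | Testing the Limits of Truth Directions in LLMs | Reveals the limits of truth directions in LLMs: layer dependence, task dependence, and instruction dependence. | large language model
36 | I-CALM: Incentivizing Confidence-Aware Abstention for LLM Hallucination Mitigation | The I-CALM framework mitigates LLM hallucination by incentivizing confidence-aware abstention. | large language model
37 | Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs | Reveals through a computational audit the limitations of LLMs in cultural translation as opposed to cultural thinking. | large language model
38 | Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations | Proposes a dynamical-systems-based hallucination basin framework for understanding and controlling hallucinations in large language models. | large language model
39 | FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline | Proposes FURINA-Builder, which builds fully customizable role-playing benchmarks through a scalable multi-agent collaboration pipeline. | large language model
40 | A Linguistics-Aware LLM Watermarking via Syntactic Predictability | Proposes STELA, a linguistics-aware LLM watermarking scheme based on syntactic predictability. | large language model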
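41 | LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation | Proposes a game-theoretic framework for mutual evaluation among LLMs, yielding model evaluation better aligned with human judgment. | large language model
42 | Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation | Studies LLM-based multilingual counterfactual example generation in depth, revealing commonalities and limitations of cross-lingual perturbations. | large language model
43 | Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets | Proposes the XTF framework, which improves LLM fine-tuning performance through explainable token-level noise filtering. | large language model
44 | Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations | Proposes a cultural alignment evaluation framework, revealing Western-centric cultural bias in LLMs. | large language model
45 | Researchers waste 80% of LLM annotation costs by classifying one text at a time | Significantly reduces the cost of LLM text classification annotation through batching and variable stacking while maintaining accuracy. | large language model
46 | GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces | Proposes the GeoBrowse geolocation benchmark for evaluating multimodal reasoning in agentic tool use. | multimodal
47 | Emergent Inference-Time Semantic Contamination via In-Context Priming | Proposes a method for detecting inference-time semantic contamination based on in-context priming. | large language model
48 | Position: Logical Soundness is not a Reliable Criterion for Neurosymbolic Fact-Checking with LLMs | Points out the limitations of logical soundness as the sole criterion in neurosymbolic fact-checking and advocates leveraging the human-like reasoning abilities of LLMs. | large language model
49 | Adaptive Cost-Efficient Evaluation for Reliable Patent Claim Validation | Proposes the ACE framework, which achieves reliable patent claim validation through adaptive cost-efficient evaluation. | large language model
50 | Lighting Up or Dimming Down? Exploring Dark Patterns of LLMs in Co-Creativity | Explores "dark patterns" in LLM co-creation, revealing their potential to suppress human creativity. | large language model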
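51 | How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling | Proposes a stage-wise LLM evaluation framework for mathematical modeling contests, revealing model shortcomings at the execution level. | large language model
52 | Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection | Proposes the RACE model for fine-grained discrimination among types of LLM-generated text, improving the precision of LLM oversight. | large language model
53 | BLADE: Better Language Answers through Dialogue and Explanations | BLADE: improves language model answers through dialogue and explanations, promoting active learning. | large language model
54 | Align then Train: Efficient Retrieval Adapter Learning | Proposes ERA, an efficient retrieval adapter that addresses the high cost of fine-tuning retrieval models for complex queries. | instruction following
55 | Talk2AI: A Longitudinal Dataset of Human--AI Persuasive Conversations | Talk2AI: a large-scale longitudinal dataset for studying human-AI persuasive conversations. | large language model
56 | The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies | The PIMMUR principles: ensuring the validity of collective behavior simulations of LLM societies. | large language model
57 | ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation | ProMediate: a socio-cognitive framework for evaluating proactive agents in multi-party negotiation. | large language model
58 | BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding | BLASST: dynamically sparsifies attention via softmax thresholding to accelerate long-context LLM inference. | large language model
59 | From Chains to DAGs: Probing the Graph Structure of Reasoning in LLMs | Proposes the Reasoning DAG Probing framework to probe the graph-structured representation of internal reasoning in LLMs. | large language model
60 | Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale | Sandpiper: orchestrated AI annotation for large-scale analysis of educational discourse. | large language model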
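61 | ICR-Drive: Instruction Counterfactual Robustness for End-to-End Language-Driven Autonomous Driving | ICR-Drive: a diagnostic framework for instruction counterfactual robustness in end-to-end language-driven autonomous driving. | vision-language-action, VLA, foundation model
62 | Learning to Edit Knowledge via Instruction-based Chain-of-Thought Prompting | Proposes CoT2Edit, which learns to edit knowledge via instruction-based chain-of-thought prompting, improving LLM generalization and knowledge coverage. | large language model, chain-of-thought
63 | The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models | Reveals surface compliance in large language models and diagnoses the effectiveness of knowledge editing. | large language model
64 | A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models | Proposes a multi-stage validation framework for large-scale clinical information extraction, improving the trustworthiness of LLM applications. | large language model
65 | Mechanistic Circuit-Based Knowledge Editing in Large Language Models | Proposes MCircKE, which uses mechanistic circuit editing to improve multi-step reasoning during knowledge updates in large language models. | large language model
66 | GenomeQA: Benchmarking General Large Language Models for Genome Sequence Understanding | GenomeQA: evaluates general-purpose large language models on genome sequence understanding. | large language model
67 | EpiBench: Benchmarking Multi-turn Research Workflows for Multimodal Agents | EpiBench: a benchmark of multi-turn research workflows for multimodal agents. | multimodal
68 | Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA | Proposes data-driven function-calling improvements that boost large language model performance in online financial QA. | large language model
69 | LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals | Treats LLM reasoning as trajectories: reveals step-specific representation geometry and correctness signals, and enables intervention in the reasoning process. | large language model, chain-of-thought
70 | Measuring What Matters!! Assessing Therapeutic Principles in Mental-Health Conversation | Proposes the CARE framework to assess adherence to therapeutic principles in AI mental-health conversations, and builds the FAITH-M benchmark. | large language model, chain-of-thought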
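71 | Cross-Modal Coreference Alignment: Enabling Reliable Information Transfer in Omni-LLMs | Proposes the CrossOmni dataset, revealing and addressing the cross-modal coreference alignment problem in Omni-LLMs. | large language model, multimodal
72 | Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives | Reveals the negative impact of social dynamics on objectivity in LLM collective decision-making. | large language model
73 | Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles | Uses large language models to generate and evaluate life stories based on psychometric profiles. | large language model
74 | What Models Know, How Well They Know It: Knowledge-Weighted Fine-Tuning for Learning When to Say "I Don't Know" | Proposes a knowledge-weighted fine-tuning method that improves large language models' ability to recognize questions beyond their knowledge. | large language model
75 | Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue | Proposes Context-Agent, which uses dynamic discourse trees to address context management in non-linear dialogue. | large language model
76 | Exclusive Unlearning | Proposes the Exclusive Unlearning method to improve large language model safety. | large language model
77 | From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection | Shows that Outlines-based constrained decoding triggers a "structure snowballing" phenomenon in LLM self-reflection. | large language model
78 | BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs | Proposes BOSCH to address short-context attention-head selection in large language models. | large language model
79 | Identifying Influential N-grams in Confidence Calibration via Regression Analysis | Identifies N-grams that influence confidence calibration via regression analysis, improving the reliability of large language model reasoning. | large language model
80 | See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs | Proposes LVSpec, which accelerates video LLM inference through loosely speculative decoding with visual-semantic guidance. | large language model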
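81 | THIVLVC: Retrieval Augmented Dependency Parsing for Latin | THIVLVC: proposes a retrieval-augmented dependency parsing method for Latin that significantly improves parsing accuracy on poetry. | large language model
82 | Content Fuzzing for Escaping Information Cocoons on Digital Social Media | Proposes ContentFuzz, which uses content fuzzing to break out of information cocoons on social media. | large language model
83 | Multi-Drafter Speculative Decoding with Alignment Feedback | Proposes the MetaSD framework, which accelerates LLM inference via multi-drafter speculative decoding with alignment feedback. | large language model
84 | Confidence Should Be Calibrated More Than One Turn Deep | Proposes MTCal and ConfChat to address the degradation of confidence calibration in multi-turn LLM dialogue. | large language model
85 | Do Domain-specific Experts exist in MoE-based LLMs? | Explores whether domain experts exist in MoE-LLMs and proposes the training-free DSMoE framework. | large language model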

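🔬 Pillar 2: RL & Architecture (15 papers)

# | Title | One-sentence summary | Tags | 🔗
86 | Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison | Compares methods for extracting and steering emotion representations in small language models, revealing cross-lingual safety risks. | RLHF, motion representation
87 | CAGMamba: Context-Aware Gated Cross-Modal Mamba Network for Multimodal Sentiment Analysis | Proposes CAGMamba, which uses a context-aware gated cross-modal Mamba network for multimodal sentiment analysis. | Mamba, multimodal
88 | Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression | CoT compression reduces reasoning cost but harms model trustworthiness; efficiency and trustworthiness need to be optimized together. | DPO, chain-of-thought
89 | DARE: Diffusion Large Language Models Alignment and Reinforcement Executor | DARE: an open-source framework for alignment and reinforcement learning of diffusion large language models, accelerating post-training research. | reinforcement learning, large language model
90 | AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference | Proposes AdaptFuse to address the inference problem of large language models during user interaction. | preference learning, large language model
91 | DeonticBench: A Benchmark for Reasoning over Rules | Proposes the DeonticBench benchmark for evaluating LLMs' deontic reasoning under complex rules. | reinforcement learning, large language model, chain-of-thought
92 | Individual and Combined Effects of English as a Second Language and Typos on LLM Performance | Studies the combined effects of English as a second language and typos on LLM performance, revealing performance degradation in real-world scenarios. | world model, world models, large language model
93 | CAWN: Continuous Acoustic Wave Networks for Autoregressive Language Modeling | CAWN: continuous acoustic wave networks for autoregressive language modeling, breaking through the long-sequence bottleneck of Transformers. | SSM, state space model, large language model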
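94 | Synthetic Sandbox for Training Machine Learning Engineering Agents | Proposes the SandMLE framework, which synthesizes miniature MLE environments to enable, for the first time, large-scale online reinforcement learning in the MLE domain. | reinforcement learning, large language model
95 | Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation | Proposes AsymGRPO to address limited exploration in RLVR. | reinforcement learning, large language model
96 | Graph-Based Chain-of-Thought Pruning for Reducing Redundant Reflections in Reasoning LLMs | Proposes a graph-based chain-of-thought pruning method to reduce redundant reflections in reasoning LLMs. | DPO, chain-of-thought
97 | AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning | AgentGL: uses reinforcement learning to drive LLMs to learn autonomously on graph structures. | reinforcement learning, policy learning, large language model
98 | Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion | Proposes the Attention Editing framework for cross-architecture attention conversion, improving long-context processing efficiency. | distillation, feature matching, large language model
99 | Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning | Proposes a KL-optimized fine-tuning framework for controlling distributional bias in multi-round LLM generation. | direct preference optimization, large language model
100 | Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification | Proposes the Re-RIGHT framework, which achieves multilingual adaptive text simplification without parallel corpora. | reinforcement learning, large language model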

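🔬 Pillar 6: Video Extraction (2 papers)

# | Title | One-sentence summary | Tags | 🔗
101 | Many Preferences, Few Policies: Towards Scalable Language Model Personalization | Proposes the PALM algorithm, which achieves large-scale personalization of user preferences by combining a small number of LLMs. | HuMoR
102 | "I See What You Did There": Can Large Vision-Language Models Understand Multimodal Puns? | Proposes the MultiPun dataset and explores the ability of vision-language models to understand multimodal puns. | HuMoR, multimodal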

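🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-sentence summary | Tags | 🔗
103 | VIGIL: An Extensible System for Real-Time Detection and Mitigation of Cognitive Bias Triggers | VIGIL: the first extensible browser-extension system for real-time detection and mitigation of cognitive bias triggers. | manipulation
104 | Human Values Matter: Investigating How Misalignment Shapes Collective Behaviors in LLM Agent Communities | CIVA: simulates LLM agent communities to reveal how misalignment of human values affects collective behavior. | manipulation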
