cs.CL (2026-04-07)

📊 104 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (85 🔗4) · Pillar 2: RL Algorithms & Architecture (15 🔗1) · Pillar 6: Video Extraction & Matching (2) · Pillar 1: Robot Control (2)

🔬 Pillar 9: Embodied Foundation Models (85 papers)

#	Title	Key Point	Tags	🔗
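1	Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models	Reveals a structural bias in large language models toward American English and proposes the DiAlign method to quantify it.	large language model foundation model
2	In-Context Watermarks for Large Language Models	Proposes In-Context Watermarking, which uses prompt engineering to make LLM-generated text traceable, addressing watermarking when the model itself is not accessible.	large language model instruction following
3	Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation	Proposes the MoRE framework, which uses a mixture of retrieval experts to address hallucination in multimodal large language models.	large language model multimodal
4	Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity	Evaluates the reliability of multimodal LLMs for emotion analysis of political videos, revealing a performance gap between lab and real-world settings as well as gender bias.	large language model multimodal
5	Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity	Proposes the AMuFC framework, which adaptively decides whether visual evidence is needed, improving multimodal fact-checking accuracy.	multimodal
6	Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models	Proposes a causal graph-attention network (GCAN) framework to improve the factual reliability of large language models.	large language model
7	SPRIG: Improving Large Language Model Performance by System Prompt Optimization	SPRIG: improves large language model performance through system prompt optimization.	large language model
8	The Thiomi Dataset: A Large-Scale Multimodal Corpus for Low-Resource African Languages	Introduces the Thiomi dataset for multimodal learning in low-resource African languages.	multimodal
9	LLMs-Healthcare : Current Applications and Challenges of Large Language Models in various Medical Specialties	A survey of LLM applications and challenges in healthcare, focusing on diagnostic and therapeutic functions.	large language model
10	SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression	SoLA: efficient large language model compression using soft activation sparsity and low-rank decomposition.	large language model
11	Conversational Control with Ontologies for Large Language Models: A Lightweight Framework for Constrained Generation	Proposes a lightweight ontology-based framework for conversational control of large language models, enabling constrained generation.	large language model
12	Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not	Shows that large language models are weaker than humans at commonsense reasoning for Turkish ambiguity resolution.	large language model
13	MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models	MegaFake: a theory-driven dataset of LLM-generated fake news, supporting fake news detection and governance.	large language model
14	UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization	UtilityMax Prompting: a framework for multi-objective large language model optimization based on a formal language.	large language model
15	MultiPress: A Multi-Agent Framework for Interpretable Multimodal News Classification	Proposes MultiPress, a multi-agent framework for interpretable multimodal news classification.	multimodal
16	RUQuant: Towards Refining Uniform Quantization for Large Language Models	RUQuant: improves large language model compression by refining the uniform quantization scheme.	large language model
17	Evaluating Digital Inclusiveness of Digital Agri-Food Tools Using Large Language Models: A Comparative Analysis Between Human and AI-Based Evaluations	Uses large language models to evaluate the digital inclusiveness of digital agri-food tools, speeding up and scaling the evaluation process.	large language model
18	Large Language Models are Algorithmically Blind	Reveals a fundamental flaw in the algorithmic reasoning of large language models: algorithmic blindness.	large language model
19	On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning	Shows how reasoning patterns affect generalization in long chain-of-thought fine-tuning and proposes a branch filtering method.	chain-of-thought
20	Informatics for Food Processing	Proposes FoodProX and multimodal AI models to make food processing assessment more objective and scalable.	large language model multimodal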
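21	Self-Improving Pretraining: using post-trained models to pretrain better models	Proposes self-improving pretraining, which uses post-trained models to improve the pretraining stage, enhancing model safety, factuality, and reasoning.	large language model instruction following
22	POEMetric: The Last Stanza of Humanity	POEMetric: the first poetry evaluation framework, revealing the gap between LLMs and humans in poetry writing.	large language model instruction following
23	PDF Retrieval Augmented Question Answering	Proposes a RAG-based question answering system for PDF documents with enhanced multimodal information extraction.	large language model multimodal
24	A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification	Proposes a simple method for enhancing pretrained language models with speech tokens for classification tasks.	large language model multimodal
25	TriAttention: Efficient Long Reasoning with Trigonometric KV Compression	TriAttention: efficient long reasoning via trigonometric KV compression.	large language model
26	LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering	LangFIR: discovers sparse language-specific features from monolingual data for language steering.	large language model
27	LightThinker++: From Reasoning Compression to Memory Management	LightThinker++: improves LLM efficiency and performance on complex reasoning and agent tasks through explicit adaptive memory management.	large language model
28	SkillX: Automatically Constructing Skill Knowledge Bases for Agents	SkillX: automatically builds skill knowledge bases for agents, improving generalization and efficiency.	large language model
29	Early Stopping for Large Reasoning Models via Confidence Dynamics	Proposes CoDE-Stop, which uses confidence dynamics to stop large reasoning models early and improve efficiency.	chain-of-thought
30	Gaussian mixture models as a proxy for interacting language models	Proposes interacting Gaussian mixture models as a computationally efficient proxy for interacting language models.	large language model
31	EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing	EvoEdit: robust and efficient knowledge editing via evolving null-space alignment.	large language model
32	Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation	Proposes a robust LLM performance certification method based on constrained maximum likelihood estimation.	large language model
33	CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge	CresOWLve: a benchmark for creative problem solving grounded in real-world knowledge.	large language model
34	Evolutionary Search for Automated Design of Uncertainty Quantification Methods	Uses LLM-driven evolutionary search to automatically design uncertainty quantification methods.	large language model
35	Testing the Limits of Truth Directions in LLMs	Reveals the limits of truth directions in LLMs: they depend on the layer, the task, and the instructions.	large language model
36	I-CALM: Incentivizing Confidence-Aware Abstention for LLM Hallucination Mitigation	The I-CALM framework mitigates LLM hallucination by incentivizing confidence-aware abstention.	large language model
37	Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs	A computational audit revealing that LLMs perform cultural translation rather than genuine cultural thinking.	large language model
38	Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations	Proposes a dynamical-systems framework of hallucination basins for understanding and controlling hallucinations in large language models.	large language model
39	FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline	Proposes FURINA-Builder, which constructs fully customizable role-playing benchmarks through a scalable multi-agent collaboration pipeline.	large language model
40	A Linguistics-Aware LLM Watermarking via Syntactic Predictability	Proposes STELA, a linguistics-aware LLM watermarking scheme based on syntactic predictability.	large language model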
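41	LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation	Proposes a game-theoretic framework in which LLMs evaluate one another, producing evaluations better aligned with human judgment.	large language model
42	Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation	An in-depth study of LLM-based multilingual counterfactual example generation, revealing commonalities and limitations of cross-lingual perturbations.	large language model
43	Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets	Proposes the XTF framework, which improves LLM fine-tuning through explainable token-level noise filtering.	large language model
44	Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations	Proposes a cultural alignment evaluation framework, revealing Western-centric cultural bias in LLMs.	large language model
45	Researchers waste 80% of LLM annotation costs by classifying one text at a time	Substantially reduces the cost of LLM text-classification annotation through batching and variable stacking while preserving accuracy.	large language model
46	GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces	Proposes GeoBrowse, a geolocation benchmark for evaluating multimodal reasoning in agentic tool use.	multimodal
47	Emergent Inference-Time Semantic Contamination via In-Context Priming	Proposes a method for detecting inference-time semantic contamination caused by in-context priming.	large language model
48	Position: Logical Soundness is not a Reliable Criterion for Neurosymbolic Fact-Checking with LLMs	Points out the limitations of logical soundness as the sole criterion for neurosymbolic fact-checking and advocates leveraging the human-like reasoning of LLMs.	large language model
49	Adaptive Cost-Efficient Evaluation for Reliable Patent Claim Validation	Proposes the ACE framework for reliable patent claim validation through adaptive, cost-efficient evaluation.	large language model
50	Lighting Up or Dimming Down? Exploring Dark Patterns of LLMs in Co-Creativity	Explores "dark patterns" of LLMs in co-creative work, revealing their potential to suppress human creativity.	large language model
51	How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling	Proposes a stage-wise LLM evaluation framework for the Mathematical Contest in Modeling, revealing models' weaknesses at the execution stage.	large language model
52	Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection	Proposes the RACE model to distinguish fine-grained types of LLM-generated text, improving the precision of LLM oversight.	large language model
53	BLADE: Better Language Answers through Dialogue and Explanations	BLADE: improves language model answers through dialogue and explanations, promoting active learning.	large language model
54	Align then Train: Efficient Retrieval Adapter Learning	Proposes ERA, an efficient retrieval adapter, to address the high cost of fine-tuning retrieval models for complex queries.	instruction following
55	Talk2AI: A Longitudinal Dataset of Human--AI Persuasive Conversations	Talk2AI: a large-scale longitudinal dataset for studying human-AI persuasive conversations.	large language model
56	The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies	The PIMMUR principles: ensuring the validity of simulated collective behavior in LLM societies.	large language model
57	ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation	ProMediate: a socio-cognitive framework for evaluating proactive agents in multi-party negotiation.	large language model
58	BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding	BLASST: dynamically sparsifies attention via softmax thresholding to accelerate long-context LLM inference.	large language model
59	From Chains to DAGs: Probing the Graph Structure of Reasoning in LLMs	Proposes the Reasoning DAG Probing framework to probe graph-structured representations of LLMs' internal reasoning.	large language model
60	Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale	Sandpiper: orchestrated AI annotation for large-scale analysis of educational discourse.	large language model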
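61	ICR-Drive: Instruction Counterfactual Robustness for End-to-End Language-Driven Autonomous Driving	ICR-Drive: a diagnostic framework for instruction counterfactual robustness in end-to-end language-driven autonomous driving.	vision-language-action VLA foundation model
62	Learning to Edit Knowledge via Instruction-based Chain-of-Thought Prompting	Proposes CoT2Edit, which learns to edit knowledge via instruction-based chain-of-thought prompting, improving LLM generalization and knowledge coverage.	large language model chain-of-thought	✅
63	The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models	Reveals surface compliance in large language models and diagnoses the effectiveness of knowledge editing.	large language model	✅
64	A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models	Proposes a multi-stage validation framework for large-scale clinical information extraction, improving the trustworthiness of LLM applications.	large language model
65	Mechanistic Circuit-Based Knowledge Editing in Large Language Models	Proposes MCircKE, which improves multi-step reasoning in LLM knowledge updates through mechanistic circuit editing.	large language model
66	GenomeQA: Benchmarking General Large Language Models for Genome Sequence Understanding	GenomeQA: evaluates general-purpose large language models on genome sequence understanding.	large language model
67	EpiBench: Benchmarking Multi-turn Research Workflows for Multimodal Agents	EpiBench: a benchmark of multi-turn research workflows for multimodal agents.	multimodal
68	Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA	Proposes data-driven function-calling improvements that boost large language model performance on online financial QA.	large language model
69	LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals	Treats LLM reasoning as trajectories: reveals step-specific representation geometry and correctness signals, and enables intervention in the reasoning process.	large language model chain-of-thought
70	Measuring What Matters!! Assessing Therapeutic Principles in Mental-Health Conversation	Proposes the CARE framework to assess adherence to therapeutic principles in AI mental-health conversations and builds the FAITH-M benchmark.	large language model chain-of-thought
71	Cross-Modal Coreference Alignment: Enabling Reliable Information Transfer in Omni-LLMs	Introduces the CrossOmni dataset to reveal and address the problem of cross-modal coreference alignment in Omni-LLMs.	large language model multimodal
72	Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives	Reveals how social dynamics undermine the objectivity of group decision-making among LLMs.	large language model
73	Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles	Uses large language models to generate and evaluate life stories conditioned on psychometric profiles.	large language model
74	What Models Know, How Well They Know It: Knowledge-Weighted Fine-Tuning for Learning When to Say "I Don't Know"	Proposes knowledge-weighted fine-tuning to improve large language models' ability to recognize questions outside their knowledge.	large language model
75	Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue	Proposes Context-Agent, which uses dynamic discourse trees to address context management in non-linear dialogue.	large language model
76	Exclusive Unlearning	Proposes Exclusive Unlearning to improve the safety of large language models.	large language model
77	From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection	Shows that Outlines-based constrained decoding triggers "structure snowballing" in LLM self-reflection.	large language model	✅
78	BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs	Proposes BOSCH to address short-context attention-head selection in large language models.	large language model
79	Identifying Influential N-grams in Confidence Calibration via Regression Analysis	Uses regression analysis to identify n-grams that influence confidence calibration, improving the reliability of LLM reasoning.	large language model
80	See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs	Proposes LVSpec, which accelerates video LLM inference via loosely speculative decoding guided by visual semantics.	large language model
81	THIVLVC: Retrieval Augmented Dependency Parsing for Latin	THIVLVC: a retrieval-augmented dependency parsing method for Latin that substantially improves parsing accuracy on poetry.	large language model
82	Content Fuzzing for Escaping Information Cocoons on Digital Social Media	Proposes ContentFuzz, which uses content fuzzing to break out of information cocoons on social media.	large language model
83	Multi-Drafter Speculative Decoding with Alignment Feedback	Proposes the MetaSD framework, which accelerates LLM inference through multi-drafter speculative decoding with alignment feedback.	large language model
84	Confidence Should Be Calibrated More Than One Turn Deep	Proposes MTCal and ConfChat to address degraded confidence calibration in multi-turn LLM conversations.	large language model
85	Do Domain-specific Experts exist in MoE-based LLMs?	Investigates whether domain-specific experts exist in MoE-based LLMs and proposes DSMoE, a framework with no training cost.	large language model	✅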

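🔬 Pillar 2: RL Algorithms & Architecture (15 papers)

#	Title	Key Point	Tags	🔗
86	Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison	Compares methods for extracting and steering emotion representations in small language models, revealing cross-lingual safety risks.	RLHF motion representation
87	CAGMamba: Context-Aware Gated Cross-Modal Mamba Network for Multimodal Sentiment Analysis	Proposes CAGMamba, a context-aware gated cross-modal Mamba network for multimodal sentiment analysis.	Mamba multimodal
88	Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression	CoT compression lowers reasoning cost but harms model trustworthiness, so efficiency and trustworthiness must be optimized jointly.	DPO chain-of-thought
89	DARE: Diffusion Large Language Models Alignment and Reinforcement Executor	DARE: an open-source framework for alignment and reinforcement learning of diffusion large language models, accelerating post-training research.	reinforcement learning large language model
90	AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference	Proposes AdaptFuse, a training-free approach to how large language models reason about user preferences over sequential interactions.	preference learning large language model
91	DeonticBench: A Benchmark for Reasoning over Rules	Proposes the DeonticBench benchmark for evaluating LLMs' deontic reasoning under complex rules.	reinforcement learning large language model chain-of-thought
92	Individual and Combined Effects of English as a Second Language and Typos on LLM Performance	Studies the combined effects of English as a second language and typos on LLM performance, revealing performance degradation in real-world settings.	world model world models large language model
93	CAWN: Continuous Acoustic Wave Networks for Autoregressive Language Modeling	CAWN: continuous acoustic wave networks for autoregressive language modeling, overcoming the Transformer's long-sequence bottleneck.	SSM state space model large language model
94	Synthetic Sandbox for Training Machine Learning Engineering Agents	Proposes the SandMLE framework, which synthesizes miniature MLE environments and achieves large-scale online reinforcement learning in the MLE domain for the first time.	reinforcement learning large language model
95	Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation	Proposes AsymGRPO to address the exploration limitations of RLVR.	reinforcement learning large language model
96	Graph-Based Chain-of-Thought Pruning for Reducing Redundant Reflections in Reasoning LLMs	Proposes a graph-based chain-of-thought pruning method that reduces redundant reflections in reasoning LLMs.	DPO chain-of-thought
97	AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning	AgentGL: uses reinforcement learning to drive LLMs to learn autonomously on graph structures.	reinforcement learning policy learning large language model	✅
98	Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion	Proposes the Attention Editing framework for converting attention mechanisms across architectures, improving long-text processing efficiency.	distillation feature matching large language model
99	Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning	Proposes a KL-optimized fine-tuning framework to control distributional bias in multi-round LLM generation.	direct preference optimization large language model
100	Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification	Proposes the Re-RIGHT framework, which achieves multilingual proficiency-adaptive text simplification without parallel corpora.	reinforcement learning large language model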

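🔬 Pillar 6: Video Extraction & Matching (2 papers)

#	Title	Key Point	Tags	🔗	⭐
101	Many Preferences, Few Policies: Towards Scalable Language Model Personalization	Proposes the PALM algorithm, which achieves large-scale personalization to user preferences using a small portfolio of LLMs.	HuMoR
102	"I See What You Did There": Can Large Vision-Language Models Understand Multimodal Puns?	Introduces the MultiPun dataset and explores the ability of vision-language models to understand multimodal puns.	HuMoR multimodal

🔬 Pillar 1: Robot Control (2 papers)

#	Title	Key Point	Tags	🔗	⭐
103	VIGIL: An Extensible System for Real-Time Detection and Mitigation of Cognitive Bias Triggers	VIGIL: the first extensible browser-extension system for real-time detection and mitigation of cognitive bias triggers.	manipulation
104	Human Values Matter: Investigating How Misalignment Shapes Collective Behaviors in LLM Agent Communities	CIVA: simulates LLM agent communities to reveal how misalignment with human values shapes collective behavior.	manipulation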
