cs.CL (2026-03-03)

📊 25 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (18 🔗1) · Pillar 2: RL & Architecture (6 🔗1) · Pillar 5: Interaction & Reaction (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (18 papers)

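| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 1 | Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory | Proposes a multimodal item response theory (M3IRT) framework for evaluating the cross-modal reasoning ability of multimodal large language models. | large language model, multimodal |
| 2 | Real-Time Generation of Game Video Commentary with Multimodal LLMs: Pause-Aware Decoding Approaches | Proposes pause-aware decoding methods based on multimodal LLMs for real-time generation of game video commentary. | large language model, multimodal |
| 3 | How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities | Proposes SteerEval for evaluating the controllability of large language models at multiple granularities, revealing the shortcomings of existing methods in fine-grained control. | large language model |
| 4 | TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health | TrustMH-Bench: a comprehensive benchmark for evaluating the trustworthiness of large language models in the mental health domain. | large language model |
| 5 | TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models | TAO-Attack: an advanced optimization-based jailbreak attack method for large language models. | large language model |
| 6 | A Browser-based Open Source Assistant for Multimodal Content Verification | Proposes a browser-based open-source assistant for multimodal content verification that helps journalists and fact-checkers quickly verify digital media. | multimodal |
| 7 | OCR or Not? Rethinking Document Information Extraction in the MLLMs Era with Real-World Large-Scale Datasets | Uses large-scale datasets to reassess whether OCR is necessary for document information extraction in the MLLM era. | large language model, multimodal |
| 8 | From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench | KMP-Bench: a comprehensive benchmark for evaluating the pedagogical intelligence of LLMs in K-8 mathematics teaching. | large language model |
| 9 | GPUTOK: GPU Accelerated Byte Level BPE Tokenization | GPUTOK: GPU-accelerated byte-level BPE tokenization that improves long-text processing efficiency. | large language model |
| 10 | APRES: An Agentic Paper Revision and Evaluation System | APRES: an LLM-based paper revision and evaluation system that improves paper quality and impact. | large language model |
| 11 | Compact Prompting in Instruction-tuned LLMs for Joint Argumentative Component Detection | Proposes a compact prompting method based on instruction-tuned LLMs for joint argumentative component detection. | large language model |
| 12 | Faster, Cheaper, More Accurate: Specialised Knowledge Tracing Models Outperform LLMs | Knowledge tracing models outperform large language models on educational prediction tasks while being faster, cheaper, and more accurate. | large language model |
| 13 | Efficient Self-Evaluation for Diffusion Language Models via Sequence Regeneration | Proposes DiSE, which achieves efficient self-evaluation of diffusion language models through sequence regeneration. | large language model |
| 14 | ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs | Proposes a reasoning approach based on normalization and deterministic parsing that improves LLM performance on formal reasoning tasks. | large language model |
| 15 | Eval4Sim: An Evaluation Framework for Persona Simulation | Proposes the Eval4Sim framework to address the inadequate evaluation of dialogue simulation. | large language model |
| 16 | LaTeX Compilation: Challenges in the Era of LLMs | Addresses the limitations of TeX in the LLM era by proposing the Mogan STEM editor to improve compilation efficiency and LLM fine-tuning performance. | large language model |
| 17 | Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models | Proposes cross-family speculative prefill, which uses small draft models for training-free long-context compression. | large language model |
| 18 | Think, But Don't Overthink: Reproducing Recursive Language Models | Reproduces recursive language models and finds that excessively deep recursion causes the model to "overthink". | large language model |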

🔬 Pillar 2: RL & Architecture (6 papers)

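| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 19 | PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems | PrivMedChat: end-to-end differentially private reinforcement learning from human feedback (RLHF) for medical dialogue systems. | reinforcement learning, PPO, RLHF |
| 20 | Sensory-Aware Sequential Recommendation via Review-Distilled Representations | Proposes a sequential recommendation framework based on sensory attributes to improve recommendation quality. | representation learning, distillation, large language model |
| 21 | MaBERT: A Padding Safe Interleaved Transformer Mamba Hybrid Encoder for Efficient Extended Context Masked Language Modeling | MaBERT: a padding-safe interleaved Transformer-Mamba hybrid encoder for efficient extended-context masked language modeling. | Mamba, state space model |
| 22 | Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization | Proposes Graph-GRPO, which stabilizes multi-agent topology learning via group relative policy optimization. | reinforcement learning, large language model |
| 23 | StitchCUDA: An Automated Multi-Agents End-to-End GPU Programing Framework with Rubric-based Agentic Reinforcement Learning | StitchCUDA: an automated multi-agent end-to-end GPU programming framework with rubric-based agentic reinforcement learning. | reinforcement learning |
| 24 | Learning to Generate and Extract: A Multi-Agent Collaboration Framework For Zero-shot Document-level Event Arguments Extraction | Proposes a multi-agent collaboration framework for zero-shot document-level event argument extraction. | reinforcement learning, reward design |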

🔬 Pillar 5: Interaction & Reaction (1 paper)

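| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 25 | Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration? | Proposes the Code2Math framework, which uses code agents to autonomously evolve more challenging math problems. | IMoS, large language model |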
