cs.CL (2025-09-04)

📊 29 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (28 🔗9) · Pillar 2: RL & Architecture (1)

🔬 Pillar 9: Embodied Foundation Models (28 papers)

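#	Title	One-sentence summary	Tags	🔗
1	Sample-efficient Integration of New Modalities into Large Language Models	Proposes SEMI, a sample-efficient method for integrating new modalities into large language models	large language model foundation model multimodal
2	Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models	Proposes the MedRevQA and MedChangeQA datasets to evaluate how large language models memorize outdated medical knowledge	large language model
3	RTQA : Recursive Thinking for Complex Temporal Knowledge Graph Question Answering with Large Language Models	Proposes the RTQA framework, which uses recursive reasoning with large language models to tackle complex temporal knowledge graph question answering	large language model	✅
4	SPFT-SQL: Enhancing Large Language Model for Text-to-SQL Parsing by Self-Play Fine-Tuning	SPFT-SQL: improves large language model performance on Text-to-SQL parsing through self-play fine-tuning	large language model
5	Quantized Large Language Models in Biomedical Natural Language Processing: Evaluation and Recommendation	Quantized LLMs for biomedical NLP: evaluation and recommendations, with the aim of reducing deployment cost	large language model
6	A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models	Survey: comprehensively assesses trustworthiness in the reasoning of large language models	large language model	✅
7	Spoken in Jest, Detected in Earnest: A Systematic Review of Sarcasm Recognition -- Multimodal Fusion, Challenges, and Future Prospects	Surveys speech sarcasm recognition: multimodal fusion, challenges, and future prospects	multimodal
8	CANDY: Benchmarking LLMs' Limitations and Assistive Potential in Chinese Misinformation Fact-Checking	CANDY: evaluates the limitations and assistive potential of large language models in Chinese misinformation fact-checking	large language model chain-of-thought	✅
9	Chain or tree? Re-evaluating complex reasoning from the perspective of a matrix of thought	Proposes the Matrix of Thought (MoT) framework to improve LLM efficiency and accuracy on complex reasoning tasks	large language model chain-of-thought	✅
10	Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?	Proposes the Inverse IFEval benchmark to evaluate whether LLMs can overcome training biases and follow counter-conventional instructions	large language model instruction following
11	Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning	Proposes SSMR-Bench: synthesizes sheet music reasoning problems to advance AI musician capabilities	large language model multimodal
12	MobileRAG: Enhancing Mobile Agent with Retrieval-Augmented Generation	MobileRAG: improves mobile agent performance with retrieval-augmented generation, addressing task errors, environment interaction, and missing memory	large language model	✅
13	Decoding the Poetic Language of Emotion in Korean Modern Poetry: Insights from a Human-Labeled Dataset and AI Modeling	Proposes the KPoEM dataset to address emotion analysis of modern Korean poetry	large language model
14	ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs	ODKE+: an ontology-guided open-domain knowledge extraction system using LLMs for large-scale, high-precision knowledge graph construction	large language model
15	Cross-Layer Attention Probing for Fine-Grained Hallucination Detection	Proposes Cross-Layer Attention Probing (CLAP) for fine-grained detection of hallucinations in large language models	large language model
16	MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions	MAGneT: coordinated multi-agent generation of synthetic multi-turn mental health counseling dialogues, addressing the scarcity of high-quality data	large language model
17	On Robustness and Reliability of Benchmark-Based Evaluation of LLMs	Evaluates the robustness and reliability of LLM benchmarks: examines how linguistic variation affects model performance	large language model
18	VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents	Proposes VoxRole, a comprehensive benchmark for evaluating speech-based role-playing agents	large language model
19	SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment	Proposes SelfAug, which mitigates catastrophic forgetting in RAG through distribution self-alignment	large language model	✅
20	SiLVERScore: Semantically-Aware Embeddings for Sign Language Generation Evaluation	Proposes SiLVERScore, a semantically aware, embedding-based metric for evaluating sign language generation	multimodal
21	OleSpeech-IV: A Large-Scale Multispeaker and Multilingual Conversational Speech Dataset with Diverse Topics	OleSpeech-IV: a large-scale, multispeaker, multilingual conversational speech dataset with diverse topics	TAMP
22	Why Language Models Hallucinate	Reveals the root cause of language model hallucination: training and evaluation procedures bias models toward guessing rather than admitting uncertainty	large language model
23	Explicit and Implicit Data Augmentation for Social Event Detection	Proposes the SED-Aug framework, combining explicit text augmentation with implicit feature augmentation to improve social event detection	large language model	✅
24	Towards Stable and Personalised Profiles for Lexical Alignment in Spoken Human-Agent Dialogue	Builds stable, personalised lexical profiles as a basis for lexical alignment in spoken human-agent dialogue	large language model
25	Iti-Validator: A Guardrail Framework for Validating and Correcting LLM-Generated Itineraries	Iti-Validator: a guardrail framework for validating and correcting LLM-generated itineraries	large language model
26	False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize	Reveals the root cause of the poor generalization of probing-based malicious input detection	large language model	✅
27	Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth	Drivelology: builds a multilingual "nonsense with depth" dataset to challenge LLMs' pragmatic understanding	large language model
28	Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain	Evaluates the robustness of retrieval-augmented generation to adversarial evidence in the health domain	large language model	✅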

🔬 Pillar 2: RL & Architecture (1 paper)

#	Title	One-sentence summary	Tags	🔗
29	Breaking to Build: A Threat Model of Prompt-Based Attacks for Securing LLMs	Building secure LLMs: proposes a threat model of prompt-based attacks	distillation large language model
