cs.CL (2025-09-04)

📊 29 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (28 🔗9) · Pillar 2: RL Algorithms and Architecture (1)

🔬 Pillar 9: Embodied Foundation Models (28 papers)

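# | Title | One-line summary | Tags
1 | Sample-efficient Integration of New Modalities into Large Language Models | Proposes SEMI, a method for sample-efficiently integrating new modalities into large language models. | large language model, foundation model, multimodal
2 | Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models | Proposes the MedRevQA and MedChangeQA datasets to evaluate how much outdated medical knowledge large language models have memorized. | large language model
3 | RTQA: Recursive Thinking for Complex Temporal Knowledge Graph Question Answering with Large Language Models | Proposes the RTQA framework, which uses recursive LLM reasoning to tackle complex temporal knowledge graph question answering. | large language model
4 | SPFT-SQL: Enhancing Large Language Model for Text-to-SQL Parsing by Self-Play Fine-Tuning | SPFT-SQL: improves large language model performance on Text-to-SQL parsing through self-play fine-tuning. | large language model
5 | Quantized Large Language Models in Biomedical Natural Language Processing: Evaluation and Recommendation | Evaluates quantized LLMs for biomedical NLP and offers recommendations, with the aim of reducing deployment cost. | large language model
6 | A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models | Survey: a comprehensive review of trustworthiness in the reasoning process of large language models. | large language model
7 | Spoken in Jest, Detected in Earnest: A Systematic Review of Sarcasm Recognition -- Multimodal Fusion, Challenges, and Future Prospects | Survey of speech sarcasm recognition: multimodal fusion, challenges, and future prospects. | multimodal
8 | CANDY: Benchmarking LLMs' Limitations and Assistive Potential in Chinese Misinformation Fact-Checking | CANDY: evaluates the limitations and assistive potential of large language models in Chinese misinformation fact-checking. | large language model, chain-of-thought
9 | Chain or tree? Re-evaluating complex reasoning from the perspective of a matrix of thought | Proposes the Matrix of Thought (MoT) framework to improve LLM efficiency and accuracy on complex reasoning tasks. | large language model, chain-of-thought
10 | Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? | Proposes the Inverse IFEval benchmark to evaluate whether LLMs can overcome training-induced biases and follow counter-conventional instructions. | large language model, instruction following
11 | Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning | Proposes SSMR-Bench: synthesized sheet-music reasoning problems intended to improve the abilities of an AI musician. | large language model, multimodal
12 | MobileRAG: Enhancing Mobile Agent with Retrieval-Augmented Generation | MobileRAG: improves mobile agent performance through retrieval-augmented generation, addressing task errors, environment interaction, and missing memory. | large language model
13 | Decoding the Poetic Language of Emotion in Korean Modern Poetry: Insights from a Human-Labeled Dataset and AI Modeling | Proposes the KPoEM dataset for emotion analysis of modern Korean poetry. | large language model
14 | ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs | ODKE+: an LLM-based, ontology-guided open-domain knowledge extraction system for building large-scale, high-precision knowledge graphs. | large language model
15 | Cross-Layer Attention Probing for Fine-Grained Hallucination Detection | Proposes Cross-Layer Attention Probing (CLAP) for fine-grained detection of hallucinations in large language models. | large language model
16 | MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions | MAGneT: coordinated multi-agent generation of synthetic multi-turn mental health counseling dialogues, addressing the scarcity of high-quality data. | large language model
17 | On Robustness and Reliability of Benchmark-Based Evaluation of LLMs | Evaluates the robustness and reliability of LLM benchmarks by examining how linguistic variation affects model performance. | large language model
18 | VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents | Proposes VoxRole, a comprehensive benchmark for evaluating speech-based role-playing agents. | large language model
19 | SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment | Proposes SelfAug, which mitigates catastrophic forgetting in RAG through distribution self-alignment. | large language model
20 | SiLVERScore: Semantically-Aware Embeddings for Sign Language Generation Evaluation | Proposes SiLVERScore, a semantically aware, embedding-based evaluation method for sign language generation. | multimodal
21 | OleSpeech-IV: A Large-Scale Multispeaker and Multilingual Conversational Speech Dataset with Diverse Topics | OleSpeech-IV: a large-scale, multi-speaker, multilingual conversational speech dataset covering diverse topics. | TAMP
22 | Why Language Models Hallucinate | Traces the root cause of language model hallucination: biases in training and evaluation push models to guess rather than admit uncertainty. | large language model
23 | Explicit and Implicit Data Augmentation for Social Event Detection | Proposes the SED-Aug framework, which combines explicit text augmentation with implicit feature augmentation to improve social event detection. | large language model
24 | Towards Stable and Personalised Profiles for Lexical Alignment in Spoken Human-Agent Dialogue | Builds stable, personalised lexical profiles as a basis for lexical alignment in spoken human-agent dialogue. | large language model
25 | Iti-Validator: A Guardrail Framework for Validating and Correcting LLM-Generated Itineraries | Iti-Validator: a guardrail framework for validating and correcting LLM-generated itineraries. | large language model
26 | False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize | Reveals the fundamental reason why probing-based malicious input detection fails to generalize. | large language model
27 | Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth | Drivelology: builds a multilingual dataset of "nonsense with depth" to challenge the pragmatic understanding of LLMs. | large language model
28 | Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain | Evaluates the robustness of retrieval-augmented generation to adversarial evidence in the health domain. | large language model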

🔬 Pillar 2: RL Algorithms and Architecture (1 paper)

# | Title | One-line summary | Tags
29 | Breaking to Build: A Threat Model of Prompt-Based Attacks for Securing LLMs | Toward secure LLMs: proposes a threat model of prompt-based attacks. | distillation, large language model
