| # | Title | Summary | Keywords | ✅ |
|---|-------|---------|----------|----|
| 1 | Session-Level Spoken Language Assessment with a Multimodal Foundation Model via Multi-Target Learning | Proposes a session-level spoken language assessment method based on a multimodal foundation model and multi-target learning. | foundation model, multimodal | |
| 2 | RephQA: Evaluating Readability of Large Language Models in Public Health Question Answering | RephQA evaluates the readability of large language models in public health question answering and proposes optimization strategies. | large language model, chain-of-thought | |
| 3 | Multi-Physics: A Comprehensive Benchmark for Multimodal LLMs Reasoning on Chinese Multi-Subject Physics Problems | Proposes Multi-Physics, a comprehensive benchmark for evaluating multimodal LLMs' reasoning on Chinese physics problems. | multimodal, chain-of-thought | ✅ |
| 4 | Intrinsic Meets Extrinsic Fairness: Assessing the Downstream Impact of Bias Mitigation in Large Language Models | Studies intrinsic and extrinsic fairness in LLMs, assessing the downstream impact of bias mitigation. | large language model | |
| 5 | Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models | Proposes a sparse-autoencoder-guided internal representation unlearning method that improves information forgetting in large language models. | large language model | |
| 6 | Concept Unlearning in Large Language Models via Self-Constructed Knowledge Triplets | Proposes a concept unlearning method for large language models based on self-constructed knowledge triplets. | large language model | |
| 7 | DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models | DivLogicEval is a new benchmark framework for evaluating the logical reasoning ability of large language models. | large language model | |
| 8 | CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics | CFDLLMBench is a benchmark suite for evaluating large language models' capabilities in computational fluid dynamics. | large language model | ✅ |
| 9 | How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models | Proposes a knowledge infusion scaling law to efficiently guide domain knowledge injection during large language model pre-training. | large language model | |
| 10 | Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining | Proposes the Climb framework, which improves large language model performance by optimizing multilingual data allocation. | large language model | |
| 11 | 'Rich Dad, Poor Lad': How do Large Language Models Contextualize Socioeconomic Factors in College Admission? | Proposes the DPAF framework, revealing LLM biases toward socioeconomic factors in college admissions and the underlying reasoning mechanisms. | large language model | |
| 12 | Longitudinal and Multimodal Recording System to Capture Real-World Patient-Clinician Conversations for AI and Encounter Research: Protocol | Builds a longitudinal, multimodal recording system that captures real-world patient-clinician conversations to support AI and encounter research. | multimodal | |
| 13 | The Role of High-Performance GPU Resources in Large Language Model Based Radiology Imaging Diagnosis | Uses high-performance GPU resources to accelerate large-language-model-based radiology imaging diagnosis. | large language model | |
| 14 | The Curious Case of Visual Grounding: Different Effects for Speech- and Text-based Language Encoders | Studies how incorporating visual information affects the internal representations of speech- and text-based language encoders. | visual grounding | |
| 15 | Fine-Tuning Large Multimodal Models for Automatic Pronunciation Assessment | Fine-tunes large multimodal models for automatic pronunciation assessment, improving fine-grained assessment capability. | multimodal | |
| 16 | Can LLMs Judge Debates? Evaluating Non-Linear Reasoning via Argumentation Theory Semantics | Evaluates LLMs' non-linear reasoning in debates using argumentation theory semantics. | large language model, chain-of-thought | |
| 17 | VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion | VOX-KRIKRI unifies speech and language through continuous fusion. | large language model, multimodal | |
| 18 | BEFT: Bias-Efficient Fine-Tuning of Language Models | BEFT is an efficient bias-term fine-tuning method that improves language model performance in low-data settings. | large language model | |
| 19 | Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment | Proposes uncertainty-calibrated large language models to improve the reliability of automated essay assessment. | large language model | |
| 20 | Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans | Evaluates behavioral alignment in conflict dialogue through a multi-dimensional comparison of LLM agents and humans. | large language model | |
| 21 | RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation | Proposes RPG, a repository planning graph for unified and scalable codebase generation. | large language model | |
| 22 | CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs | Proposes CultureScope, which evaluates LLMs' cultural understanding via a multi-dimensional taxonomy of cultural knowledge. | large language model | ✅ |
| 23 | Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech | Proposes the Think-Verbalize-Speak framework, which decouples reasoning from spoken delivery to improve spoken dialogue systems. | large language model | ✅ |
| 24 | Re-FRAME the Meeting Summarization SCOPE: Fact-Based Summarization and Personalization via Questions | Proposes the FRAME framework and SCOPE protocol to address hallucination, omission, and personalization in meeting summarization. | large language model | |
| 25 | Distribution-Aligned Decoding for Efficient LLM Task Adaptation | Proposes SVDecode, which adapts LLMs to downstream tasks efficiently via distribution-aligned decoding. | large language model | |
| 26 | Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning | Proposes Best-of-L, a cross-lingual reward modeling approach that improves multilingual LLM performance on mathematical reasoning. | large language model | |
| 27 | Once Upon a Time: Interactive Learning for Storytelling with Small Language Models | Proposes an interactive learning framework that uses cognitive feedback to improve storytelling in small language models. | large language model | |
| 28 | Pipeline Parallelism is All You Need for Optimized Early-Exit Based Self-Speculative Decoding | Proposes PPSD, pipeline-parallel self-speculative decoding that optimizes early-exit-based LLM inference. | large language model | |
| 29 | LiteLong: Resource-Efficient Long-Context Data Synthesis for LLMs | LiteLong is a resource-efficient long-context data synthesis method for training large language models. | large language model | |
| 30 | Computational Analysis of Conversation Dynamics through Participant Responsivity | Analyzes conversation dynamics through participant responsivity, quantifying conversation quality and distinguishing between conversation types. | large language model | |
| 31 | Purely Semantic Indexing for LLM-based Generative Recommendation and Retrieval | Proposes purely semantic indexing to resolve semantic ID conflicts in LLM-based generative recommendation and retrieval. | large language model | |
| 32 | Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval | Studies the effectiveness and scalability of LLM-based data augmentation for retrieval, identifying optimal augmentation strategies. | large language model | |
| 33 | Evaluating CxG Generalisation in LLMs via Construction-Based NLI Fine Tuning | Proposes the ConTest-NLI benchmark to evaluate LLMs' generalisation on construction-grammar-based NLI. | large language model | |
| 34 | CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion | Proposes CodeRAG to address knowledge retrieval for repository-level code completion. | large language model | ✅ |
| 35 | UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression | UniGist is a general, hardware-aligned framework for sequence-level long-context compression. | large language model | |
| 36 | REFER: Mitigating Bias in Opinion Summarisation via Frequency Framed Prompting | Proposes the REFER framework, which mitigates bias in opinion summarisation via frequency-framed prompting. | large language model | |
| 37 | SciEvent: Benchmarking Multi-domain Scientific Event Extraction | SciEvent is a multi-domain benchmark for scientific event extraction that advances structured understanding of scientific content. | large language model | |
| 38 | DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm | Proposes DNA-DetectLLM, which uses a DNA-inspired mutation-repair mechanism for zero-shot detection of AI-generated text. | large language model | ✅ |
| 39 | A method for improving multilingual quality and diversity of instruction fine-tuning datasets | Proposes the M-DaQ method to improve the quality and diversity of multilingual instruction fine-tuning datasets, strengthening LLMs' multilingual capabilities. | large language model | |