| 1 |
Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging |
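Proposes a weight-space model-merging framework to mitigate catastrophic forgetting when fine-tuning large language models in the medical domain. |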
large language model foundation model instruction following |
|
|
| 2 |
Reliable Control-Point Selection for Steering Reasoning in Large Language Models |
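Proposes a stability filtering method that makes control-point selection in large language models more reliable, thereby improving reasoning ability. |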
large language model chain-of-thought |
✅ |
|
| 3 |
ImplicitBBQ: Benchmarking Implicit Bias in Large Language Models through Characteristic Based Cues |
ImplicitBBQ: evaluating implicit bias in large language models through characteristic-based cues |
large language model chain-of-thought |
|
|
| 4 |
Is Clinical Text Enough? A Multimodal Study on Mortality Prediction in Heart Failure Patients |
Proposes an entity-aware multimodal Transformer model that improves the accuracy of short-term mortality prediction for heart failure patients. |
large language model multimodal |
|
|
| 5 |
Towards Position-Robust Talent Recommendation via Large Language Models |
Proposes the L3TR framework, which uses large language models to address position bias in talent recommendation |
large language model |
|
|
| 6 |
Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents |
For function-calling language agents, finds that a moderate chain-of-thought budget works best, and proposes the FR-CoT method. |
chain-of-thought |
|
|
| 7 |
SURE: Synergistic Uncertainty-aware Reasoning for Multimodal Emotion Recognition in Conversations |
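Proposes the SURE framework, which improves multimodal emotion recognition in conversations through synergistic uncertainty-aware reasoning |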
multimodal |
|
|
| 8 |
Human-Guided Reasoning with Large Language Models for Vietnamese Speech Emotion Recognition |
Proposes a Vietnamese speech emotion recognition framework based on large language models and human guidance |
large language model |
|
|
| 9 |
On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning |
Reveals how reasoning patterns affect generalization in CoT fine-tuning, and proposes a branch filtering method. |
chain-of-thought |
|
|
| 10 |
SAFE: Stepwise Atomic Feedback for Error correction in Multi-hop Reasoning |
The SAFE framework corrects errors in multi-hop reasoning through atomic feedback, improving the reliability of LLM reasoning. |
large language model chain-of-thought |
|
|
| 11 |
GaelEval: Benchmarking LLM Performance for Scottish Gaelic |
GaelEval: builds a multi-dimensional LLM evaluation benchmark for Scottish Gaelic, revealing models' linguistic and cultural capabilities. |
large language model |
|
|
| 12 |
Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding |
GOOSE: uses anisotropic speculation trees for training-free speculative decoding, accelerating large language model inference. |
large language model |
|
|
| 13 |
LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches |
Proposes LiveMathematicianBench for evaluating LLM capabilities in research-level mathematical reasoning |
large language model |
|
|
| 14 |
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework |
Compares LLM agent memory modules under a unified framework and proposes a new memory method that improves performance |
large language model |
|
|
| 15 |
Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations |
Proposes a mechanistic diagnostic framework to address the fragility of large language models to surface-level perturbations |
large language model |
|
|
| 16 |
Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression |
Swift-SVD: a theoretically optimal and practically efficient framework for low-rank LLM compression |
large language model |
|
|
| 17 |
Read More, Think More: Revisiting Observation Reduction for Web Agents |
Proposes an observation representation selection strategy to improve web agent performance |
large language model |
|
|
| 18 |
Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once |
Proposes the Magic, Madness, Heaven, Sin framework for evaluating LLM output diversity and resolving cross-domain optimization conflicts. |
large language model |
|
|
| 19 |
From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents |
Proposes a two-stage SFT method, from SWE-ZERO to SWE-HERO, that improves software engineering agent performance on SWE-bench. |
zero-shot transfer |
|
|