| 1 |
Natural Context Drift Undermines the Natural Language Understanding of Large Language Models |
Proposes a framework for analyzing how natural text evolution affects the question-answering ability of LLMs |
large language model |
|
|
| 2 |
Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition |
Proposes serialized output prompting to improve multi-talker speech recognition performance |
large language model |
|
|
| 3 |
Assessing Large Language Models on Islamic Legal Reasoning: Evidence from Inheritance Law Evaluation |
Evaluates the performance of large language models on Islamic inheritance law reasoning |
large language model |
✅ |
|
| 4 |
REFRAG: Rethinking RAG based Decoding |
Proposes REFRAG to address the efficiency problem of RAG decoding |
large language model |
|
|
| 5 |
Vis-CoT: A Human-in-the-Loop Framework for Interactive Visualization and Intervention in LLM Chain-of-Thought Reasoning |
Vis-CoT: a human-in-the-loop framework for interactive visualization of LLM chain-of-thought reasoning |
large language model chain-of-thought |
|
|
| 6 |
Rethinking the Chain-of-Thought: The Roles of In-Context Learning and Pre-trained Priors |
An in-depth study of chain-of-thought: the dual roles of in-context learning and pre-trained priors |
large language model chain-of-thought |
|
|
| 7 |
CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models |
Proposes Causal Attention Tuning (CAT), a method for injecting fine-grained causal knowledge into large language models |
large language model |
✅ |
|
| 8 |
Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal |
UniCR: proposes a unified framework that calibrates uncertainty evidence to enable risk-controlled refusal in large language models |
large language model |
|
|
| 9 |
On the Alignment of Large Language Models with Global Human Opinion |
Proposes a framework based on the World Values Survey to evaluate how well large language models align with global human opinion |
large language model |
✅ |
|
| 10 |
Can Large Language Models Master Complex Card Games? |
Explores LLM capabilities in complex card games: achieving human-like intelligence through fine-tuning |
large language model |
✅ |
|
| 11 |
DaMoC: Efficiently Selecting the Optimal Large Language Model for Fine-tuning Domain Tasks Based on Data and Model Compression |
DaMoC: efficiently selects the best large language model for fine-tuning on domain tasks, based on data and model compression |
large language model |
|
|
| 12 |
Efficient Large Language Models with Zero-Shot Adjustable Acceleration |
Proposes a zero-shot adjustable acceleration method to improve the inference efficiency of large language models |
large language model |
|
|
| 13 |
WATCHED: A Web AI Agent Tool for Combating Hate Speech by Expanding Data |
Proposes WATCHED, a Web AI Agent-based content moderation tool for detecting and explaining hate speech |
large language model chain-of-thought |
|
|
| 14 |
ShortageSim: Simulating Drug Shortages under Information Asymmetry |
ShortageSim: the first simulation framework for regulatory interventions in drug shortages under information asymmetry |
large language model |
|
|
| 15 |
Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs |
Revisits the prompt sensitivity of LLMs: an artifact of evaluation methods or a model flaw? |
large language model |
|
|
| 16 |
Where Should I Study? Biased Language Models Decide! Evaluating Fairness in LMs for Academic Recommendations |
Evaluates the fairness of LLMs in academic recommendations: reveals and quantifies bias in language models |
large language model |
|
|
| 17 |
Benchmarking the Detection of LLMs-Generated Modern Chinese Poetry |
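Builds a benchmark for detecting modern Chinese poetry and evaluates the limitations of existing methods in identifying LLM-generated poems |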
large language model |
|
|
| 18 |
Do Retrieval Augmented Language Models Know When They Don't Know? |
Studies the refusal ability of retrieval-augmented language models (RALMs) and proposes improvements |
large language model |
|
|
| 19 |
Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA |
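Proposes the Reason-KE framework, which uses explicit reasoning chains for robust knowledge editing of LLMs, improving distractor resistance in multi-hop QA |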
large language model |
|
|
| 20 |
LLMs cannot spot math errors, even when allowed to peek into the solution |
LLMs struggle to spot errors in math solution steps, even when allowed to see the reference answer |
large language model |
|
|
| 21 |
LongCat-Flash Technical Report |
LongCat-Flash: a 560-billion-parameter MoE language model with efficient computation and advanced agent capabilities |
foundation model |
✅ |
|
| 22 |
Culture is Everywhere: A Call for Intentionally Cultural Evaluation |
Proposes an "intentionally cultural evaluation" framework to address bias in evaluating the cultural alignment of LLMs |
large language model |
|
|
| 23 |
Mitigating Catastrophic Forgetting in Continual Learning through Model Growth |
Mitigates catastrophic forgetting in continual learning through model growth |
large language model |
|
|
| 24 |
Zero-shot Cross-lingual NER via Mitigating Language Difference: An Entity-aligned Translation Perspective |
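Proposes an entity-aligned translation method to address the performance drop on non-Latin-script languages in zero-shot cross-lingual NER |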
large language model |
|
|