| 1 |
Coherent Multimodal Reasoning with Iterative Self-Evaluation for Vision-Language Models |
Proposes a coherent multimodal reasoning framework to address complex reasoning problems |
large language model multimodal |
|
|
| 2 |
XFacta: Contemporary, Real-World Dataset and Evaluation for Multimodal Misinformation Detection with Multimodal LLMs |
Proposes XFacta to address the evaluation of multimodal misinformation detection |
large language model multimodal |
|
|
| 3 |
Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time |
Proposes the STIM framework to diagnose memorization in chain-of-thought reasoning |
large language model chain-of-thought |
|
|
| 4 |
TIBSTC-CoT: A Multi-Domain Instruction Dataset for Chain-of-Thought Reasoning in Language Models |
Proposes TIBSTC-CoT to address the scarcity of Tibetan-language data |
large language model chain-of-thought |
✅ |
|
| 5 |
Semantic Structure in Large Language Model Embeddings |
Reveals semantic structure in large language model embeddings to improve feature steering |
large language model |
|
|
| 6 |
MArgE: Meshing Argumentative Evidence from Multiple Large Language Models for Justifiable Claim Verification |
Proposes the MArgE framework to integrate evidence from multiple LLMs |
large language model |
|
|
| 7 |
From Monolingual to Bilingual: Investigating Language Conditioning in Large Language Models for Psycholinguistic Tasks |
Studies how language conditioning affects large language models on psycholinguistic tasks |
large language model |
|
|
| 8 |
AI-Based Measurement of Innovation: Mapping Expert Insight into Large Language Model Applications |
Proposes an LLM-based framework for measuring innovation to overcome the limitations of expert evaluation |
large language model |
|
|
| 9 |
Understanding and Mitigating Political Stance Cross-topic Generalization in Large Language Models |
Proposes PNLAC and InhibitFT to address cross-topic generalization of political stance |
large language model |
|
|
| 10 |
Isolating Culture Neurons in Multilingual Large Language Models |
Proposes a method for identifying culture neurons in multilingual large language models |
large language model |
✅ |
|
| 11 |
When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models |
Uncovers the internal mechanisms behind sycophancy in large language models |
large language model |
|
|
| 12 |
Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in Large Language Models |
Proposes TDBench to address the limitations of time-sensitive question-answering evaluation |
large language model |
✅ |
|
| 13 |
Prompting Large Language Models to Detect Dementia Family Caregivers |
Proposes an LLM-based tweet detection method to support family caregivers of people with dementia |
large language model |
|
|
| 14 |
Clinically Grounded Agent-based Report Evaluation: An Interpretable Metric for Radiology Report Generation |
Proposes the ICARE framework to address the evaluation of radiology report generation |
large language model foundation model |
|
|
| 15 |
PoeTone: A Framework for Constrained Generation of Structured Chinese Songci with LLMs |
Proposes the PoeTone framework for constrained generation of structured Chinese Songci |
large language model chain-of-thought |
|
|
| 16 |
Can LLMs Generate High-Quality Task-Specific Conversations? |
Proposes a parameterized framework for controlling the quality of LLM-generated conversations |
large language model |
|
|
| 17 |
Test Set Quality in Multilingual LLM Evaluation |
Proposes a method for analyzing test-set quality in multilingual LLM evaluation to improve evaluation accuracy |
large language model |
|
|
| 18 |
AutoGeTS: Knowledge-based Automated Generation of Text Synthetics for Improving Text Classification |
Proposes AutoGeTS to address insufficient training data for text classification |
large language model |
|
|
| 19 |
Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs |
Proposes a new taxonomy for analyzing memorization in large language models |
large language model |
|
|
| 20 |
Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction |
Proposes Sparse-dLLM to address the computational complexity of diffusion LLMs |
large language model |
✅ |
|
| 21 |
I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2 |
Uncovers internal phonetic representations in LLaMA 3.2 to improve performance on rhyming tasks |
large language model |
|
|
| 22 |
Modular Arithmetic: Language Models Solve Math Digit by Digit |
Proposes digit-position-specific circuits to account for how language models solve arithmetic |
large language model |
|
|
| 23 |
Monsoon Uprising in Bangladesh: How Facebook Shaped Collective Identity |
Studies the role of Facebook in shaping collective identity in Bangladesh |
multimodal |
|
|
| 24 |
CompressKV: Semantic Retrieval Heads Know What Tokens are Not Important Before Generation |
Proposes CompressKV to address KV-cache efficiency in long-context processing |
large language model |
✅ |
|
| 25 |
CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis |
Proposes the CAMERA framework to address redundancy-based compression of MoE models |
large language model |
|
|
| 26 |
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo |
Proposes VeOmni to address inefficient training of large multimodal models |
large language model |
|
|
| 27 |
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training |
Proposes LaMPE to address performance degradation on long-context inputs |
large language model |
✅ |
|
| 28 |
Proof2Hybrid: Automatic Mathematical Benchmark Synthesis for Proof-Centric Problems |
Proposes Proof2Hybrid to automate the generation of mathematical benchmarks |
large language model |
|
|
| 29 |
Learning Dynamics of Meta-Learning in Small Model Pretraining |
Studies the learning dynamics of meta-learning to improve small-model pretraining |
large language model |
|
|
| 30 |
Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers |
Proposes the HIN framework to expose backdoor attacks in audio large language models |
large language model |
|
|
| 31 |
Sacred or Synthetic? Evaluating LLM Reliability and Abstention for Religious Questions |
Proposes the FiqhQA benchmark to evaluate LLM reliability and abstention on religious questions |
large language model |
|
|
| 32 |
The SMeL Test: A simple benchmark for media literacy in language models |
Proposes the SMeL Test to evaluate media literacy in language models |
large language model |
|
|
| 33 |
SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents |
Builds the SpeechRole dataset to evaluate the performance of speech role-playing agents |
multimodal |
|
|