| 1 |
Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models |
Reveals a structural bias toward American English in large language models and proposes the DiAlign method to quantify it. |
large language model foundation model |
|
|
| 2 |
In-Context Watermarks for Large Language Models |
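Proposes In-Context Watermarking, which uses prompt engineering to trace the provenance of LLM-generated text and addresses watermarking when the model itself is inaccessible. |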
large language model instruction following |
|
|
| 3 |
Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation |
Proposes the MoRE framework, which uses a mixture of retrieval experts to address hallucination in multimodal large language models. |
large language model multimodal |
|
|
| 4 |
Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity |
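Evaluates the reliability of multimodal LLMs for emotion analysis of political videos, revealing a performance gap between laboratory and real-world settings as well as gender bias. |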
large language model multimodal |
|
|
| 5 |
Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity |
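Proposes the AMuFC framework, which adaptively judges whether visual evidence is necessary, improving multimodal fact-checking accuracy. |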
multimodal |
|
|
| 6 |
Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models |
Proposes a causal graph attention network (GCAN) framework to improve the factual reliability of large language models. |
large language model |
|
|
| 7 |
SPRIG: Improving Large Language Model Performance by System Prompt Optimization |
SPRIG: improves large language model performance through system prompt optimization. |
large language model |
|
|
| 8 |
The Thiomi Dataset: A Large-Scale Multimodal Corpus for Low-Resource African Languages |
Proposes the Thiomi dataset for multimodal learning in low-resource African languages. |
multimodal |
|
|
| 9 |
LLMs-Healthcare : Current Applications and Challenges of Large Language Models in various Medical Specialties |
A survey of the applications and challenges of LLMs in healthcare, focusing on diagnostic and treatment functions. |
large language model |
|
|
| 10 |
SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression |
SoLA: efficient large language model compression using soft activation sparsity and low-rank decomposition. |
large language model |
|
|
| 11 |
Conversational Control with Ontologies for Large Language Models: A Lightweight Framework for Constrained Generation |
Proposes a lightweight ontology-based framework for conversational control of large language models, enabling controllable generation. |
large language model |
|
|
| 12 |
Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not |
Shows that large language models are weaker than humans at commonsense reasoning in Turkish ambiguity resolution. |
large language model |
|
|
| 13 |
MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models |
MegaFake: a theory-driven dataset of LLM-generated fake news that supports fake news detection and governance. |
large language model |
|
|
| 14 |
UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization |
UtilityMax Prompting: proposes a formal-language-based framework for multi-objective large language model optimization. |
large language model |
|
|
| 15 |
MultiPress: A Multi-Agent Framework for Interpretable Multimodal News Classification |
Proposes MultiPress, a multi-agent framework for interpretable multimodal news classification. |
multimodal |
|
|
| 16 |
RUQuant: Towards Refining Uniform Quantization for Large Language Models |
RUQuant: improves large language model compression by refining the uniform quantization scheme. |
large language model |
|
|
| 17 |
Evaluating Digital Inclusiveness of Digital Agri-Food Tools Using Large Language Models: A Comparative Analysis Between Human and AI-Based Evaluations |
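Uses large language models to evaluate the digital inclusiveness of digital agricultural tools, speeding up and scaling the evaluation process. |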
large language model |
|
|
| 18 |
Large Language Models are Algorithmically Blind |
Reveals a fundamental deficiency of large language models in algorithmic reasoning: algorithmic blindness. |
large language model |
|
|
| 19 |
On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning |
Reveals how reasoning patterns affect generalization in long chain-of-thought fine-tuning and proposes a branch-filtering method. |
chain-of-thought |
|
|
| 20 |
Informatics for Food Processing |
Proposes FoodProX and multimodal AI models to make food processing assessment more objective and scalable. |
large language model multimodal |
|
|
| 21 |
Self-Improving Pretraining: using post-trained models to pretrain better models |
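Proposes a self-improving pretraining method that uses post-trained models to improve the pretraining stage, improving model safety, factuality, and reasoning. |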
large language model instruction following |
|
|
| 22 |
POEMetric: The Last Stanza of Humanity |
POEMetric: the first poetry evaluation framework, revealing the gap between LLMs and humans in poetry writing. |
large language model instruction following |
|
|
| 23 |
PDF Retrieval Augmented Question Answering |
Proposes a RAG-based question answering system for PDF documents with enhanced multimodal information extraction. |
large language model multimodal |
|
|
| 24 |
A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification |
Proposes a simple method that uses speech tokens to enhance pre-trained language models for classification tasks. |
large language model multimodal |
|
|
| 25 |
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression |
TriAttention: efficient long reasoning via trigonometric KV compression. |
large language model |
|
|
| 26 |
LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering |
LangFIR: discovers sparse language-specific features from monolingual data for language steering. |
large language model |
|
|
| 27 |
LightThinker++: From Reasoning Compression to Memory Management |
LightThinker++: improves LLM efficiency and performance on complex reasoning and agent tasks through explicit adaptive memory management. |
large language model |
|
|
| 28 |
SkillX: Automatically Constructing Skill Knowledge Bases for Agents |
SkillX: automatically constructs skill knowledge bases for agents, improving generalization and efficiency. |
large language model |
|
|
| 29 |
Early Stopping for Large Reasoning Models via Confidence Dynamics |
Proposes CoDE-Stop, which uses confidence dynamics to stop large reasoning models early, improving efficiency. |
chain-of-thought |
|
|
| 30 |
Gaussian mixture models as a proxy for interacting language models |
Proposes interacting Gaussian mixture models as a computationally efficient proxy for interacting language models. |
large language model |
|
|
| 31 |
EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing |
EvoEdit: robust and efficient knowledge editing through evolving null-space alignment. |
large language model |
|
|
| 32 |
Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation |
Proposes a robust method for certifying LLM performance based on constrained maximum likelihood estimation. |
large language model |
|
|
| 33 |
CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge |
CresOWLve: proposes a benchmark for creative problem-solving over real-world knowledge. |
large language model |
|
|
| 34 |
Evolutionary Search for Automated Design of Uncertainty Quantification Methods |
Uses LLM-driven evolutionary search to automatically design uncertainty quantification methods. |
large language model |
|
|
| 35 |
Testing the Limits of Truth Directions in LLMs |
Reveals the limits of truth directions in LLMs: they depend on layer, task, and instruction. |
large language model |
|
|
| 36 |
I-CALM: Incentivizing Confidence-Aware Abstention for LLM Hallucination Mitigation |
The I-CALM framework mitigates LLM hallucination by incentivizing confidence-aware abstention. |
large language model |
|
|
| 37 |
Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs |
Uses a computational audit to reveal a limitation of LLMs: they perform cultural translation rather than cultural thinking. |
large language model |
|
|
| 38 |
Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations |
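Proposes a dynamical-systems-based hallucination basin framework for understanding and controlling hallucinations in large language models. |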
large language model |
|
|
| 39 |
FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline |
Proposes FURINA-Builder, which builds fully customizable role-playing benchmarks through a scalable multi-agent collaboration pipeline. |
large language model |
|
|
| 40 |
A Linguistics-Aware LLM Watermarking via Syntactic Predictability |
Proposes STELA, a linguistics-aware LLM watermarking scheme based on syntactic predictability. |
large language model |
|
|
| 41 |
LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation |
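Proposes a game-theoretic framework in which LLMs evaluate one another, yielding model evaluation better aligned with human judgment. |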
large language model |
|
|
| 42 |
Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation |
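An in-depth study of LLM-based multilingual counterfactual example generation, revealing commonalities and limitations of cross-lingual perturbations. |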
large language model |
|
|
| 43 |
Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets |
Proposes the XTF framework, which improves LLM fine-tuning performance through explainable token-level noise filtering. |
large language model |
|
|
| 44 |
Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations |
Proposes a cultural alignment evaluation framework, revealing a Western-centric cultural bias in LLMs. |
large language model |
|
|
| 45 |
Researchers waste 80% of LLM annotation costs by classifying one text at a time |
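Substantially reduces the annotation cost of LLM text classification through batching and variable stacking while maintaining accuracy. |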
large language model |
|
|
| 46 |
GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces |
Proposes the GeoBrowse geolocation benchmark for evaluating multimodal reasoning in agentic tool use. |
multimodal |
|
|
| 47 |
Emergent Inference-Time Semantic Contamination via In-Context Priming |
Proposes a method for detecting inference-time semantic contamination based on in-context priming. |
large language model |
|
|
| 48 |
Position: Logical Soundness is not a Reliable Criterion for Neurosymbolic Fact-Checking with LLMs |
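Points out the limitations of using logical soundness as the sole criterion in neurosymbolic fact-checking and advocates leveraging the human-like reasoning of LLMs. |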
large language model |
|
|
| 49 |
Adaptive Cost-Efficient Evaluation for Reliable Patent Claim Validation |
Proposes the ACE framework, which achieves reliable patent claim validation through adaptive, cost-efficient evaluation. |
large language model |
|
|
| 50 |
Lighting Up or Dimming Down? Exploring Dark Patterns of LLMs in Co-Creativity |
Explores "dark patterns" in LLM co-creation, revealing their potential to suppress human creativity. |
large language model |
|
|
| 51 |
How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling |
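Proposes a stage-wise LLM evaluation framework for mathematical modeling contests, revealing model shortcomings at the execution level. |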
large language model |
|
|
| 52 |
Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection |
Proposes the RACE model for fine-grained discrimination among types of LLM-generated text, improving the precision of LLM oversight. |
large language model |
|
|
| 53 |
BLADE: Better Language Answers through Dialogue and Explanations |
BLADE: improves language model answers through dialogue and explanations, promoting active learning. |
large language model |
|
|
| 54 |
Align then Train: Efficient Retrieval Adapter Learning |
Proposes ERA, an efficient retrieval adapter that addresses the high cost of fine-tuning retrieval models for complex queries. |
instruction following |
|
|
| 55 |
Talk2AI: A Longitudinal Dataset of Human--AI Persuasive Conversations |
Talk2AI: a large-scale longitudinal dataset for studying human-AI persuasive conversations. |
large language model |
|
|
| 56 |
The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies |
The PIMMUR principles: ensuring the validity of simulations of collective behavior in LLM societies. |
large language model |
|
|
| 57 |
ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation |
ProMediate: a socio-cognitive framework for evaluating proactive agents in multi-party negotiation. |
large language model |
|
|
| 58 |
BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding |
BLASST: dynamically sparsifies attention via softmax thresholding to accelerate long-context LLM inference. |
large language model |
|
|
| 59 |
From Chains to DAGs: Probing the Graph Structure of Reasoning in LLMs |
Proposes the Reasoning DAG Probing framework to probe the graph-structured representation of internal reasoning in LLMs. |
large language model |
|
|
| 60 |
Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale |
Sandpiper: orchestrated AI annotation for analyzing educational discourse at scale. |
large language model |
|
|
| 61 |
ICR-Drive: Instruction Counterfactual Robustness for End-to-End Language-Driven Autonomous Driving |
ICR-Drive: a diagnostic framework for instruction counterfactual robustness in end-to-end language-driven autonomous driving. |
vision-language-action VLA foundation model |
|
|
| 62 |
Learning to Edit Knowledge via Instruction-based Chain-of-Thought Prompting |
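Proposes CoT2Edit, which learns to edit knowledge through instruction-based chain-of-thought prompting, improving LLM generalization and knowledge coverage. |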
large language model chain-of-thought |
✅ |
|
| 63 |
The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models |
Reveals the phenomenon of surface compliance in large language models and diagnoses the effectiveness of knowledge editing. |
large language model |
✅ |
|
| 64 |
A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models |
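Proposes a multi-stage validation framework for large-scale clinical information extraction, improving the trustworthiness of LLM applications. |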
large language model |
|
|
| 65 |
Mechanistic Circuit-Based Knowledge Editing in Large Language Models |
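Proposes MCircKE, which improves multi-step reasoning after knowledge updates in large language models through mechanistic circuit editing. |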
large language model |
|
|
| 66 |
GenomeQA: Benchmarking General Large Language Models for Genome Sequence Understanding |
GenomeQA: evaluates the ability of general-purpose large language models to understand genome sequences. |
large language model |
|
|
| 67 |
EpiBench: Benchmarking Multi-turn Research Workflows for Multimodal Agents |
EpiBench: a benchmark of multi-turn research workflows for multimodal agents. |
multimodal |
|
|
| 68 |
Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA |
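Proposes data-driven improvements to function calling that improve large language model performance in online financial question answering. |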
large language model |
|
|
| 69 |
LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals |
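Treats LLM reasoning as trajectories: reveals step-specific representation geometry and correctness signals, and enables intervention in the reasoning process. |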
large language model chain-of-thought |
|
|
| 70 |
Measuring What Matters!! Assessing Therapeutic Principles in Mental-Health Conversation |
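Proposes the CARE framework for assessing adherence to therapeutic principles in AI mental-health conversations, and builds the FAITH-M benchmark. |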
large language model chain-of-thought |
|
|
| 71 |
Cross-Modal Coreference Alignment: Enabling Reliable Information Transfer in Omni-LLMs |
Proposes the CrossOmni dataset, revealing and addressing the challenge of cross-modal coreference alignment in Omni-LLMs. |
large language model multimodal |
|
|
| 72 |
Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives |
Reveals the negative impact of social dynamics on objectivity in LLM collective decision-making. |
large language model |
|
|
| 73 |
Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles |
Uses large language models to generate and evaluate life stories conditioned on psychometric profiles. |
large language model |
|
|
| 74 |
What Models Know, How Well They Know It: Knowledge-Weighted Fine-Tuning for Learning When to Say "I Don't Know" |
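Proposes a knowledge-weighted fine-tuning method that improves large language models' ability to recognize questions whose answers they do not know. |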
large language model |
|
|
| 75 |
Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue |
Proposes Context-Agent, which uses dynamic discourse trees to address context management in non-linear dialogue. |
large language model |
|
|
| 76 |
Exclusive Unlearning |
Proposes the Exclusive Unlearning method to improve the safety of large language models. |
large language model |
|
|
| 77 |
From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection |
Shows that Outlines-based constrained decoding triggers a "structure snowballing" phenomenon in LLM self-reflection. |
large language model |
✅ |
|
| 78 |
BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs |
Proposes BOSCH to address short-context attention-head selection in large language models. |
large language model |
|
|
| 79 |
Identifying Influential N-grams in Confidence Calibration via Regression Analysis |
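Identifies the n-grams that influence confidence calibration through regression analysis, improving the reliability of large language model reasoning. |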
large language model |
|
|
| 80 |
See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs |
Proposes LVSpec, which accelerates Video LLM inference through loosely speculative decoding guided by visual semantics. |
large language model |
|
|
| 81 |
THIVLVC: Retrieval Augmented Dependency Parsing for Latin |
THIVLVC: proposes a retrieval-augmented dependency parsing method for Latin that substantially improves parsing accuracy on poetry. |
large language model |
|
|
| 82 |
Content Fuzzing for Escaping Information Cocoons on Digital Social Media |
Proposes ContentFuzz, which uses content fuzzing to break out of information cocoons on social media. |
large language model |
|
|
| 83 |
Multi-Drafter Speculative Decoding with Alignment Feedback |
Proposes the MetaSD framework, which accelerates LLM inference through multi-drafter speculative decoding with alignment feedback. |
large language model |
|
|
| 84 |
Confidence Should Be Calibrated More Than One Turn Deep |
Proposes MTCal and ConfChat to address the degradation of confidence calibration in multi-turn LLM conversations. |
large language model |
|
|
| 85 |
Do Domain-specific Experts exist in MoE-based LLMs? |
Investigates whether domain-specific experts exist in MoE-based LLMs and proposes the training-free DSMoE framework. |
large language model |
✅ |
|