| 1 |
CHRONOBERG: Capturing Language Evolution and Temporal Awareness in Foundation Models |
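CHRONOBERG: builds a temporally structured corpus to improve large language models' understanding of language evolution and temporal awareness. |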
large language model foundation model |
✅ |
|
| 2 |
Thinking in Many Modes: How Composite Reasoning Elevates Large Language Model Performance with Limited Data |
Proposes the Composite Reasoning (CR) method to improve large language models' ability to solve complex problems with limited data. |
large language model chain-of-thought |
|
|
| 3 |
R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning |
Proposes R-Capsule to improve the reasoning efficiency of large language models. |
large language model chain-of-thought |
|
|
| 4 |
Why Chain of Thought Fails in Clinical Text Understanding |
Systematically studies the limitations of chain-of-thought in clinical text understanding. |
large language model chain-of-thought |
|
|
| 5 |
Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning |
Evaluates the limitations of large language models in multilingual legal reasoning. |
large language model |
|
|
| 6 |
Large language models management of medications: three performance analyses |
Evaluates the performance of large language models on medication management tasks and reveals their limitations. |
large language model |
|
|
| 7 |
Evaluating Uncertainty Quantification Methods in Argumentative Large Language Models |
Evaluates the effectiveness of uncertainty quantification methods in argumentative large language models. |
large language model |
|
|
| 8 |
Detecting (Un)answerability in Large Language Models with Linear Directions |
Uses linear directions to detect unanswerability in large language models for extractive question answering. |
large language model |
|
|
| 9 |
Exploratory Semantic Reliability Analysis of Wind Turbine Maintenance Logs using Large Language Models |
Uses large language models for exploratory semantic reliability analysis of wind turbine maintenance logs. |
large language model |
|
|
| 10 |
The Outputs of Large Language Models are Meaningless |
Argues that the outputs of large language models are meaningless and examines where their apparent meaning comes from. |
large language model |
|
|
| 11 |
From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement |
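Proposes the MACC framework, which adaptively compresses CoT through multi-round refinement to improve reasoning efficiency and accuracy. |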
chain-of-thought |
✅ |
|
| 12 |
FoodSEM: Large Language Model Specialized in Food Named-Entity Linking |
FoodSEM: a large language model fine-tuned for food named-entity linking. |
large language model |
|
|
| 13 |
Evaluating Open-Source Large Language Models for Technical Telecom Question Answering |
Evaluates the performance of open-source large language models on technical telecom question answering. |
large language model |
|
|
| 14 |
Debiasing Large Language Models in Thai Political Stance Detection via Counterfactual Calibration |
Proposes the ThaiFACTUAL framework to address large language model bias in Thai political stance detection. |
large language model |
|
|
| 15 |
SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models |
Proposes SBFA, a single-bit-flip attack that breaks large language models, exposing a serious security risk. |
large language model |
|
|
| 16 |
Navigating the Impact of Structured Output Format on Large Language Models through the Compass of Causal Inference |
Uses causal inference to analyze the impact of structured output formats on large language models. |
large language model |
|
|
| 17 |
ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning |
Proposes the ADAM framework for evaluating and enhancing LLMs in biographical reasoning. |
large language model multimodal |
|
|
| 18 |
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing |
VoiceAssistant-Eval: a comprehensive benchmark for AI assistants covering listening, speaking, and viewing capabilities. |
large language model multimodal |
✅ |
|
| 19 |
RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media |
Introduces the RedNote-Vibe dataset for analyzing the dynamics of AI-generated text on social media. |
large language model TAMP |
✅ |
|
| 20 |
Human Mobility Datasets Enriched With Contextual and Social Dimensions |
Proposes a method for constructing urban human mobility datasets that combines contextual and social dimensions with LLM-generated data. |
large language model multimodal |
|
|
| 21 |
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering |
Proposes AMANDA, which uses LLM agents for medical knowledge augmentation to address the reasoning bottleneck of Med-VQA in low-resource settings. |
large language model multimodal |
✅ |
|
| 22 |
Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs |
Uses prompts to simulate knowledge cutoffs, evaluating LLMs' temporal awareness and how effectively they forget. |
large language model |
✅ |
|
| 23 |
Representing LLMs in Prompt Semantic Task Space |
Proposes a training-free method that represents LLMs as linear operators in a prompt semantic task space, for use in model selection. |
large language model |
|
|
| 24 |
Transformers Can Learn Connectivity in Some Graphs but Not Others |
Shows that Transformers can learn connectivity on grid-like graphs but struggle on more complex graphs. |
large language model |
|
|
| 25 |
AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts |
Introduces AI Brown and AI Koditex, English and Czech corpora for comparing human-written text with LLM-generated text. |
large language model |
|
|
| 26 |
InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models |
InfiR2: a comprehensive FP8 training recipe for reasoning-enhanced language models. |
large language model |
|
|
| 27 |
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory |
FormalML: a benchmark for evaluating formal subgoal completion in machine learning theory. |
large language model |
|
|
| 28 |
What Is The Political Content in LLMs' Pre- and Post-Training Data? |
Analyzes the political leanings of LLM training data, revealing a correlation between model bias and data bias. |
large language model |
|
|
| 29 |
The Bias is in the Details: An Assessment of Cognitive Bias in LLMs |
Assesses cognitive bias in LLMs, revealing systematic biases in model decision-making. |
large language model |
|
|
| 30 |
Towards Generalizable Implicit In-Context Learning with Attention Routing |
Proposes In-Context Routing (ICR), which achieves generalizable implicit in-context learning through attention routing. |
large language model |
|
|
| 31 |
ArabJobs: A Multinational Corpus of Arabic Job Ads |
ArabJobs: a multinational corpus of Arabic job ads for fairness-aware Arabic NLP and labor market research. |
large language model |
✅ |
|
| 32 |
We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong |
Proposes adaptive multi-branch steering to address the alignment of large language models. |
large language model |
|
|
| 33 |
The InviTE Corpus: Annotating Invectives in Tudor English Texts for Computational Modeling |
Builds the InviTE corpus for computational modeling of religious invective in Tudor-era English texts. |
large language model |
|
|
| 34 |
Beyond Textual Context: Structural Graph Encoding with Adaptive Space Alignment to alleviate the hallucination of LLMs |
Proposes SSKG-LLM, which mitigates LLM hallucination through structural graph encoding and adaptive space alignment. |
large language model |
✅ |
|
| 35 |
Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance |
Proposes the Safety Compliance framework, which improves LLM safety from a legal compliance perspective. |
large language model |
|
|
| 36 |
FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction |
FLEXI: the first benchmark for full-duplex human-LLM speech interaction, focusing on model interruption in emergency scenarios. |
large language model |
|
|
| 37 |
FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding |
FeatBench: a new benchmark for evaluating coding agents' ability to implement features in vibe coding. |
large language model |
|
|
| 38 |
Question-Driven Analysis and Synthesis: Building Interpretable Thematic Trees with LLMs for Text Clustering and Controllable Generation |
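Proposes Recursive Thematic Partitioning (RTP), which uses LLMs to build interpretable thematic trees for text clustering and controllable generation. |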
large language model |
|
|
| 39 |
Library Hallucinations in LLMs: Risk Analysis Grounded in Developer Queries |
Systematically analyzes how developer queries affect library hallucinations in LLMs, revealing potential security risks. |
large language model |
|
|
| 40 |
Mixture of Detectors: A Compact View of Machine-Generated Text Detection |
Proposes a mixture-of-detectors approach for comprehensively evaluating and improving machine-generated text detection. |
large language model |
|
|
| 41 |
Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM |
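Proposes the Uni-LAP framework, which achieves universal legal article prediction through tight collaboration between a supervised classification model and an LLM. |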
large language model |
|
|
| 42 |
Think Right, Not More: Test-Time Scaling for Numerical Claim Verification |
Proposes the VERIFIERFC model, which uses test-time scaling to improve LLM performance on numerical claim verification. |
large language model |
✅ |
|
| 43 |
COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning |
Proposes COSPADI, which compresses LLMs via calibration-guided sparse dictionary learning and improves compression performance. |
large language model |
|
|
| 44 |
Fine-tuning Done Right in Model Editing |
Repositions fine-tuning within model editing: proposes LocFT-BF, which substantially outperforms existing methods. |
large language model |
|
|
| 45 |
Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias |
Proposes the speech continuation task as a probe of bias in voice-based models. |
foundation model |
|
|
| 46 |
Black-Box Hallucination Detection via Consistency Under the Uncertain Expression |
Proposes a black-box hallucination detection method to address language models generating false information. |
large language model |
|
|
| 47 |
MotivGraph-SoIQ: Integrating Motivational Knowledge Graphs and Socratic Dialogue for Enhanced LLM Ideation |
MotivGraph-SoIQ: integrates motivational knowledge graphs and Socratic dialogue to enhance LLM academic ideation. |
large language model |
|
|
| 48 |
SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation |
SimulSense: efficient simultaneous speech translation through sense-driven interpreting. |
large language model |
|
|
| 49 |
A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs |
Presents a Turkish citation intent classification dataset and framework, reaching 91.3% accuracy using LLMs and DSPy. |
large language model |
|
|
| 50 |
AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans |
AgentPack: a dataset of code changes co-authored by agents and humans, used to improve the performance of code editing models. |
large language model |
|
|
| 51 |
LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals |
LUMINA: detects hallucinations in RAG systems using context-knowledge signals. |
large language model |
|
|
| 52 |
Enhancing Low-Rank Adaptation with Structured Nonlinear Transformations |
Proposes LoRAN, which enhances low-rank adaptation through structured nonlinear transformations. |
large language model |
|
|
| 53 |
What Makes LLM Agent Simulations Useful for Policy? Insights From an Iterative Design Engagement in Emergency Preparedness |
Uses LLM agent simulations to improve the effectiveness of emergency preparedness planning: an iterative design engagement. |
large language model |
|
|
| 54 |
Following the TRACE: A Structured Path to Empathetic Response Generation with Multi-Agent Models |
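Proposes the TRACE framework, which generates empathetic responses along a structured path with multi-agent models, improving analytical depth and generation fluency. |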
large language model |
|
|
| 55 |
Collaborative and Proactive Management of Task-Oriented Conversations |
Proposes an information-state-based collaborative model for managing task-oriented conversations, improving dialogue success rates. |
large language model |
|
|
| 56 |
Can LLMs Solve and Generate Linguistic Olympiad Puzzles? |
Uses large language models to solve and generate Linguistic Olympiad puzzles. |
large language model |
|
|
| 57 |
Self-Speculative Biased Decoding for Faster Live Translation |
Proposes self-speculative biased decoding, which speeds up low-latency live translation without an additional model. |
large language model |
|
|
| 58 |
ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation |
Proposes the ProPerSim framework for developing proactive and personalized AI assistants through user-assistant simulation. |
large language model |
|
|
| 59 |
Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval |
Think-on-Graph 3.0: efficient and adaptive LLM reasoning on heterogeneous graphs via multi-agent dual-evolving context retrieval. |
large language model |
|
|