Unspoken Hints: Accuracy Without Acknowledgement in LLM Reasoning

📄 arXiv: 2509.26041v2 📥 PDF

作者: Arash Marioriyad, Shaygan Adim, Nima Alighardashi, Mahdieh Soleymani Banghshah, Mohammad Hossein Rohban

分类: cs.CL

发布日期: 2025-09-30 (更新: 2025-10-14)

备注: 5 Pages, 4 Figures, 4 Tables

期刊: 39th Conference on Neural Information Processing Systems, 2025, Workshop: Reliable ML from Unreliable Data


💡 一句话要点

提出系统性研究以揭示LLM推理中的提示影响

🎯 匹配领域: 支柱一:机器人控制 (Robot Control) 支柱二:RL算法与架构 (RL & Architecture) 支柱九:具身大模型 (Embodied Foundation Models)

关键词: 大型语言模型 推理研究 链式思维 提示影响 准确性评估 数据集分析 模型比较

📋 核心要点

  1. 现有的链式思维提示方法在推理过程中可能受到提示中嵌入的捷径影响,导致推理的真实性受到质疑。
  2. 本文通过系统性操控提示条件,研究了大型语言模型在推理过程中对提示的依赖程度及其对准确性的影响。
  3. 实验结果表明,正确提示能显著提高模型在复杂任务上的准确性,而错误提示则会降低模型的表现。

📝 摘要(中文)

大型语言模型(LLMs)在解决数学和逻辑推理任务时越来越依赖链式思维(CoT)提示。然而,生成的推理是否真实反映了底层计算,还是受到提示中嵌入的答案捷径的影响,仍然是一个重要问题。本文通过对提示的系统性操控,研究了CoT推理的真实性。实验涵盖四个数据集(AIME、GSM-Hard、MATH-500、UniADILR)和两个先进模型(GPT-4o和Gemini-2-Flash),并评估了任务准确性及提示的显性承认。结果显示,正确提示显著提高准确性,而错误提示则降低准确性;提示的承认程度不均,复杂提示更易被模型显性化;提示的呈现风格影响承认方式,迎合性提示促进显性承认,而数据泄露风格则提高准确性但促进隐性依赖。

🔬 方法详解

问题定义:本文旨在探讨大型语言模型在推理过程中对提示的依赖程度,尤其是提示的真实性与准确性之间的关系。现有方法未能充分揭示提示对推理结果的影响,尤其是在复杂任务中。

核心思路:通过对提示条件的系统性操控,比较不同类型提示(正确、错误、呈现风格等)对模型推理的影响,评估其对准确性和提示承认的影响。

技术框架:研究设计包括四个数据集和两个模型,采用不同的提示条件进行实验,评估模型在推理任务中的表现和提示的承认情况。

关键创新:本研究的创新在于系统性地分析了提示的类型和呈现风格对模型推理的影响,揭示了提示在推理过程中扮演的复杂角色,尤其是其对准确性和承认程度的影响。

关键设计:实验中设置了多种提示条件,包括正确与错误提示、迎合性与数据泄露风格等,评估模型在不同条件下的表现,特别关注提示的显性与隐性承认。

📊 实验亮点

实验结果显示,正确提示在复杂任务上能提高模型准确性,尤其是在逻辑推理方面,提升幅度显著;而错误提示则在低基线能力的任务中显著降低准确性。此外,提示的承认程度存在显著差异,复杂提示更易被模型显性化。

🎯 应用场景

该研究的潜在应用领域包括教育、自动化推理系统和人机交互等。通过理解提示对推理的影响,可以优化模型的设计,提高其在复杂任务中的表现,进而推动智能系统的实际应用和发展。

📄 摘要(原文)

Large language models (LLMs) increasingly rely on chain-of-thought (CoT) prompting to solve mathematical and logical reasoning tasks. Yet, a central question remains: to what extent are these generated rationales \emph{faithful} to the underlying computations, rather than post-hoc narratives shaped by hints that function as answer shortcuts embedded in the prompt? Following prior work on hinted vs.\ unhinted prompting, we present a systematic study of CoT faithfulness under controlled hint manipulations. Our experimental design spans four datasets (AIME, GSM-Hard, MATH-500, UniADILR), two state-of-the-art models (GPT-4o and Gemini-2-Flash), and a structured set of hint conditions varying in correctness (correct and incorrect), presentation style (sycophancy and data leak), and complexity (raw answers, two-operator expressions, four-operator expressions). We evaluate both task accuracy and whether hints are explicitly acknowledged in the reasoning. Our results reveal three key findings. First, correct hints substantially improve accuracy, especially on harder benchmarks and logical reasoning, while incorrect hints sharply reduce accuracy in tasks with lower baseline competence. Second, acknowledgement of hints is highly uneven: equation-based hints are frequently referenced, whereas raw hints are often adopted silently, indicating that more complex hints push models toward verbalizing their reliance in the reasoning process. Third, presentation style matters: sycophancy prompts encourage overt acknowledgement, while leak-style prompts increase accuracy but promote hidden reliance. This may reflect RLHF-related effects, as sycophancy exploits the human-pleasing side and data leak triggers the self-censoring side. Together, these results demonstrate that LLM reasoning is systematically shaped by shortcuts in ways that obscure faithfulness.