Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback

June 13, 2025
Authors: Dongwei Jiang, Alvin Zhang, Andrew Wang, Nicholas Andrews, Daniel Khashabi
cs.AI

Abstract

Recent studies have shown that LLMs possess some ability to improve their responses when given external feedback. However, it remains unclear how effectively and thoroughly these models can incorporate extrinsic feedback. In an ideal scenario, if LLMs receive near-perfect and complete feedback, we would expect them to fully integrate the feedback and change their incorrect answers to correct ones. In this paper, we systematically investigate LLMs' ability to incorporate feedback by designing a controlled experimental environment. For each problem, a solver model attempts a solution, then a feedback generator with access to near-complete ground-truth answers produces targeted feedback, after which the solver tries again. We evaluate this pipeline across a diverse range of tasks, including math reasoning, knowledge reasoning, scientific reasoning, and general multi-domain evaluations, with state-of-the-art language models including Claude 3.7 (with and without extended thinking). Surprisingly, even under these near-ideal conditions, solver models consistently show resistance to feedback, a limitation that we term FEEDBACK FRICTION. To mitigate this limitation, we experiment with sampling-based strategies such as progressive temperature increases and explicit rejection of previously attempted incorrect answers, which yield improvements but still fail to help models reach target performance. We also perform a rigorous exploration of potential causes of FEEDBACK FRICTION, ruling out factors such as model overconfidence and data familiarity. We hope that highlighting this issue in LLMs and ruling out several apparent causes will help future research on self-improvement.
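Below is a minimal Python sketch of the solver-feedback-retry loop the abstract describes, with both sampling-based mitigations folded in. It is illustrative only: the `solver.solve(...)` and `feedback_gen.critique(...)` interfaces, the `is_correct` checker, and the temperature schedule are hypothetical stand-ins, not the authors' actual implementation.

```python
def is_correct(answer: str, ground_truth: str) -> bool:
    # Placeholder exact-match check; real evaluation is task-specific.
    return answer.strip() == ground_truth.strip()

def run_feedback_loop(problem, ground_truth, solver, feedback_gen,
                      max_rounds=10, base_temp=0.0, temp_step=0.1):
    # Hypothetical sketch: `solver` and `feedback_gen` are assumed objects,
    # not an API from the paper.
    rejected = []       # incorrect answers from earlier rounds
    feedback = None
    for round_idx in range(max_rounds):
        answer = solver.solve(
            problem,
            feedback=feedback,            # targeted critique, if any yet
            rejected_answers=rejected,    # mitigation: ban prior wrong answers
            temperature=base_temp + temp_step * round_idx,  # mitigation: heat up
        )
        if is_correct(answer, ground_truth):
            return answer, round_idx + 1  # feedback fully incorporated
        rejected.append(answer)
        # The feedback generator sees the near-complete ground truth and the
        # solver's latest attempt, and returns targeted feedback.
        feedback = feedback_gen.critique(problem, answer, ground_truth)
    return None, max_rounds               # feedback friction: never converged
```

Roughly speaking, the phenomenon the paper measures is how often a loop like this exits without converging even though the feedback generator had near-complete access to the correct answer.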