
Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback

June 13, 2025
Authors: Dongwei Jiang, Alvin Zhang, Andrew Wang, Nicholas Andrews, Daniel Khashabi
cs.AI

Abstract

Recent studies have shown LLMs possess some ability to improve their responses when given external feedback. However, it remains unclear how effectively and thoroughly these models can incorporate extrinsic feedback. In an ideal scenario, if LLMs receive near-perfect and complete feedback, we would expect them to fully integrate the feedback and change their incorrect answers to correct ones. In this paper, we systematically investigate LLMs' ability to incorporate feedback by designing a controlled experimental environment. For each problem, a solver model attempts a solution, then a feedback generator with access to near-complete ground-truth answers produces targeted feedback, after which the solver tries again. We evaluate this pipeline across a diverse range of tasks, including math reasoning, knowledge reasoning, scientific reasoning, and general multi-domain evaluations with state-of-the-art language models including Claude 3.7 (with and without extended thinking). Surprisingly, even under these near-ideal conditions, solver models consistently show resistance to feedback, a limitation that we term FEEDBACK FRICTION. To mitigate this limitation, we experiment with sampling-based strategies like progressive temperature increases and explicit rejection of previously attempted incorrect answers, which yield improvements but still fail to help models achieve target performance. We also perform a rigorous exploration of potential causes of FEEDBACK FRICTION, ruling out factors such as model overconfidence and data familiarity. We hope that highlighting this issue in LLMs and ruling out several apparent causes will help future research in self-improvement.
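To make the evaluation pipeline concrete, below is a minimal sketch of the solver-feedback loop the abstract describes, with the two sampling-based mitigations (progressive temperature increases and explicit rejection of previously attempted incorrect answers) folded in. The function names, the toy mock solver, and the specific temperature schedule are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of the controlled feedback loop from the abstract:
# a solver attempts the problem, a feedback generator with near-complete
# ground-truth access critiques the attempt, and the solver retries.
import random

def query_solver(question, history, temperature):
    """Stand-in for an LLM call; replace with a real API request.
    This toy solver repeats its earlier answer more often at low
    temperature, mimicking the resistance the paper calls Feedback Friction."""
    if history and random.random() > temperature:
        return history[-1][0]            # stubbornly repeat the last attempt
    return random.choice(["A", "B", "C", "D"])

def generate_feedback(question, answer, ground_truth):
    """Feedback generator with near-complete access to the gold answer."""
    return f"{answer!r} is incorrect; reconsider the reasoning for {question!r}."

def feedback_loop(question, ground_truth, max_rounds=10):
    history = []        # (answer, feedback) pairs from earlier rounds
    temperature = 0.0   # progressively raised, one mitigation the paper tests
    for round_idx in range(1, max_rounds + 1):
        answer = query_solver(question, history, temperature)
        # Explicit rejection of previously attempted incorrect answers,
        # the other sampling-based mitigation mentioned in the abstract.
        if any(answer == prev for prev, _ in history):
            temperature = min(1.0, temperature + 0.2)
            continue
        if answer == ground_truth:
            return answer, round_idx     # feedback fully incorporated
        history.append((answer, generate_feedback(question, answer, ground_truth)))
        temperature = min(1.0, temperature + 0.1)
    return None, max_rounds              # friction: feedback never absorbed

if __name__ == "__main__":
    print(feedback_loop("toy multiple-choice question", "C"))
```

In the paper's actual setup the solver and feedback generator are state-of-the-art LLMs and the tasks span math, knowledge, and scientific reasoning; the sketch only shows the control flow of attempt, targeted feedback, and resampled retry.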