Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback

Abstract

Recent studies have shown that large language models (LLMs) have some ability to improve their responses when given external feedback. However, it remains unclear how effectively and thoroughly these models can incorporate such feedback. In an ideal scenario, an LLM that receives near-perfect and complete feedback would fully integrate it and change its incorrect answers to correct ones. In this paper, we systematically investigate LLMs' ability to incorporate feedback by designing a controlled experimental environment. For each problem, a solver model attempts a solution, a feedback generator with access to near-complete ground-truth answers produces targeted feedback, and the solver then tries again. We evaluate this pipeline across a diverse range of tasks, including math reasoning, knowledge reasoning, scientific reasoning, and general multi-domain evaluations, using state-of-the-art language models including Claude 3.7 (with and without extended thinking). Surprisingly, even under these near-ideal conditions, solver models consistently show resistance to feedback, a limitation that we term FEEDBACK FRICTION. To mitigate this limitation, we experiment with sampling-based strategies such as progressive temperature increases and explicit rejection of previously attempted incorrect answers. These strategies yield improvements but still fail to bring models to the target performance. We also rigorously explore potential causes of FEEDBACK FRICTION, ruling out factors such as model overconfidence and data familiarity. We hope that highlighting this issue in LLMs and ruling out several apparent causes will help future research on self-improvement.
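The solver/feedback loop described in the abstract, together with the two sampling-based mitigations it mentions, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `solver` and `feedback_fn` callables, their signatures, the exact-match correctness check, and all default values (`max_rounds`, `temperature_step`, `max_resamples`) are hypothetical and chosen only to make the sketch runnable.

```python
from typing import Callable, List, Optional, Set

# Hypothetical signatures (assumptions, not taken from the paper):
#   solver(problem, feedback_history, temperature) -> answer string
#   feedback_fn(problem, answer, reference) -> feedback string
Solver = Callable[[str, List[str], float], str]
FeedbackFn = Callable[[str, str, str], str]


def refine_with_feedback(
    problem: str,
    reference: str,
    solver: Solver,
    feedback_fn: FeedbackFn,
    max_rounds: int = 5,
    base_temperature: float = 0.0,
    temperature_step: float = 0.15,
    max_resamples: int = 3,
) -> Optional[int]:
    """Return the index of the first round with a correct answer, or None."""
    feedback_history: List[str] = []
    rejected: Set[str] = set()  # wrong answers seen in earlier rounds

    for round_idx in range(max_rounds):
        # Progressive temperature increase: sample more diversely each round.
        temperature = base_temperature + temperature_step * round_idx
        answer = solver(problem, feedback_history, temperature)

        # Rejection of earlier wrong answers: resample a bounded number of
        # times if the solver repeats an answer already known to be wrong.
        for _ in range(max_resamples):
            if answer not in rejected:
                break
            answer = solver(problem, feedback_history, temperature)

        # Assumption: correctness is an exact string match with the reference.
        if answer == reference:
            return round_idx

        rejected.add(answer)
        # The feedback generator sees the reference; the solver does not.
        feedback_history.append(feedback_fn(problem, answer, reference))

    return None
```

In this framing, a return value of `None` corresponds to a problem that stays unsolved after every round even though the feedback generator had access to the reference answer, which is the behavior the abstract calls FEEDBACK FRICTION.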
