Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback

Abstract

Recent studies have shown that LLMs have some ability to improve their responses when given external feedback. However, it remains unclear how effectively and thoroughly these models can incorporate such feedback. In an ideal scenario, if LLMs receive near-perfect and complete feedback, we would expect them to fully integrate it and change their incorrect answers to correct ones. In this paper, we systematically investigate LLMs' ability to incorporate feedback by designing a controlled experimental environment. For each problem, a solver model attempts a solution, a feedback generator with access to near-complete ground-truth answers then produces targeted feedback, and the solver tries again. We evaluate this pipeline across a diverse range of tasks, including math reasoning, knowledge reasoning, scientific reasoning, and general multi-domain evaluations, using state-of-the-art language models including Claude 3.7 (with and without extended thinking). Surprisingly, even under these near-ideal conditions, solver models consistently show resistance to feedback, a limitation that we term FEEDBACK FRICTION. To mitigate this limitation, we experiment with sampling-based strategies such as progressive temperature increases and explicit rejection of previously attempted incorrect answers. These strategies yield improvements but still fail to bring models to the target performance. We also perform a rigorous exploration of potential causes of FEEDBACK FRICTION, ruling out factors such as model overconfidence and data familiarity. We hope that highlighting this issue in LLMs and ruling out several apparent causes will help future research in self-improvement.
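The solve, feedback, retry loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`feedback_loop`, `solve`, `give_feedback`), the exact-match correctness check, the round limit, and the temperature schedule are all assumptions chosen for illustration, and the toy solvers stand in for real model calls.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signatures (assumptions, not from the paper):
#   solve(question, feedback_history, rejected_answers, temperature) -> answer
#   give_feedback(question, answer, ground_truth) -> feedback text
Solver = Callable[[str, List[str], List[str], float], str]
FeedbackGenerator = Callable[[str, str, str], str]


def feedback_loop(
    question: str,
    ground_truth: str,
    solve: Solver,
    give_feedback: FeedbackGenerator,
    max_rounds: int = 5,            # assumed round limit
    base_temperature: float = 0.0,  # assumed starting temperature
    temperature_step: float = 0.15, # assumed per-round increase
) -> Dict[str, object]:
    """Run the solver repeatedly, feeding back targeted feedback each round."""
    feedback_history: List[str] = []
    rejected: List[str] = []  # previously attempted incorrect answers
    for round_index in range(max_rounds):
        # Progressive temperature increase: later rounds sample more broadly.
        temperature = base_temperature + temperature_step * round_index
        answer = solve(question, feedback_history, rejected, temperature)
        # Exact string match is a simplifying assumption for correctness.
        if answer == ground_truth:
            return {"solved": True, "rounds": round_index + 1, "answer": answer}
        rejected.append(answer)
        feedback_history.append(give_feedback(question, answer, ground_truth))
    last: Optional[str] = rejected[-1] if rejected else None
    return {"solved": False, "rounds": max_rounds, "answer": last}


def toy_feedback(question: str, answer: str, ground_truth: str) -> str:
    # Stand-in for a feedback model that can see the ground truth.
    return f"The answer {answer} is incorrect; recheck the reasoning."


def receptive_solver(question, feedback_history, rejected, temperature):
    # Toy solver that never repeats an answer it has already been told is wrong.
    for candidate in ["3", "5", "4"]:
        if candidate not in rejected:
            return candidate
    return "unknown"


def stubborn_solver(question, feedback_history, rejected, temperature):
    # Toy solver that ignores all feedback and repeats the same wrong answer.
    return "3"


receptive_result = feedback_loop("What is 2 + 2?", "4", receptive_solver, toy_feedback)
stubborn_result = feedback_loop("What is 2 + 2?", "4", stubborn_solver, toy_feedback)
```

In this toy setup the receptive solver reaches the correct answer on its third attempt, while the stubborn solver exhausts all rounds without changing its answer. The second case is a caricature of the behavior the abstract calls FEEDBACK FRICTION; real models fall somewhere between the two.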
