인플레이스 피드백: 다중 턴 추론에서 LLM을 안내하는 새로운 패러다임

초록

대규모 언어 모델(LLMs)은 사용자 피드백을 기반으로 출력을 반복적으로 개선하는 다중 턴 추론(multi-turn reasoning) 맥락에서 점점 더 많이 연구되고 있습니다. 이러한 설정은 복잡한 추론이 필요한 작업에 필수적이지만, 기존의 피드백 패러다임은 주로 새로운 메시지를 발행하는 방식에 의존합니다. LLMs은 이러한 피드백을 안정적으로 통합하는 데 어려움을 겪어 일관된 개선을 이루지 못하는 경우가 많습니다. 본 연구에서는 사용자가 LLM의 이전 응답을 직접 수정하고, 모델이 이 수정된 응답을 조건으로 하여 개정된 응답을 생성하는 새로운 상호작용 패러다임인 '제자리 피드백(in-place feedback)'을 소개합니다. 다양한 추론 집약적 벤치마크에서의 실험적 평가 결과, 제자리 피드백은 기존의 다중 턴 피드백보다 더 나은 성능을 달성하면서도 79.1% 더 적은 토큰을 사용하는 것으로 나타났습니다. 통제된 환경에서의 보완적 분석은 더 나아가 제자리 피드백이 다중 턴 피드백의 핵심 한계를 해결한다는 것을 보여줍니다: 모델은 종종 피드백을 응답의 오류가 있는 부분에 정확히 적용하지 못해 오류를 그대로 남기거나, 이전에 정확했던 내용에 새로운 오류를 도입하는 경우가 있습니다. 이러한 연구 결과는 제자리 피드백이 추론 집약적 작업에서 LLMs을 안내하는 더 자연스럽고 효과적인 메커니즘을 제공한다는 것을 시사합니다.

English

Large language models (LLMs) are increasingly studied in the context of multi-turn reasoning, where models iteratively refine their outputs based on user-provided feedback. Such settings are crucial for tasks that require complex reasoning, yet existing feedback paradigms often rely on issuing new messages. LLMs struggle to integrate these reliably, leading to inconsistent improvements. In this work, we introduce in-place feedback, a novel interaction paradigm in which users directly edit an LLM's previous response, and the model conditions on this modified response to generate its revision. Empirical evaluations on diverse reasoning-intensive benchmarks reveal that in-place feedback achieves better performance than conventional multi-turn feedback while using 79.1% fewer tokens. Complementary analyses on controlled environments further demonstrate that in-place feedback resolves a core limitation of multi-turn feedback: models often fail to apply feedback precisely to erroneous parts of the response, leaving errors uncorrected and sometimes introducing new mistakes into previously correct content. These findings suggest that in-place feedback offers a more natural and effective mechanism for guiding LLMs in reasoning-intensive tasks.

인플레이스 피드백: 다중 턴 추론에서 LLM을 안내하는 새로운 패러다임

In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning

초록

Support