イン・プレイス・フィードバック：マルチターン推論におけるLLMのガイドのための新たなパラダイム

要旨

大規模言語モデル（LLMs）は、ユーザーからのフィードバックに基づいて出力を反復的に改善するマルチターン推論の文脈で、ますます研究が進められています。このような設定は、複雑な推論を必要とするタスクにおいて重要ですが、既存のフィードバックパラダイムはしばしば新たなメッセージの発行に依存しています。LLMsはこれらを確実に統合することが難しく、一貫した改善が得られないことがあります。本研究では、ユーザーがLLMの前回の応答を直接編集し、モデルがこの修正された応答を条件として改訂を生成する、新たなインタラクションパラダイムである「インプレイスフィードバック」を提案します。多様な推論集約型ベンチマークでの実証評価により、インプレイスフィードバックは従来のマルチターンフィードバックよりも優れた性能を発揮し、79.1%少ないトークンを使用することが明らかになりました。制御環境での補完的分析はさらに、インプレイスフィードバックがマルチターンフィードバックの核心的な限界を解決することを示しています。すなわち、モデルはフィードバックを応答の誤った部分に正確に適用することがしばしばできず、誤りが修正されないまま残ったり、以前は正しかった内容に新たな誤りが導入されたりすることがあります。これらの知見は、インプレイスフィードバックが推論集約型タスクにおいてLLMsを導くためのより自然で効果的なメカニズムを提供することを示唆しています。

English

Large language models (LLMs) are increasingly studied in the context of multi-turn reasoning, where models iteratively refine their outputs based on user-provided feedback. Such settings are crucial for tasks that require complex reasoning, yet existing feedback paradigms often rely on issuing new messages. LLMs struggle to integrate these reliably, leading to inconsistent improvements. In this work, we introduce in-place feedback, a novel interaction paradigm in which users directly edit an LLM's previous response, and the model conditions on this modified response to generate its revision. Empirical evaluations on diverse reasoning-intensive benchmarks reveal that in-place feedback achieves better performance than conventional multi-turn feedback while using 79.1% fewer tokens. Complementary analyses on controlled environments further demonstrate that in-place feedback resolves a core limitation of multi-turn feedback: models often fail to apply feedback precisely to erroneous parts of the response, leaving errors uncorrected and sometimes introducing new mistakes into previously correct content. These findings suggest that in-place feedback offers a more natural and effective mechanism for guiding LLMs in reasoning-intensive tasks.

イン・プレイス・フィードバック：マルチターン推論におけるLLMのガイドのための新たなパラダイム

In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning

要旨

Support