SCREWS: A Modular Framework for Reasoning with Revisions

Abstract

Large language models (LLMs) can improve their accuracy on various tasks by iteratively refining and revising their output based on feedback. We observe that these revisions can introduce errors, in which case it is better to roll back to a previous result. Further, revisions are typically homogeneous: they use the same reasoning method that produced the initial answer, which may not correct errors. To enable exploration in this space, we present SCREWS, a modular framework for reasoning with revisions. It comprises three main modules: Sampling, Conditional Resampling, and Selection, each consisting of sub-modules that can be hand-selected per task. We show that SCREWS not only unifies several previous approaches under a common framework, but also reveals several novel strategies for identifying improved reasoning chains. We evaluate our framework with state-of-the-art LLMs (ChatGPT and GPT-4) on a diverse set of reasoning tasks (arithmetic word problems, multi-hop question answering, and code debugging) and uncover useful new reasoning strategies for each. Heterogeneous revision strategies prove to be important, as does selection between original and revised candidates.
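As a rough illustration of how the three modules could fit together, the following minimal sketch chains a sampler, a conditional resampler, and a selector. The abstract names the modules but not their interfaces, so every function name and signature here is a hypothetical stand-in, and the toy functions replace what would be LLM calls in practice.

```python
from typing import Callable, List


def reason_with_revisions(
    question: str,
    sample: Callable[[str], str],
    needs_revision: Callable[[str, str], bool],
    resample: Callable[[str, str], str],
    select: Callable[[str, List[str]], str],
) -> str:
    """Hypothetical pipeline: Sampling -> Conditional Resampling -> Selection."""
    # Sampling: produce an initial answer.
    original = sample(question)
    candidates = [original]
    # Conditional Resampling: revise only when the condition fires.
    if needs_revision(question, original):
        candidates.append(resample(question, original))
    # Selection: choose between the original and the revision,
    # which allows rolling back a revision that made things worse.
    return select(question, candidates)


# Toy stand-ins for LLM calls (assumptions for illustration only).
def _add(question: str) -> str:
    left, right = question.split("+")
    return str(int(left) + int(right))


def toy_sample(question: str) -> str:
    return "6"  # deliberately wrong first draft


def toy_needs_revision(question: str, answer: str) -> bool:
    return True  # always attempt a revision


def toy_resample(question: str, previous: str) -> str:
    return _add(question)  # a different method than the sampler


def toy_bad_resample(question: str, previous: str) -> str:
    return "7"  # a revision that introduces an error


def toy_select(question: str, candidates: List[str]) -> str:
    # Prefer the most recent candidate that passes a check;
    # otherwise fall back to the original answer.
    for candidate in reversed(candidates):
        if candidate == _add(question):
            return candidate
    return candidates[0]


fixed = reason_with_revisions(
    "2+3", toy_sample, toy_needs_revision, toy_resample, toy_select
)
rolled_back = reason_with_revisions(
    "2+3", _add, toy_needs_revision, toy_bad_resample, toy_select
)
print(fixed, rolled_back)
```

In the first call the revision uses a different method than the sampler and corrects the draft (a heterogeneous revision). In the second, the revision is worse than the original, so the selector keeps the original answer.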
