SCREWS: A Modular Framework for Reasoning with Revisions
September 20, 2023
Authors: Kumar Shridhar, Harsh Jhamtani, Hao Fang, Benjamin Van Durme, Jason Eisner, Patrick Xia
cs.AI
Abstract
Large language models (LLMs) can improve their accuracy on various tasks
through iteratively refining and revising their output based on feedback. We
observe that these revisions can introduce errors, in which case it is better
to roll back to a previous result. Further, revisions are typically
homogeneous: they use the same reasoning method that produced the initial
answer, which may not correct errors. To enable exploration in this space, we
present SCREWS, a modular framework for reasoning with revisions. It comprises
three main modules: Sampling, Conditional Resampling, and Selection, each
consisting of sub-modules that can be hand-selected per task.
We show that SCREWS not only unifies several previous approaches under a common
framework, but also reveals several novel strategies for identifying improved
reasoning chains. We evaluate our framework with state-of-the-art LLMs (ChatGPT
and GPT-4) on a diverse set of reasoning tasks and uncover useful new reasoning
strategies for each: arithmetic word problems, multi-hop question answering,
and code debugging. Heterogeneous revision strategies prove to be important, as
does selection between original and revised candidates.
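The three-module pipeline described in the abstract can be sketched as follows. This is a minimal illustrative skeleton based only on the module names given above; all function signatures, strategy names, and the placeholder scoring are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the SCREWS pipeline: Sampling -> Conditional
# Resampling -> Selection. All names below are illustrative assumptions.

def sample(question: str, strategy: str) -> str:
    # Sampling: produce an initial answer using some reasoning strategy
    # (e.g. chain-of-thought). Stubbed for illustration.
    return f"answer({question}, via {strategy})"

def conditional_resample(question: str, answer: str, strategy: str) -> str:
    # Conditional Resampling: decide whether to revise the answer, possibly
    # with a *different* (heterogeneous) strategy than the one used to sample.
    return f"revised({answer}, via {strategy})"

def select(question: str, candidates: list[str]) -> str:
    # Selection: choose among original and revised candidates, which allows
    # rolling back to an earlier answer if a revision introduced errors.
    return max(candidates, key=len)  # placeholder scoring function

def screws(question: str, sample_strategy: str, revise_strategy: str) -> str:
    original = sample(question, sample_strategy)
    revised = conditional_resample(question, original, revise_strategy)
    return select(question, [original, revised])
```

The key design point the abstract emphasizes is that Selection considers both the original and the revised candidate, rather than always committing to the revision.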