SCREWS: A Modular Framework for Reasoning with Revisions

September 20, 2023
Authors: Kumar Shridhar, Harsh Jhamtani, Hao Fang, Benjamin Van Durme, Jason Eisner, Patrick Xia
cs.AI

Abstract

Large language models (LLMs) can improve their accuracy on various tasks through iteratively refining and revising their output based on feedback. We observe that these revisions can introduce errors, in which case it is better to roll back to a previous result. Further, revisions are typically homogeneous: they use the same reasoning method that produced the initial answer, which may not correct errors. To enable exploration in this space, we present SCREWS, a modular framework for reasoning with revisions. It is comprised of three main modules: Sampling, Conditional Resampling, and Selection, each consisting of sub-modules that can be hand-selected per task. We show that SCREWS not only unifies several previous approaches under a common framework, but also reveals several novel strategies for identifying improved reasoning chains. We evaluate our framework with state-of-the-art LLMs (ChatGPT and GPT-4) on a diverse set of reasoning tasks and uncover useful new reasoning strategies for each: arithmetic word problems, multi-hop question answering, and code debugging. Heterogeneous revision strategies prove to be important, as does selection between original and revised candidates.
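
As a rough illustration of the three-module pipeline described in the abstract (Sampling, Conditional Resampling, Selection), below is a minimal Python sketch. The module names follow the paper, but the `llm` callable, prompts, revision method, and selection heuristic are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a SCREWS-style pipeline: Sampling -> Conditional Resampling -> Selection.
# The `llm` callable, prompts, and heuristics are assumptions for illustration only.
from typing import Callable, List

def sample(llm: Callable[[str], str], question: str, method: str = "chain-of-thought") -> str:
    """Sampling: produce an initial reasoning chain and answer."""
    return llm(f"Solve step by step using {method}:\n{question}")

def conditional_resample(llm: Callable[[str], str], question: str, draft: str,
                         method: str = "subquestion decomposition") -> List[str]:
    """Conditional Resampling: decide whether to revise, optionally with a
    different (heterogeneous) reasoning method, and keep all candidates."""
    critique = llm(f"Question:\n{question}\nDraft answer:\n{draft}\n"
                   "Is the draft correct? Answer yes or no.")
    if "no" in critique.lower():
        revision = llm(f"Revise the draft using {method}.\n"
                       f"Question:\n{question}\nDraft:\n{draft}")
        return [draft, revision]
    return [draft]

def select(llm: Callable[[str], str], question: str, candidates: List[str]) -> str:
    """Selection: pick the best candidate, allowing rollback to the original draft."""
    if len(candidates) == 1:
        return candidates[0]
    listing = "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(candidates))
    choice = llm(f"Question:\n{question}\n{listing}\n"
                 "Which candidate is more likely correct? Reply with its number.")
    for i, c in enumerate(candidates):
        if str(i + 1) in choice:
            return c
    return candidates[0]  # fall back to the original answer if no clear choice

def screws(llm: Callable[[str], str], question: str) -> str:
    draft = sample(llm, question)
    candidates = conditional_resample(llm, question, draft)
    return select(llm, question, candidates)
```

Keeping the original draft among the candidates is what allows rollback when a revision introduces a new error, and passing a different reasoning method to the resampling step reflects the heterogeneous revision strategies the paper reports as important.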