SCREWS: A Modular Framework for Reasoning with Revisions

Abstract

Large language models (LLMs) can improve their accuracy on various tasks by iteratively refining and revising their output based on feedback. We observe that these revisions can introduce errors, in which case it is better to roll back to a previous result. Further, revisions are typically homogeneous: they use the same reasoning method that produced the initial answer, which may not correct errors. To enable exploration in this space, we present SCREWS, a modular framework for reasoning with revisions. It comprises three main modules: Sampling, Conditional Resampling, and Selection, each consisting of sub-modules that can be hand-selected per task. We show that SCREWS not only unifies several previous approaches under a common framework, but also reveals several novel strategies for identifying improved reasoning chains. We evaluate our framework with state-of-the-art LLMs (ChatGPT and GPT-4) on a diverse set of reasoning tasks (arithmetic word problems, multi-hop question answering, and code debugging) and uncover useful new reasoning strategies for each. Heterogeneous revision strategies prove to be important, as does selection between original and revised candidates.
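
The three-module structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `Pipeline`, the callable types, and the toy stand-ins in the usage lines are all hypothetical, and in practice each callable would wrap an LLM call. The sketch only shows how sampling, conditional resampling, and selection might compose, with the original answer kept as a candidate so that a poor revision can be rolled back.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases; none of these names come from the paper.
Sampler = Callable[[str], str]               # question -> candidate answer
ShouldResample = Callable[[str, str], bool]  # (question, answer) -> revise?
Selector = Callable[[str, str, str], str]    # (question, original, revised) -> final


@dataclass
class Pipeline:
    """Assumed composition of Sampling, Conditional Resampling, and Selection."""
    sample: Sampler
    should_resample: ShouldResample
    resample: Sampler
    select: Selector

    def run(self, question: str) -> str:
        # Sampling: produce an initial answer.
        original = self.sample(question)
        # Conditional resampling: revise only when the check asks for it.
        if not self.should_resample(question, original):
            return original
        # Heterogeneous revision: `resample` may use a different
        # reasoning method than `sample`.
        revised = self.resample(question)
        # Selection: the original remains a candidate, so the selector
        # can roll back a revision that made things worse.
        return self.select(question, original, revised)


# Toy stand-ins (assumptions for illustration only, no model calls).
pipeline = Pipeline(
    sample=lambda q: "41",
    should_resample=lambda q, a: True,
    resample=lambda q: "42",
    select=lambda q, orig, rev: rev,
)
final_answer = pipeline.run("What is 6 * 7?")
```

Passing the sampler and resampler as separate callables is one way to allow heterogeneous revision: the two can be the same method or different ones without changing the pipeline itself.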
