Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Abstract

Algorithmic reasoning is the ability to understand the complex patterns behind a problem and decompose them into a sequence of reasoning steps that lead to the solution. This requirement makes algorithmic reasoning challenging for large language models (LLMs), even though they have shown promising performance on other reasoning tasks. In this context, some recent studies use programming languages (e.g., Python) to express the logic needed to solve a given instance or question (e.g., Program-of-Thought), motivated by the strict and precise syntax of such languages. However, writing executable code that expresses the correct logic on the fly within a single inference call is non-trivial. Moreover, code generated for one instance cannot be reused for others, even when they come from the same task and may require identical logic. This paper presents Think-and-Execute, a novel framework that decomposes the reasoning process of language models into two steps. (1) In Think, we discover the task-level logic that is shared across all instances of a given task and express that logic as pseudocode. (2) In Execute, we further tailor the generated pseudocode to each instance and simulate the execution of the code. Through extensive experiments on seven algorithmic reasoning tasks, we demonstrate the effectiveness of Think-and-Execute. Our approach improves LMs' reasoning more than several strong baselines that perform instance-specific reasoning (e.g., CoT and PoT), which suggests that discovering task-level logic is helpful. We also show that pseudocode can guide the reasoning of LMs better than natural language, even though the models are trained to follow natural language instructions.
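
The two-step structure described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the `LLM` type, the function names `think`, `execute`, and `think_and_execute`, and all prompt wording are assumptions introduced here. The sketch only shows the call pattern the abstract describes, with one Think call per task and one Execute call per instance.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def think(llm: LLM, task_description: str, example_instances: List[str]) -> str:
    """Think: ask the model once per task for pseudocode capturing the shared logic."""
    examples = "\n".join(f"- {x}" for x in example_instances)
    prompt = (
        f"Task: {task_description}\n"
        f"Example instances:\n{examples}\n"
        "Write pseudocode that solves any instance of this task."  # assumed wording
    )
    return llm(prompt)


def execute(llm: LLM, pseudocode: str, instance: str) -> str:
    """Execute: ask the model to simulate the task-level pseudocode on one instance."""
    prompt = (
        f"Pseudocode:\n{pseudocode}\n"
        f"Input instance: {instance}\n"
        "Simulate the execution step by step and report the final answer."  # assumed wording
    )
    return llm(prompt)


def think_and_execute(
    llm: LLM,
    task_description: str,
    example_instances: List[str],
    instances: List[str],
) -> List[str]:
    """Generate pseudocode once, then reuse it for every instance of the task."""
    pseudocode = think(llm, task_description, example_instances)
    return [execute(llm, pseudocode, inst) for inst in instances]
```

Any prompt-to-text function can be passed as `llm`, including a stub for testing. The point of the design is that the pseudocode is produced once and shared across instances, in contrast to instance-specific approaches that generate fresh code for every question.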
