Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
April 3, 2024
Authors: Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo
cs.AI
Abstract
Algorithmic reasoning refers to the ability to understand the complex patterns behind a problem and decompose them into a sequence of reasoning steps toward the solution. This nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance on other reasoning tasks. In this context, some recent studies use programming languages (e.g., Python) to express the logic necessary for solving a given instance/question (e.g., Program-of-Thought), inspired by their strict and precise syntax. However, it is non-trivial to write executable code that expresses the correct logic on the fly within a single inference call. Also, code generated for one specific instance cannot be reused for others, even if they come from the same task and might require identical logic to solve. This paper presents Think-and-Execute, a novel framework that decomposes the reasoning process of language models into two steps. (1) In Think, we discover the task-level logic that is shared across all instances of a given task and express that logic in pseudocode; (2) In Execute, we tailor the generated pseudocode to each instance and simulate the execution of the code. With extensive experiments on seven algorithmic reasoning tasks, we demonstrate the effectiveness of Think-and-Execute. Our approach improves LMs' reasoning more than several strong baselines that perform instance-specific reasoning (e.g., CoT and PoT), suggesting the helpfulness of discovering task-level logic. We also show that, compared to natural language, pseudocode better guides the reasoning of LMs, even though they are trained to follow natural-language instructions.