Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
April 3, 2024
Authors: Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo
cs.AI
Abstract
Algorithmic reasoning refers to the ability to understand the complex patterns behind a problem and decompose them into a sequence of reasoning steps towards the solution. This nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance on other reasoning tasks. In this context, some recent studies use programming languages (e.g., Python) to express the logic necessary for solving a given instance/question (e.g., Program-of-Thought), inspired by their strict and precise syntax. However, it is non-trivial to write executable code that expresses the correct logic on the fly within a single inference call. Moreover, code generated specifically for one instance cannot be reused for others, even if they come from the same task and might require identical logic to solve. This paper presents Think-and-Execute, a novel framework that decomposes the reasoning process of language models into two steps. (1) In Think, we discover a task-level logic that is shared across all instances of a given task and express that logic in pseudocode; (2) in Execute, we tailor the generated pseudocode to each instance and simulate the execution of the code. Through extensive experiments on seven algorithmic reasoning tasks, we demonstrate the effectiveness of Think-and-Execute. Our approach improves LMs' reasoning more than several strong baselines that perform instance-specific reasoning (e.g., CoT and PoT), suggesting the benefit of discovering task-level logic. We also show that, compared to natural language, pseudocode better guides the reasoning of LMs, even though they are trained to follow natural-language instructions.
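To make the two-step procedure concrete, below is a minimal Python sketch of the pipeline as the abstract describes it: one LLM call derives task-level pseudocode (Think), and a second call per instance simulates running that pseudocode (Execute). The call_llm wrapper, function names, and all prompt wording here are illustrative assumptions, not the paper's actual prompts or API.

    # Minimal sketch of the Think-and-Execute flow described above.
    # `call_llm` is a hypothetical stand-in for any chat-completion API;
    # the prompt text is illustrative, not the paper's actual prompts.

    def call_llm(prompt: str) -> str:
        """Hypothetical wrapper around an LLM completion endpoint."""
        raise NotImplementedError("plug in your provider's API here")

    def think(task_description: str, example_instances: list[str]) -> str:
        """THINK: derive one task-level pseudocode shared by all instances."""
        examples = "\n".join(example_instances)
        prompt = (
            f"Task: {task_description}\n"
            f"Example instances:\n{examples}\n\n"
            "Write pseudocode (a Python-like function with comments) that "
            "solves ANY instance of this task, not just these examples."
        )
        # Generated once per task and reused for every instance.
        return call_llm(prompt)

    def execute(pseudocode: str, instance: str) -> str:
        """EXECUTE: tailor the pseudocode to one instance and simulate its run."""
        prompt = (
            f"Pseudocode for the task:\n{pseudocode}\n\n"
            f"Instance: {instance}\n\n"
            "Simulate executing this pseudocode on the instance step by step, "
            "tracking intermediate variable states, then output the answer."
        )
        # The LLM acts as the 'compiler'/interpreter of the pseudocode.
        return call_llm(prompt)

    # Usage sketch: the task-level pseudocode is shared across instances.
    # code = think("Determine the final positions of swapped objects.",
    #              ["instance 1 ...", "instance 2 ..."])
    # answers = [execute(code, inst) for inst in instances]

The key design point this sketch illustrates is reuse: unlike instance-specific approaches such as Program-of-Thought, the pseudocode from think() is generated once and amortized over all instances of the task.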