Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
April 3, 2024
Authors: Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo
cs.AI
Abstract
Algorithmic reasoning refers to the ability to understand the complex patterns behind a problem and decompose them into a sequence of reasoning steps towards the solution. This nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance on other reasoning tasks. In this context, some recent studies use programming languages (e.g., Python) to express the logic necessary for solving a given instance/question (e.g., Program-of-Thought), inspired by their strict and precise syntax. However, it is non-trivial to write executable code that expresses the correct logic on the fly within a single inference call. Moreover, code generated specifically for one instance cannot be reused for others, even if they come from the same task and might require identical logic to solve. This paper presents Think-and-Execute, a novel framework that decomposes the reasoning process of language models into two steps. (1) In Think, we discover a task-level logic that is shared across all instances of a given task and express that logic in pseudocode; (2) in Execute, we tailor the generated pseudocode to each instance and simulate the execution of the code. Through extensive experiments on seven algorithmic reasoning tasks, we demonstrate the effectiveness of Think-and-Execute. Our approach improves LMs' reasoning more than several strong baselines that perform instance-specific reasoning (e.g., CoT and PoT), suggesting the benefit of discovering task-level logic. We also show that, compared to natural language, pseudocode better guides the reasoning of LMs, even though they are trained to follow natural-language instructions.
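To make the two-step procedure concrete, below is a minimal Python sketch of the pipeline as the abstract describes it: one LLM call derives task-level pseudocode (Think), and a second call per instance simulates running that pseudocode (Execute). The call_llm wrapper, function names, and all prompt wording here are illustrative assumptions, not the paper's actual prompts or API.

    # Minimal sketch of the Think-and-Execute flow described above.
    # `call_llm` is a hypothetical stand-in for any chat-completion API;
    # the prompt text is illustrative, not the paper's actual prompts.

    def call_llm(prompt: str) -> str:
        """Hypothetical wrapper around an LLM completion endpoint."""
        raise NotImplementedError("plug in your provider's API here")

    def think(task_description: str, example_instances: list[str]) -> str:
        """THINK: derive one task-level pseudocode shared by all instances."""
        examples = "\n".join(example_instances)
        prompt = (
            f"Task: {task_description}\n"
            f"Example instances:\n{examples}\n\n"
            "Write pseudocode (a Python-like function with comments) that "
            "solves ANY instance of this task, not just these examples."
        )
        # Generated once per task and reused for every instance.
        return call_llm(prompt)

    def execute(pseudocode: str, instance: str) -> str:
        """EXECUTE: tailor the pseudocode to one instance and simulate its run."""
        prompt = (
            f"Pseudocode for the task:\n{pseudocode}\n\n"
            f"Instance: {instance}\n\n"
            "Simulate executing this pseudocode on the instance step by step, "
            "tracking intermediate variable states, then output the answer."
        )
        # The LLM acts as the 'compiler'/interpreter of the pseudocode.
        return call_llm(prompt)

    # Usage sketch: the task-level pseudocode is shared across instances.
    # code = think("Determine the final positions of swapped objects.",
    #              ["instance 1 ...", "instance 2 ..."])
    # answers = [execute(code, inst) for inst in instances]

The key design point this sketch illustrates is reuse: unlike instance-specific approaches such as Program-of-Thought, the pseudocode from think() is generated once and amortized over all instances of the task.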