The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

July 17, 2025
Authors: Zhouqi Hua, Wenwei Zhang, Chengqi Lyu, Yuzhe Gu, Songyang Gao, Kuikun Liu, Kai Chen
cs.AI

Abstract

Length generalization, the ability to solve problems with longer sequences than those observed during training, poses a core challenge for Transformer-based large language models (LLMs). Although existing studies have predominantly focused on data-driven approaches for arithmetic operations and symbolic manipulation tasks, these approaches tend to be task-specific and offer limited overall performance. To pursue a more general solution, this paper focuses on a broader class of reasoning problems that are computable, i.e., problems that algorithms can solve and that can therefore be solved by a Turing Machine. From this perspective, this paper proposes Turing MAchine Imitation Learning (TAIL) to improve the length generalization ability of LLMs. TAIL uses computer programs to synthesize chain-of-thought (CoT) data that imitate the execution process of a Turing Machine: it linearly expands the reasoning steps into atomic states to alleviate shortcut learning, and introduces an explicit memory-fetch mechanism to reduce the difficulty of dynamic and long-range data access in elementary operations. To validate the reliability and universality of TAIL, we construct a challenging synthetic dataset covering 8 classes of algorithms and 18 tasks. Without bells and whistles, TAIL significantly improves both the length generalization ability and the performance of Qwen2.5-7B on various tasks using only synthetic data, surpassing previous methods and DeepSeek-R1. The experimental results reveal that the key concepts of the Turing Machine, rather than the thinking styles, are indispensable for TAIL to achieve length generalization, and that the model exhibits read-and-write behaviors consistent with the properties of the Turing Machine in its attention layers. This work provides a promising direction for future research on learning LLM reasoning from synthetic data.
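
As a concrete illustration of the idea, the sketch below uses a program to simulate a toy Turing machine (binary increment) and records one line per atomic transition, restating the symbol under the head at every step as an explicit memory fetch. The task, state names, and trace format here are illustrative assumptions for exposition, not the paper's released data pipeline.

```python
# Minimal sketch of program-synthesized, Turing-machine-imitating CoT data.
# Assumption: a toy task (add 1 to a binary number) and an ad-hoc trace format.

def synthesize_cot_binary_increment(bits: str):
    """Simulate a tiny Turing machine that increments a binary number and
    emit one trace line per atomic transition (read, write, move)."""
    tape = list(bits)
    head = len(tape) - 1              # start at the least-significant bit
    state = "carry"
    trace = []

    while state != "halt":
        pos = head
        # Explicit memory fetch: restate the symbol under the head instead of
        # leaving it implicit in earlier context.
        symbol = tape[pos] if pos >= 0 else "_"
        if symbol == "1":             # 1 + carry -> 0, keep carrying
            tape[pos], written, move, next_state = "0", "0", "L", "carry"
        elif symbol == "0":           # 0 + carry -> 1, done
            tape[pos], written, move, next_state = "1", "1", "N", "halt"
        else:                         # ran off the left end: prepend a new 1
            tape.insert(0, "1")
            head, written, move, next_state = 0, "1", "N", "halt"
        trace.append(
            f"state={state} head={pos} read='{symbol}' write='{written}' "
            f"move={move} | tape={''.join(tape)}"
        )
        state = next_state
        if move == "L":
            head -= 1

    return "".join(tape), trace


if __name__ == "__main__":
    answer, cot = synthesize_cot_binary_increment("1011")
    print("\n".join(cot))
    print("answer:", answer)          # 1011 + 1 = 1100
```

Running it on "1011" produces three atomic steps ending in tape 1100; longer inputs simply yield proportionally longer traces, which is the linear expansion of reasoning steps the abstract refers to.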