The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

July 17, 2025
Authors: Zhouqi Hua, Wenwei Zhang, Chengqi Lyu, Yuzhe Gu, Songyang Gao, Kuikun Liu, Kai Chen
cs.AI

Abstract

Length generalization, the ability to solve problems with longer sequences than those observed during training, poses a core challenge for Transformer-based large language models (LLMs). Although existing studies have predominantly focused on data-driven approaches for arithmetic operations and symbolic manipulation tasks, these approaches tend to be task-specific with limited overall performance. To pursue a more general solution, this paper focuses on the broader class of computable reasoning problems, i.e., problems that algorithms can solve and that a Turing Machine can therefore solve as well. From this perspective, the paper proposes Turing MAchine Imitation Learning (TAIL) to improve the length generalization ability of LLMs. TAIL uses computer programs to synthesize chain-of-thought (CoT) data that imitate the execution process of a Turing Machine: it linearly expands reasoning steps into atomic states to alleviate shortcut learning, and it introduces an explicit memory-fetch mechanism to reduce the difficulty of dynamic and long-range data access in elementary operations. To validate the reliability and universality of TAIL, we construct a challenging synthetic dataset covering 8 classes of algorithms and 18 tasks. Without bells and whistles, TAIL significantly improves both the length generalization ability and the performance of Qwen2.5-7B on various tasks using only synthetic data, surpassing previous methods and DeepSeek-R1. The experimental results reveal that the key concepts of the Turing Machine, rather than particular thinking styles, are indispensable for TAIL to achieve length generalization; with them, the model exhibits read-and-write behaviors in its attention layers consistent with the properties of a Turing Machine. This work provides a promising direction for future research on learning LLM reasoning from synthetic data.
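To make the two ideas in the abstract concrete (atomic states and explicit memory fetch), the following is a minimal illustrative sketch of how a program could synthesize a Turing-machine-style CoT trace for multi-digit addition. It is not the authors' actual TAIL pipeline; the function name `synthesize_addition_trace` and the trace format are hypothetical and chosen only for illustration.

```python
# Minimal sketch (not the authors' pipeline): synthesize a Turing-machine-style
# chain-of-thought for multi-digit addition. Each emitted line is one atomic
# state, and every operand is re-read explicitly ("memory fetch") rather than
# carried implicitly, mirroring the two mechanisms described in the abstract.

def synthesize_addition_trace(a: str, b: str) -> str:
    """Return a linearized CoT trace for computing a + b digit by digit."""
    a, b = a[::-1], b[::-1]               # work from the least-significant digit
    lines, carry, result = [], 0, []
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        # Explicit memory fetch: restate the values being read at this step.
        lines.append(f"[state {i}] read a[{i}]={da}, read b[{i}]={db}, read carry={carry}")
        carry, digit = divmod(da + db + carry, 10)
        result.append(str(digit))
        # Atomic state update: write exactly one digit and the new carry.
        lines.append(f"[state {i}] write digit={digit}, write carry={carry}")
    if carry:
        result.append(str(carry))
        lines.append(f"[final] write digit={carry}, write carry=0")
    lines.append("answer=" + "".join(reversed(result)))
    return "\n".join(lines)

if __name__ == "__main__":
    print(synthesize_addition_trace("957", "86"))  # 957 + 86 = 1043
```

Training on traces of this shape, rather than on final answers or free-form reasoning, is what the abstract credits for the observed length generalization: the per-step state is bounded regardless of input length, and long-range dependencies are replaced by explicit reads.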