The Scaling Properties of Implicit Deductive Reasoning in Transformers
May 5, 2026
Authors: Enrico Vompa, Tanel Tammet
cs.AI
Abstract
We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit chain-of-thought (CoT) performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
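To make the task concrete, the reasoning problem studied here can be illustrated with a toy forward-chaining prover for propositional Horn clauses. This is a sketch for illustration only, not the authors' code: the function name, clause encoding, and the notion of "depth" as the number of chaining rounds are assumptions for this example.

```python
# Sketch (not from the paper): decide provability of a goal atom from
# propositional Horn clauses by forward chaining. The number of chaining
# rounds is one natural notion of the proof depth the abstract refers to.

def horn_provable(facts, rules, goal, max_depth=None):
    """facts: set of atoms; rules: list of (body_atoms, head_atom) pairs.
    Returns (provable, depth), where depth counts forward-chaining rounds."""
    known = set(facts)
    depth = 0
    while goal not in known:
        if max_depth is not None and depth >= max_depth:
            return False, depth  # depth budget exhausted (cf. depth-bounded models)
        # Fire every rule whose body is fully derived but whose head is new.
        new = {head for body, head in rules
               if head not in known and all(a in known for a in body)}
        if not new:  # fixpoint reached without deriving the goal
            return False, depth
        known |= new
        depth += 1
    return True, depth

# Example: a <- ; b <- a ; c <- a, b.  Proving c takes two rounds.
print(horn_provable({"a"}, [(("a",), "b"), (("a", "b"), "c")], "c"))
# → (True, 2)
```

Under this framing, a model with a fixed number of layers resembles the `max_depth`-bounded variant: it can only absorb proofs whose depth fits its budget, which is consistent with CoT remaining necessary for depth extrapolation.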