
Rethinking State Tracking in Recurrent Models Through Error Control Dynamics

May 8, 2026
作者: Jiwan Chung, Heechan Choi, Seon Joo Kim
cs.AI

Abstract

The theory of state tracking in recurrent architectures has predominantly focused on expressive capacity: whether a fixed architecture can theoretically realize a set of symbolic transition rules. We argue that equally important is error control, the dynamics governing hidden-state drift along the directions that distinguish symbolic states. We prove that affine recurrent networks, a class of models encompassing State-Space Models and Linear Attention, cannot correct errors along state-separating subspaces once they preserve state representations. Consequently, practical affine trackers do not learn robust state tracking; rather, they learn finite-horizon solutions governed by accumulated state-relevant error. We characterize the mechanics of this failure, showing that tracking remains readable only while the accumulating within-class spread remains small relative to the initial between-class separation. We demonstrate empirically on group state-tracking tasks that this breakdown is predictable: tracking collapses when the distinguishability ratio crosses the readability threshold of the trained decoder. Across trained models, the point of this crossing predicts the horizon at which downstream accuracy fails. These results establish that robust state tracking is determined not only by an architecture's theoretical expressivity but crucially by its error control.
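The failure mode described above can be illustrated with a minimal sketch, not the paper's actual experiments: a hypothetical Z2 (parity) tracking task realized by a norm-preserving affine recurrence, with Gaussian noise injected at each step as a stand-in for accumulated state-relevant error. Because the map never contracts along the state-separating axis, within-class spread grows roughly as sigma*sqrt(t) while between-class separation stays fixed, so the distinguishability ratio rises until the states are no longer readable.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.05          # per-step noise scale (hypothetical error injection)
T = 400               # tracking horizon
n_runs = 2000         # independent sequences

# Affine recurrence implementing Z2 parity tracking: h -> s * h, s in {+1, -1}.
# The map is norm-preserving, so errors along the state-separating axis
# are carried forward, never corrected.
inputs = rng.integers(0, 2, size=(n_runs, T))       # input 1 flips the parity
signs = np.where(inputs == 1, -1.0, 1.0)

h = np.ones(n_runs)                                 # all runs start in state +1
parity = np.zeros(n_runs, dtype=int)                # ground-truth symbolic state
ratios = []
for t in range(T):
    h = signs[:, t] * h + sigma * rng.standard_normal(n_runs)
    parity = (parity + inputs[:, t]) % 2
    mu0, mu1 = h[parity == 0].mean(), h[parity == 1].mean()
    within = np.concatenate([h[parity == 0] - mu0,
                             h[parity == 1] - mu1]).std()
    between = abs(mu0 - mu1)
    ratios.append(within / between)

# Within-class spread accumulates with t while between-class separation
# stays near 2, so the ratio grows: tracking is readable only over a
# finite horizon, matching the abstract's claim.
print(f"ratio at t=10:  {ratios[10]:.3f}")
print(f"ratio at t=400: {ratios[-1]:.3f}")
```

In this toy setting the crossing point of the ratio against a fixed decoder threshold would predict the horizon at which a linear readout of parity starts to fail.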