Test-Time Training with KV Binding Is Secretly Linear Attention
February 24, 2026
Authors: Junchen Liu, Sven Elflein, Or Litany, Zan Gojcic, Ruilong Li
cs.AI
Abstract
Test-time training (TTT) with KV binding as a sequence-modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these findings, we revisit the formulation of TTT and show that a broad class of TTT architectures can be expressed as a learned linear attention operator. Beyond explaining previously puzzling model behaviors, this perspective yields multiple practical benefits: it enables principled architectural simplifications, admits fully parallel formulations that preserve performance while improving efficiency, and provides a systematic reduction of diverse TTT variants to a standard linear attention form. Overall, our results reframe TTT not as test-time memorization, but as learned linear attention with enhanced representational capacity.
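The claimed reduction can be illustrated in a toy setting. The sketch below is our own illustration, not the paper's construction: it assumes a zero-initialized linear memory `W` and a single batched gradient step on the KV-binding loss L(W) = ½ Σₜ ‖W kₜ − vₜ‖². Under those assumptions, the TTT update collapses exactly to the accumulated outer-product state of (unnormalized) linear attention.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, lr = 4, 6, 0.1

K = rng.normal(size=(T, d))  # keys
V = rng.normal(size=(T, d))  # values
q = rng.normal(size=d)       # an arbitrary query

# TTT view: one gradient step on the KV-binding loss
# L(W) = 0.5 * sum_t ||W k_t - v_t||^2, from W = 0.
W = np.zeros((d, d))
grad = sum(np.outer(W @ K[t] - V[t], K[t]) for t in range(T))
W_ttt = W - lr * grad

# Linear-attention view: accumulated outer-product state
# S = sum_t v_t k_t^T, scaled by the learning rate.
S = lr * sum(np.outer(V[t], K[t]) for t in range(T))

# With W = 0, the gradient is -sum_t v_t k_t^T, so the two
# operators coincide, and so do their outputs on any query.
assert np.allclose(W_ttt, S)
assert np.allclose(W_ttt @ q, S @ q)
```

With a nonzero initialization or multiple sequential update steps, the extra `W @ k` term in the gradient introduces key-dependent corrections to the state; the paper's point is that this richer class of updates still amounts to a (learned) linear attention operator rather than a memorization process.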