Test-Time Training with KV Binding Is Secretly Linear Attention
February 24, 2026
Authors: Junchen Liu, Sven Elflein, Or Litany, Zan Gojcic, Ruilong Li
cs.AI
Abstract
Test-time training (TTT) with KV binding as a sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these findings, we revisit the formulation of TTT and show that a broad class of TTT architectures can be expressed as a form of learned linear attention operator. Beyond explaining previously puzzling model behaviors, this perspective yields several practical benefits: it enables principled architectural simplifications, admits fully parallel formulations that preserve performance while improving efficiency, and provides a systematic reduction of diverse TTT variants to a standard linear attention form. Overall, our results reframe TTT not as test-time memorization, but as learned linear attention with enhanced representational capacity.
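The abstract's central equivalence can be illustrated with a minimal numerical sketch: a single TTT gradient step on a key-value reconstruction loss produces the same state update as an outer-product accumulation of the kind used in linear attention. All symbols below (`W0`, `eta`, `k`, `v`) and the zero-initialized state are illustrative assumptions for the sketch, not the paper's actual parameterization.

```python
import numpy as np

# Hypothetical setup: a single key/value pair and a zero-initialized
# fast-weight state W0, with learning rate eta. These choices are
# assumptions made for illustration only.
rng = np.random.default_rng(0)
d = 4
k = rng.standard_normal(d)
v = rng.standard_normal(d)
eta = 0.1
W0 = np.zeros((d, d))

# TTT-style update: one gradient step on the binding loss
#   L(W) = 0.5 * ||W k - v||^2,  with  dL/dW = (W k - v) k^T
grad = np.outer(W0 @ k - v, k)
W1_ttt = W0 - eta * grad

# Linear-attention-style state update: accumulate a scaled
# outer product v k^T into the state. With W0 = 0 the two coincide.
W1_lin = W0 + eta * np.outer(v, k)

assert np.allclose(W1_ttt, W1_lin)
```

With a nonzero initial state the TTT step also contracts `W0` along the direction of `k` (the `(I - eta * k k^T)` factor), which is where the "learned" decay structure of the linear-attention view enters; this sketch only shows the base case.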