Test-Time Training with KV Binding Is Secretly Linear Attention
February 24, 2026
Authors: Junchen Liu, Sven Elflein, Or Litany, Zan Gojcic, Ruilong Li
cs.AI
Abstract
Test-time training (TTT) with KV binding as a sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these findings, we revisit the formulation of TTT and show that a broad class of TTT architectures can be expressed as a form of learned linear attention operator. Beyond explaining previously puzzling model behaviors, this perspective yields several practical benefits: it enables principled architectural simplifications, admits fully parallel formulations that preserve performance while improving efficiency, and provides a systematic reduction of diverse TTT variants to a standard linear attention form. Overall, our results reframe TTT not as test-time memorization, but as learned linear attention with enhanced representational capacity.
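The abstract's central equivalence can be illustrated with a minimal numerical sketch: a single TTT gradient step on a key-value reconstruction loss produces the same state update as an outer-product accumulation of the kind used in linear attention. All symbols below (`W0`, `eta`, `k`, `v`) and the zero-initialized state are illustrative assumptions for the sketch, not the paper's actual parameterization.

```python
import numpy as np

# Hypothetical setup: a single key/value pair and a zero-initialized
# fast-weight state W0, with learning rate eta. These choices are
# assumptions made for illustration only.
rng = np.random.default_rng(0)
d = 4
k = rng.standard_normal(d)
v = rng.standard_normal(d)
eta = 0.1
W0 = np.zeros((d, d))

# TTT-style update: one gradient step on the binding loss
#   L(W) = 0.5 * ||W k - v||^2,  with  dL/dW = (W k - v) k^T
grad = np.outer(W0 @ k - v, k)
W1_ttt = W0 - eta * grad

# Linear-attention-style state update: accumulate a scaled
# outer product v k^T into the state. With W0 = 0 the two coincide.
W1_lin = W0 + eta * np.outer(v, k)

assert np.allclose(W1_ttt, W1_lin)
```

With a nonzero initial state the TTT step also contracts `W0` along the direction of `k` (the `(I - eta * k k^T)` factor), which is where the "learned" decay structure of the linear-attention view enters; this sketch only shows the base case.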