Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
December 14, 2025
Authors: Jingdi Lei, Di Zhang, Soujanya Poria
cs.AI
Abstract
Linear-time attention and State Space Models (SSMs) promise to solve the quadratic-cost bottleneck of softmax attention in long-context language models. We introduce Error-Free Linear Attention (EFLA), a numerically stable, fully parallelizable, and generalized formulation of the delta rule. Specifically, we formulate the online learning update as a continuous-time dynamical system and prove that its exact solution is not only attainable but also computable in linear time with full parallelism. By leveraging the rank-1 structure of the dynamics matrix, we directly derive an exact closed-form solution that effectively corresponds to an infinite-order Runge-Kutta method. The resulting attention mechanism is theoretically free from error accumulation, capturing the continuous dynamics exactly while preserving linear-time complexity. Through an extensive suite of experiments, we show that EFLA is robust in noisy environments, achieving lower language modeling perplexity and stronger downstream benchmark performance than DeltaNet without introducing additional parameters. Our work provides a new theoretical foundation for building high-fidelity, scalable linear-time attention models.
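To make the rank-1 closed-form idea concrete, the following is a minimal NumPy sketch of the construction the abstract points at: the delta rule is read as an Euler step of the gradient-flow ODE dS/dt = -(S k - v) k^T, whose exact solution after "time" beta has a closed form because the dynamics matrix k k^T is rank-1. This is an illustrative reconstruction inferred from the abstract, not the authors' implementation; the function names and the reading of beta as an integration time are assumptions.

import numpy as np

def delta_rule_step(S, k, v, beta):
    # First-order (Euler) delta-rule update used by DeltaNet-style models:
    # S <- S (I - beta k k^T) + beta v k^T.
    return S - beta * np.outer(S @ k - v, k)

def exact_rank1_step(S, k, v, beta):
    # Exact solution of dS/dt = -(S k - v) k^T integrated for time beta.
    # Because k k^T is rank-1, exp(-t k k^T) = I - (1 - e^{-t ||k||^2}) k k^T / ||k||^2,
    # so the exact update only rescales the same rank-1 correction term.
    knorm2 = k @ k
    gamma = (1.0 - np.exp(-beta * knorm2)) / knorm2   # effective step size, always bounded
    return S - gamma * np.outer(S @ k - v, k)

rng = np.random.default_rng(0)
d = 8
S = rng.normal(size=(d, d))
k = rng.normal(size=d)
v = rng.normal(size=d)

# A large beta makes the Euler step overshoot, while the exact step contracts
# the residual ||S k - v|| by exp(-beta ||k||^2) and cannot diverge.
print(np.linalg.norm(delta_rule_step(S, k, v, beta=2.0) @ k - v))
print(np.linalg.norm(exact_rank1_step(S, k, v, beta=2.0) @ k - v))

The contrast between the two residuals is one way to see why solving the continuous dynamics exactly, rather than approximating it with a first-order step, also buys numerical stability at large effective learning rates.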