

Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics

December 14, 2025
Authors: Jingdi Lei, Di Zhang, Soujanya Poria
cs.AI

Abstract

Linear-time attention and State Space Models (SSMs) promise to resolve the quadratic-cost bottleneck of softmax attention in long-context language models. We introduce Error-Free Linear Attention (EFLA), a numerically stable, fully parallelizable, and generalized formulation of the delta rule. Specifically, we formulate the online learning update as a continuous-time dynamical system and prove that its exact solution is not only attainable but also computable in linear time with full parallelism. By leveraging the rank-1 structure of the dynamics matrix, we directly derive the exact closed-form solution, which effectively corresponds to an infinite-order Runge-Kutta method. The resulting attention mechanism is theoretically free from error accumulation, capturing the continuous dynamics exactly while preserving linear-time complexity. Through an extensive suite of experiments, we show that EFLA performs robustly in noisy settings, achieving lower language-modeling perplexity and better downstream benchmark performance than DeltaNet without introducing additional parameters. Our work provides a new theoretical foundation for building high-fidelity, scalable linear-time attention models.
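To illustrate why a rank-1 dynamics matrix admits an exact closed-form solution, the minimal sketch below verifies the standard matrix-exponential identity for A = k kᵀ (since A² = ‖k‖²A, the exponential series collapses to a single rank-1 correction of the identity). This is an illustrative assumption about the kind of structure the abstract describes, not the authors' formulation or code; the vector k, the scalar beta, and the dimension d are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative sketch (not the paper's code): for a rank-1 matrix A = k k^T,
# A^2 = ||k||^2 A, so the matrix exponential has the exact closed form
#   exp(-beta * A) = I + ((exp(-beta * ||k||^2) - 1) / ||k||^2) * A.
# An exact exponential like this is what lets a linear ODE be integrated
# without the step-wise error accumulation of finite-order solvers.

rng = np.random.default_rng(0)
d = 8
k = rng.standard_normal(d)          # hypothetical key vector
beta = 0.5                          # hypothetical gate / learning-rate scalar
A = np.outer(k, k)                  # rank-1 dynamics matrix
knorm2 = k @ k                      # ||k||^2

closed_form = np.eye(d) + (np.exp(-beta * knorm2) - 1.0) / knorm2 * A
reference = expm(-beta * A)         # generic matrix exponential for comparison

print(np.max(np.abs(closed_form - reference)))  # ~1e-16, i.e. exact to machine precision
```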