Dynamic Linear Attention

Abstract

The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the quadratic complexity of standard attention, which motivates the adoption of linear attention mechanisms with sub-quadratic cost. To improve representation capacity under long contexts, recent approaches organize memory into multiple states. However, existing multi-state linear attention methods rely on fixed state-merging policies that cannot adapt to dynamically varying token importance. As a result, critical tokens are irreversibly obscured and errors accumulate severely over long sequences. To address this limitation, we propose DLA (Dynamic Linear Attention), a dynamic memory modeling framework for multi-state linear attention. DLA introduces (i) Information-Aware Dynamic State Merging, which adaptively determines state boundaries based on token-level information variation, preserving high-resolution representations around semantic transitions while aggressively summarizing stable regions, and (ii) Capacity-Bounded Memory Modeling, which maintains a fixed-size, chronologically ordered state cache by selectively merging adjacent low-information states, controlling memory growth with minimal information loss. We pre-train DLA on two different linear attention models and evaluate it on 16 datasets across three categories. Experimental results demonstrate the superiority of DLA over state-of-the-art methods.
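
To make the two components described above more concrete, the following is a minimal toy sketch in Python, not the authors' implementation. The abstract does not specify the information measure, the boundary rule, or the merge operator, so everything here is an assumption made for illustration: the `novelty` score (one minus the cosine similarity of consecutive keys), the `threshold` and `capacity` parameters, the `State` class, and merging by summing the linear attention memories (sums of outer products of keys and values).

```python
from dataclasses import dataclass
import math


@dataclass
class State:
    start: int     # index of the first token covered by this state
    end: int       # index of the last token covered (inclusive)
    memory: list   # d_k x d_v matrix: sum of outer products k v^T
    info: float    # accumulated (hypothetical) information score


def outer(k, v):
    return [[ki * vj for vj in v] for ki in k]


def add_matrices(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def novelty(k, prev_k):
    """Assumed information score: 1 - cosine similarity of consecutive keys."""
    if prev_k is None:
        return 1.0
    dot = sum(a * b for a, b in zip(k, prev_k))
    norm = math.sqrt(sum(a * a for a in k)) * math.sqrt(sum(b * b for b in prev_k))
    return 1.0 - dot / norm if norm > 0 else 1.0


def merge(a, b):
    # Adjacent states are merged by summing their memories, so order is preserved.
    return State(a.start, b.end, add_matrices(a.memory, b.memory), a.info + b.info)


def build_cache(keys, values, capacity=4, threshold=0.5):
    states, prev_k = [], None
    for t, (k, v) in enumerate(zip(keys, values)):
        score = novelty(k, prev_k)
        token_state = State(t, t, outer(k, v), score)
        if states and score < threshold:
            # Stable region: fold the token into the most recent state.
            states[-1] = merge(states[-1], token_state)
        else:
            # Large information change: open a new state at this boundary.
            states.append(token_state)
        if len(states) > capacity:
            # Over capacity: merge the adjacent pair with the least combined information.
            i = min(range(len(states) - 1),
                    key=lambda j: states[j].info + states[j + 1].info)
            states[i:i + 2] = [merge(states[i], states[i + 1])]
        prev_k = k
    return states


keys = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [1, 0]]
values = [[1.0]] * 6
for s in build_cache(keys, values, capacity=2):
    print(s.start, s.end, s.memory, s.info)
```

In this toy run, repeated keys are folded into the current state, a change of key opens a new state, and the cache never holds more than `capacity` states. Because merging only sums adjacent memories, the total memory across all states always equals the single-state linear attention memory for the same sequence.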