Longhorn: State Space Models are Amortized Online Learners
July 19, 2024
Authors: Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qiang Liu
cs.AI
Abstract
The most fundamental capability of modern AI methods such as Large Language
Models (LLMs) is the ability to predict the next token in a long sequence of
tokens, known as "sequence modeling." Although the Transformer model is
currently the dominant approach to sequence modeling, its quadratic computational
cost with respect to sequence length is a significant drawback. State-space
models (SSMs) offer a promising alternative due to their linear decoding
efficiency and high parallelizability during training. However, existing SSMs
often rely on seemingly ad hoc linear recurrence designs. In this work, we
explore SSM design through the lens of online learning, conceptualizing SSMs as
meta-modules for specific online learning problems. This approach links SSM
design to formulating precise online learning objectives, with state transition
rules derived from optimizing these objectives. Based on this insight, we
introduce a novel deep SSM architecture based on the implicit update for
optimizing an online regression objective. Our experimental results show that
our models outperform state-of-the-art SSMs, including the Mamba model, on
standard sequence modeling benchmarks and language modeling tasks.
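
To make the online-learning view concrete, the following is a minimal sketch of the kind of derivation the abstract alludes to, under assumed notation (the state matrix S_t, key k_t, input x_t, and step size \beta_t are illustrative and the exact Longhorn update may differ). Take an online regression objective with an implicit (proximal) update,

S_t = \arg\min_{S} \; \tfrac{1}{2}\|S - S_{t-1}\|_F^2 + \tfrac{\beta_t}{2}\,\|x_t - S k_t\|_2^2 .

Setting the gradient to zero and applying the Sherman–Morrison identity gives the closed-form state transition

S_t = S_{t-1}\bigl(I - \epsilon_t\, k_t k_t^\top\bigr) + \epsilon_t\, x_t k_t^\top , \qquad \epsilon_t = \frac{\beta_t}{1 + \beta_t\, k_t^\top k_t} .

Because this is a linear recurrence in the state, it can be decoded in constant time per token and parallelized across the sequence during training, in the same way as other SSM recurrences.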