Longhorn: State Space Models are Amortized Online Learners
July 19, 2024
Authors: Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qiang Liu
cs.AI
Abstract
The most fundamental capability of modern AI methods such as Large Language Models (LLMs) is the ability to predict the next token in a long sequence of tokens, known as "sequence modeling." Although the Transformer model is the current dominant approach to sequence modeling, its quadratic computational cost with respect to sequence length is a significant drawback. State-space models (SSMs) offer a promising alternative due to their linear decoding efficiency and high parallelizability during training. However, existing SSMs often rely on seemingly ad hoc linear recurrence designs. In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems. This approach links SSM design to formulating precise online learning objectives, with state transition rules derived from optimizing these objectives. Based on this insight, we introduce a novel deep SSM architecture based on the implicit update for optimizing an online regression objective. Our experimental results show that our models outperform state-of-the-art SSMs, including the Mamba model, on standard sequence modeling benchmarks and language modeling tasks.
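For intuition about the abstract's central claim (a state transition rule derived by optimizing an online objective rather than chosen ad hoc), here is a minimal sketch in LaTeX. The quadratic objective, the key/value/state notation (k_t, v_t, S_t), and the step size beta_t are illustrative assumptions, not necessarily the paper's exact formulation:

% Hedged sketch: a quadratic online regression objective and its implicit
% (closed-form) update. All symbols below are illustrative, not taken from the paper.
\begin{aligned}
S_t &= \operatorname*{arg\,min}_{S}\ \|S - S_{t-1}\|_F^2 + \beta_t\,\|S k_t - v_t\|_2^2
  &&\text{(stay close to the old state, fit the new key--value pair)}\\
0 &= (S_t - S_{t-1}) + \beta_t\,(S_t k_t - v_t)\,k_t^\top
  &&\text{(first-order optimality)}\\
S_t &= \bigl(S_{t-1} + \beta_t\, v_t k_t^\top\bigr)\bigl(I + \beta_t\, k_t k_t^\top\bigr)^{-1}\\
    &= S_{t-1}\bigl(I - \epsilon_t\, k_t k_t^\top\bigr) + \epsilon_t\, v_t k_t^\top,
  \qquad \epsilon_t = \frac{\beta_t}{1 + \beta_t\,\|k_t\|_2^2}
  &&\text{(Sherman--Morrison)}
\end{aligned}

The last line is a linear recurrence in S_{t-1} with input-dependent coefficients, which is the general form an SSM layer can evaluate efficiently over a sequence; the sketch only illustrates how such a recurrence can fall out of a stated online learning objective, in the spirit the abstract describes.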