
Linear Transformers with Learnable Kernel Functions are Better In-Context Models

February 16, 2024
Authors: Yaroslav Aksenov, Nikita Balagansky, Sofia Maria Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov
cs.AI

Abstract

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in essential In-Context Learning capabilities - a domain where the Transformer traditionally shines. The Based model emerged as a hybrid solution, blending a Linear Transformer with a kernel inspired by the Taylor expansion of exponential functions, augmented by convolutional networks. Mirroring the Transformer's in-context adeptness, it became a strong contender in the field. In our work, we present a singular, elegant alteration to the Based kernel that amplifies its In-Context Learning abilities evaluated with the Multi-Query Associative Recall task and overall language modeling process, as demonstrated on the Pile dataset.
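For context on the kernel the abstract refers to, below is a minimal, illustrative sketch (not the authors' implementation) of causal linear attention with a feature map derived from the second-order Taylor expansion of the exponential, so that phi(q)·phi(k) ≈ 1 + q·k + (q·k)²/2. The function names, tensor shapes, and the stabilizing epsilon are assumptions made for this example.

```python
# Sketch of Based-style linear attention with a Taylor-expansion kernel.
# This is an illustrative approximation, not the paper's exact method.
import torch

def taylor_feature_map(x: torch.Tensor) -> torch.Tensor:
    """Map x (..., d) to phi(x) such that phi(q)·phi(k) = 1 + q·k + (q·k)^2 / 2."""
    ones = torch.ones(*x.shape[:-1], 1, dtype=x.dtype, device=x.device)
    # Second-order term: <q⊗q, k⊗k> / 2 = (q·k)^2 / 2, hence the 1/sqrt(2) scaling.
    x2 = (x.unsqueeze(-1) * x.unsqueeze(-2)).flatten(-2) / (2 ** 0.5)
    return torch.cat([ones, x, x2], dim=-1)  # (..., 1 + d + d^2)

def causal_linear_attention(q, k, v, eps: float = 1e-6):
    """Causal linear attention in O(L) memory of the prefix via running sums."""
    phi_q, phi_k = taylor_feature_map(q), taylor_feature_map(k)      # (B, L, D)
    # Prefix sums of phi(k_s) v_s^T and phi(k_s) over positions s <= t.
    kv = torch.cumsum(phi_k.unsqueeze(-1) * v.unsqueeze(-2), dim=1)  # (B, L, D, d_v)
    z = torch.cumsum(phi_k, dim=1)                                   # (B, L, D)
    num = torch.einsum('bld,bldv->blv', phi_q, kv)
    den = torch.einsum('bld,bld->bl', phi_q, z).unsqueeze(-1)
    return num / (den + eps)

# Tiny usage example with assumed dimensions.
B, L, d, d_v = 2, 8, 16, 16
q = torch.randn(B, L, d) / d ** 0.5
k = torch.randn(B, L, d) / d ** 0.5
v = torch.randn(B, L, d_v)
out = causal_linear_attention(q, k, v)  # (B, L, d_v)
```

The paper's contribution, as stated in the abstract, is a modification of this kernel that improves In-Context Learning; the sketch above only shows the baseline Taylor-kernel formulation that Based builds on.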
