Linear Transformers with Learnable Kernel Functions are Better In-Context Models
February 16, 2024
Authors: Yaroslav Aksenov, Nikita Balagansky, Sofia Maria Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov
cs.AI
Abstract
Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in essential In-Context Learning capabilities - a domain where the Transformer traditionally shines. The Based model emerged as a hybrid solution, blending a Linear Transformer with a kernel inspired by the Taylor expansion of the exponential function, augmented by convolutional networks. Mirroring the Transformer's in-context adeptness, it became a strong contender in the field. In our work, we present a singular, elegant alteration to the Based kernel that amplifies its In-Context Learning abilities, evaluated with the Multi-Query Associative Recall task and on overall language modeling, as demonstrated on the Pile dataset.
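To make the kernel idea concrete, below is a minimal NumPy sketch of causal linear attention with a feature map whose inner product reproduces the second-order Taylor expansion of exp(q·k), i.e. 1 + q·k + (q·k)²/2 - the construction the abstract attributes to the Based model. This is an illustrative sketch, not the authors' implementation; the function names, tensor shapes, and normalization epsilon are assumptions.

```python
# Illustrative sketch (assumption: not the authors' code) of causal linear
# attention with a Taylor-expansion-inspired feature map.
import numpy as np

def taylor_feature_map(x):
    """Map rows of x (seq_len, d) to features so that
    phi(q) . phi(k) == 1 + q.k + (q.k)**2 / 2."""
    seq_len, d = x.shape
    ones = np.ones((seq_len, 1))
    # Outer-product features scaled by 1/sqrt(2) yield the quadratic term.
    quad = (x[:, :, None] * x[:, None, :]).reshape(seq_len, d * d) / np.sqrt(2)
    return np.concatenate([ones, x, quad], axis=-1)

def causal_linear_attention(q, k, v, eps=1e-6):
    """Runs an O(seq_len) recurrence over a (feature_dim x d_v) state
    instead of materializing a (seq_len x seq_len) attention matrix."""
    phi_q, phi_k = taylor_feature_map(q), taylor_feature_map(k)
    seq_len, d_v = v.shape
    state = np.zeros((phi_k.shape[1], d_v))  # running sum of phi(k_t) v_t^T
    norm = np.zeros(phi_k.shape[1])          # running sum of phi(k_t)
    out = np.zeros_like(v)
    for t in range(seq_len):
        state += np.outer(phi_k[t], v[t])
        norm += phi_k[t]
        out[t] = phi_q[t] @ state / (phi_q[t] @ norm + eps)
    return out

# Toy usage with random tensors (shapes chosen only for illustration).
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # -> (8, 4)
```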