Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Abstract

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Recent innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in essential In-Context Learning capabilities, a domain where the Transformer traditionally shines. The Based model emerged as a hybrid solution, blending a Linear Transformer with a kernel inspired by the Taylor expansion of the exponential function, augmented by convolutional networks. Mirroring the Transformer's in-context adeptness, it became a strong contender in the field. In our work, we present a single, elegant alteration to the Based kernel that improves its In-Context Learning abilities, as evaluated on the Multi-Query Associative Recall task, and its overall language modeling performance, as demonstrated on the Pile dataset.
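To make the phrase "a Linear Transformer with a kernel inspired by the Taylor expansion" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a second-order Taylor approximation exp(x) ≈ 1 + x + x²/2 applied to the query-key dot product, and it omits any input scaling, projections, and the convolutional component. The helper names (`taylor_features`, `linear_attention`, `quadratic_attention`) are hypothetical. The learnable kernel modification proposed in this work is not shown, since the abstract does not specify it.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def taylor_features(x):
    """Feature map phi with phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2 (assumed form)."""
    second_order = [xi * xj / math.sqrt(2.0) for xi in x for xj in x]
    return [1.0] + list(x) + second_order

def linear_attention(queries, keys, values):
    """Causal attention via a running state; cost grows linearly with sequence length."""
    d = len(keys[0])
    feat_dim = 1 + d + d * d
    val_dim = len(values[0])
    state = [[0.0] * val_dim for _ in range(feat_dim)]  # running sum of phi(k) v^T
    norm = [0.0] * feat_dim                              # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Fold the current key/value pair into the state before reading it.
        for i, f in enumerate(taylor_features(k)):
            norm[i] += f
            for j, vj in enumerate(v):
                state[i][j] += f * vj
        phi_q = taylor_features(q)
        denom = dot(phi_q, norm)  # positive, since 1 + x + x^2/2 > 0 for all real x
        outputs.append(
            [dot(phi_q, [row[j] for row in state]) / denom for j in range(val_dim)]
        )
    return outputs

def quadratic_attention(queries, keys, values):
    """Reference version: the same weights computed pairwise (quadratic cost)."""
    val_dim = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        weights = [1.0 + dot(q, k) + dot(q, k) ** 2 / 2.0 for k in keys[: t + 1]]
        total = sum(weights)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) / total
             for j in range(val_dim)]
        )
    return outputs
```

Both functions should produce the same outputs up to floating-point error. The linear version only ever touches a fixed-size state per step, which is what makes the architecture subquadratic in sequence length.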
