TRAMS: Training-free Memory Selection for Long-range Language Modeling

Abstract

The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Although several specialized Transformer architectures have been designed to tackle long-range dependencies, existing methods such as Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, TRAining-free Memory Selection (TRAMS), that selects the tokens participating in attention calculation based on one simple metric. This strategy allows us to keep tokens that are likely to have a high attention score with the current queries and to ignore the others. We tested our approach on a word-level benchmark (WikiText-103) and a character-level benchmark (enwik8), and the results indicate an improvement without additional training or additional parameters.
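
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which metric TRAMS uses, so the L2 norm of each key vector stands in here purely as a placeholder score. The function names (`select_memory`, `attend`, `l2_norm`) and the toy data are likewise hypothetical and are not taken from the paper. The sketch shows only the general pattern: rank cached memory tokens by a query-independent score, keep the top few, and run ordinary attention over the survivors with no extra training or parameters.

```python
import math


def l2_norm(vec):
    """Placeholder score: Euclidean norm of a key vector (an assumption, not the paper's metric)."""
    return math.sqrt(sum(x * x for x in vec))


def select_memory(keys, values, score_fn, keep):
    """Keep the `keep` memory slots whose keys score highest under `score_fn`."""
    ranked = sorted(range(len(keys)), key=lambda i: score_fn(keys[i]), reverse=True)
    kept = sorted(ranked[:keep])  # restore the original token order
    return [keys[i] for i in kept], [values[i] for i in kept], kept


def attend(query, keys, values):
    """Scaled dot-product attention for a single query over the given keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


# Toy memory of four cached tokens with two-dimensional keys and values.
memory_keys = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2], [1.5, -1.0]]
memory_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.5]

# Select half of the memory, then attend only over the selected tokens.
kept_keys, kept_values, kept_idx = select_memory(memory_keys, memory_values, l2_norm, keep=2)
output = attend(query, kept_keys, kept_values)
```

Because the score depends only on the cached keys, the selection is computed once per memory segment rather than once per query, and swapping in a different metric only requires passing a different `score_fn`.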
