Limitations of Normalization in Attention Mechanism

Abstract

This paper investigates the limitations of normalization in attention mechanisms. We begin with a theoretical framework that enables the identification of the model's selective ability and the geometric separation involved in token selection. Our analysis includes explicit bounds on distances and separation criteria for token vectors under softmax scaling. Through experiments with a pre-trained GPT-2 model, we empirically validate our theoretical results and analyze key behaviors of the attention mechanism. Notably, we demonstrate that as the number of selected tokens increases, the model's ability to distinguish informative tokens declines, often converging toward a uniform selection pattern. We also show that gradient sensitivity under softmax normalization presents challenges during training, especially at low temperature settings. These findings advance the current understanding of softmax-based attention mechanisms and motivate the need for more robust normalization and selection strategies in future attention architectures.
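
The two effects named in the abstract can be illustrated with a small NumPy sketch. This is not the paper's code or experimental setup: the helper names, the assumption that attention scores are bounded in [0, score_range], and the choice of tied scores for the gradient example are all hypothetical and chosen only for illustration. Under those assumptions, the largest softmax weight among n tokens is at most exp(R/T) / (exp(R/T) + n - 1), so it shrinks toward zero as n grows, and the softmax Jacobian (diag(p) - p p^T) / T grows like 1/T when scores are tied.

```python
import numpy as np


def softmax(scores, temperature=1.0):
    """Numerically stable softmax of a 1-D score vector at a given temperature."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # shift by the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def softmax_jacobian(scores, temperature=1.0):
    """Jacobian of the softmax weights with respect to the raw scores.

    For p = softmax(s / T), d p / d s = (diag(p) - p p^T) / T.
    """
    p = softmax(scores, temperature)
    return (np.diag(p) - np.outer(p, p)) / temperature


def max_weight_for_bounded_scores(n, score_range=2.0, temperature=1.0, seed=0):
    """Largest attention weight among n tokens with scores drawn from [0, score_range).

    The uniform sampling is an illustrative assumption, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    scores = rng.uniform(0.0, score_range, size=n)
    return softmax(scores, temperature).max()


if __name__ == "__main__":
    # Effect 1: with bounded scores, the largest weight decays as n grows.
    for n in (10, 100, 1000):
        print(n, max_weight_for_bounded_scores(n))

    # Effect 2: for two tied scores, the Jacobian's spectral norm is 0.5 / T.
    for temperature in (1.0, 0.1, 0.01):
        jac = softmax_jacobian([0.0, 0.0], temperature)
        print(temperature, np.linalg.norm(jac, 2))
```

The tied-score case is only one regime for the gradient: when one score clearly dominates at low temperature, the softmax saturates and the same Jacobian becomes close to zero instead, so sensitivity can swing between very large and vanishing values.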
