Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability

Abstract

Developing effective explainability tools for Transformers is a central goal of deep learning research. One of the most promising approaches in this domain is Layer-wise Relevance Propagation (LRP), which propagates relevance scores backward through the network to the input space by redistributing activation values according to predefined rules. However, existing LRP-based methods for Transformer explainability entirely overlook a critical component of the Transformer architecture: its positional encoding (PE). This omission violates the conservation property and loses an important and distinct type of relevance, one associated with structural and positional features. To address this limitation, we reformulate the input space for Transformer explainability as a set of position-token pairs. This allows us to propose specialized, theoretically grounded LRP rules designed to propagate attributions across various positional encoding methods, including Rotary, Learnable, and Absolute PE. Extensive experiments with both fine-tuned classifiers and zero-shot foundation models, such as LLaMA 3, demonstrate that our method significantly outperforms the state of the art in both vision and NLP explainability tasks. Our code is publicly available.
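To make the conservation argument concrete, the following is a minimal sketch and not the rules proposed in the paper. It assumes a toy model in which each input is the sum of a token embedding and an additive positional embedding, followed by a single linear scoring neuron with no bias. It applies a generic LRP-epsilon style rule and then splits the relevance of each sum between its two summands in proportion to their contribution. All function names and numeric values are hypothetical and chosen only for illustration.

```python
def stabilize(z, eps=1e-9):
    """Push z away from zero while keeping its sign (epsilon stabilizer)."""
    return z + (eps if z >= 0 else -eps)


def lrp_linear(x, w, relevance_out):
    """Redistribute the relevance of one linear neuron z = sum_j x_j * w_j
    to its inputs, in proportion to each contribution x_j * w_j."""
    z = sum(xj * wj for xj, wj in zip(x, w))
    return [xj * wj / stabilize(z) * relevance_out for xj, wj in zip(x, w)]


def split_sum(e, p, relevance):
    """Split the relevance of x = e + p between token part e and positional
    part p, in proportion to each summand (an assumption of this sketch)."""
    r_e = [ej / stabilize(ej + pj) * rj for ej, pj, rj in zip(e, p, relevance)]
    r_p = [pj / stabilize(ej + pj) * rj for ej, pj, rj in zip(e, p, relevance)]
    return r_e, r_p


def attribute(tokens, positions, weights):
    """Return per-position token relevance, per-position positional
    relevance, and the model score for the toy linear scorer."""
    dim = len(tokens[0])
    e = [v for row in tokens for v in row]
    p = [v for row in positions for v in row]
    w = [v for row in weights for v in row]
    x = [ej + pj for ej, pj in zip(e, p)]
    score = sum(xj * wj for xj, wj in zip(x, w))
    r_e, r_p = split_sum(e, p, lrp_linear(x, w, score))

    def per_position(r):
        return [sum(r[i:i + dim]) for i in range(0, len(r), dim)]

    return per_position(r_e), per_position(r_p), score


# Hypothetical toy values: 2 positions, embedding dimension 2.
tokens = [[1.0, -0.5], [2.0, 0.5]]       # token embeddings
positions = [[0.5, 0.25], [-1.0, 0.5]]   # additive positional embeddings
weights = [[0.2, -0.4], [0.3, 0.1]]      # weights of the linear scorer

token_rel, pos_rel, score = attribute(tokens, positions, weights)
print("score:", score)
print("token relevance per position:", token_rel)
print("positional relevance per position:", pos_rel)
print("token-only total:", sum(token_rel))
print("token + positional total:", sum(token_rel) + sum(pos_rel))
```

With these values the token-only total is about 1.05 while the score is about 0.8; only after adding the positional total of about -0.25 does the relevance sum match the output. This is the kind of conservation gap that arises when the positional share is discarded, and it motivates treating each input as a position-token pair.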
