

Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability

June 2, 2025
作者: Yarden Bakish, Itamar Zimerman, Hila Chefer, Lior Wolf
cs.AI

Abstract

The development of effective explainability tools for Transformers is a crucial pursuit in deep learning research. One of the most promising approaches in this domain is Layer-wise Relevance Propagation (LRP), which propagates relevance scores backward through the network to the input space by redistributing activation values based on predefined rules. However, existing LRP-based methods for Transformer explainability entirely overlook a critical component of the Transformer architecture: its positional encoding (PE). This results in a violation of the conservation property and the loss of an important and unique type of relevance associated with structural and positional features. To address this limitation, we reformulate the input space for Transformer explainability as a set of position-token pairs. This allows us to propose specialized, theoretically grounded LRP rules designed to propagate attributions across various positional encoding methods, including Rotary, Learnable, and Absolute PE. Extensive experiments with both fine-tuned classifiers and zero-shot foundation models, such as LLaMA 3, demonstrate that our method significantly outperforms the state of the art in both vision and NLP explainability tasks. Our code is publicly available.
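The abstract does not spell out the specialized rules themselves, so the following is only a minimal sketch of the underlying idea for the simplest case (learnable or absolute PE added to token embeddings): a generic epsilon-style LRP redistribution over the additive node z = token_embedding + positional_embedding, which assigns relevance to both members of a position-token pair while keeping the total relevance conserved. All function and variable names here are illustrative, not taken from the paper or its released code.

```python
import torch

def lrp_split_token_position(tok_emb, pos_emb, relevance_out, eps=1e-9):
    """Illustrative sketch (not the paper's exact rule): redistribute the
    relevance arriving at an additive embedding node z = tok_emb + pos_emb
    back to its two summands in proportion to their contributions, so that
    sum(R_tok) + sum(R_pos) approximately equals sum(R_out) (conservation)."""
    z = tok_emb + pos_emb
    # Stabilized denominator, as in the standard epsilon-LRP rule.
    denom = z + eps * torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))
    ratio = relevance_out / denom
    r_tok = tok_emb * ratio  # relevance attributed to the token content
    r_pos = pos_emb * ratio  # relevance attributed to the position
    return r_tok, r_pos

# Hypothetical usage with random tensors of shape (num_tokens, dim):
tok = torch.randn(4, 8)    # token embeddings
pos = torch.randn(4, 8)    # matching absolute/learnable positional embeddings
r_out = torch.randn(4, 8)  # relevance flowing back into the sum node
r_tok, r_pos = lrp_split_token_position(tok, pos, r_out)
# The two totals should match closely, i.e. relevance is conserved across the split.
print((r_tok + r_pos).sum().item(), r_out.sum().item())
```

Rotary PE would require a different treatment, since it rotates queries and keys inside attention rather than adding a vector at the input; the sketch above only illustrates the conservation argument for the additive case.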