Attention, Please! Revisiting Attentive Probing for Masked Image Modeling

Abstract

As fine-tuning (FT) becomes increasingly impractical at scale, probing is emerging as the preferred evaluation protocol for self-supervised learning (SSL). Yet standard linear probing (LP) fails to adequately reflect the potential of models trained with Masked Image Modeling (MIM), due to the distributed nature of patch tokens. This motivates the need for attentive probing, an alternative that uses attention to selectively aggregate patch-level features. Despite its growing adoption, attentive probing remains under-explored, with existing methods suffering from excessive parameterization and poor computational efficiency. In this work, we revisit attentive probing through the lens of the accuracy-efficiency trade-off. We conduct a systematic study of existing methods, analyzing their mechanisms and benchmarking their performance. We introduce efficient probing (EP), a multi-query cross-attention mechanism that eliminates redundant projections, reduces the number of trainable parameters, and achieves up to a 10× speed-up over conventional multi-head attention. Despite its simplicity, EP outperforms LP and prior attentive probing approaches across seven benchmarks, generalizes well beyond MIM to diverse pre-training paradigms, produces interpretable attention maps, and achieves strong gains in low-shot and layer-wise settings. Code is available at https://github.com/billpsomas/efficient-probing.
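To make the idea of multi-query cross-attention pooling concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes one plausible reading of "eliminates redundant projections": each learnable query scores the frozen patch tokens directly (no key projection), and each query aggregates its own slice of the raw feature channels (no value projection). The names `multi_query_pool` and `linear_head`, the scaling factor, and the channel-slicing scheme are all hypothetical choices made for illustration; the released code at the URL above is the authoritative reference.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_pool(patches, queries):
    """Pool N patch tokens (each of dim D) into one D-dim vector.

    Assumption (not confirmed by the abstract): each of the M queries scores
    every patch directly, with no key projection, and aggregates its own
    D // M slice of the raw patch features, with no value projection.
    The M slices are concatenated into the pooled feature.
    """
    dim = len(patches[0])
    num_queries = len(queries)
    if dim % num_queries != 0:
        raise ValueError("feature dim must be divisible by the number of queries")
    chunk = dim // num_queries
    scale = 1.0 / math.sqrt(dim)

    pooled, attention_maps = [], []
    for m, query in enumerate(queries):
        # One attention map per query: a softmax over all patches.
        scores = [scale * sum(q * x for q, x in zip(query, patch)) for patch in patches]
        weights = softmax(scores)
        attention_maps.append(weights)
        # Weighted average of this query's channel slice across patches.
        for c in range(m * chunk, (m + 1) * chunk):
            pooled.append(sum(w * patch[c] for w, patch in zip(weights, patches)))
    return pooled, attention_maps


def linear_head(features, weight, bias):
    """Plain linear classifier applied to the pooled feature."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weight, bias)]


# Toy usage with random stand-ins for frozen backbone features.
rng = random.Random(0)
N, D, M, C = 16, 8, 4, 3  # patches, feature dim, queries, classes (illustrative)
patches = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
queries = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(M)]  # trainable
weight = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(C)]   # trainable
bias = [0.0] * C                                                   # trainable

pooled, maps = multi_query_pool(patches, queries)
logits = linear_head(pooled, weight, bias)
print(len(pooled), len(maps), len(logits))
```

Under these assumptions, the only trainable parameters besides the linear classifier are the M query vectors (M × D values), and the M attention maps can be reshaped onto the patch grid for inspection. Whether this matches EP in every detail is not something the abstract settles.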
