Tell me why: Visual foundation models as self-explainable classifiers

Abstract

Visual foundation models (VFMs) have become increasingly popular due to their state-of-the-art performance. However, interpretability remains crucial for critical applications. In this context, self-explainable models (SEMs) aim to provide interpretable classifiers that decompose predictions into a weighted sum of interpretable concepts. Despite their promise, recent studies have shown that these explanations often lack faithfulness. In this work, we combine VFMs with a novel prototypical architecture and specialized training objectives. By training only a lightweight head (approximately 1M parameters) on top of frozen VFMs, our approach (ProtoFM) offers an efficient and interpretable solution. Evaluations demonstrate that our approach achieves competitive classification performance while outperforming existing models across a range of interpretability metrics derived from the literature. Code is available at https://github.com/hturbe/proto-fm.
