Making Classifiers "Sing": Analyzing Semantic Invariants in Classifiers

Abstract

All classifiers, including state-of-the-art vision models, possess invariants that are partially rooted in the geometry of their linear mappings. These invariants reside in the null space of the classifier and induce sets of equivalent inputs that map to identical outputs. Their semantic content remains unclear because existing approaches struggle to provide human-interpretable information. To address this gap, we present Semantic Interpretation of the Null-space Geometry (SING), a method that constructs images that are equivalent with respect to the network and assigns semantic interpretations to the available variations. We map network features to multi-modal vision-language models, which yields natural language descriptions and visual examples of the induced semantic shifts. SING can be applied to a single image, uncovering local invariants, or to sets of images, enabling broad statistical analysis at the class and model levels. For example, our method reveals that ResNet50 leaks relevant semantic attributes to the null space, whereas DinoViT, a ViT pretrained with self-supervised DINO, is superior in maintaining class semantics across the invariant space.
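To make the null-space claim concrete, the following is a minimal sketch of how a linear classification head induces equivalent inputs. It is not the SING pipeline: it assumes a toy head with logits = W f + b, and the names `W`, `b`, `f`, `project_to_null_space`, and the sizes used are hypothetical choices for illustration. It covers only the geometric step (moving a feature vector inside the null space of W) and leaves out the mapping to a vision-language model entirely.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_rows(W, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the row space of W."""
    basis = []
    for row in W:
        r = list(row)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = dot(r, r) ** 0.5
        if norm > tol:  # skip rows that are linearly dependent on earlier ones
            basis.append([ri / norm for ri in r])
    return basis


def project_to_null_space(W, v):
    """Remove from v every component that lies in the row space of W."""
    v = list(v)
    for q in orthonormal_rows(W):
        c = dot(v, q)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    return v


def logits(W, b, f):
    return [dot(row, f) + bi for row, bi in zip(W, b)]


# Hypothetical toy head: 3 classes, 8-dimensional features.
rng = random.Random(0)
num_classes, feat_dim = 3, 8
W = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(num_classes)]
b = [rng.gauss(0, 1) for _ in range(num_classes)]
f = [rng.gauss(0, 1) for _ in range(feat_dim)]

# A random direction, restricted to the null space of W.
delta = project_to_null_space(W, [rng.gauss(0, 1) for _ in range(feat_dim)])

# Move the feature along that direction; the scale 5.0 is arbitrary.
f_equiv = [fi + 5.0 * di for fi, di in zip(f, delta)]

max_logit_gap = max(
    abs(a - c) for a, c in zip(logits(W, b, f), logits(W, b, f_equiv))
)
feature_shift = 5.0 * dot(delta, delta) ** 0.5
print("largest logit difference:", max_logit_gap)
print("feature displacement norm:", feature_shift)
```

The two feature vectors differ by a clearly nonzero displacement, yet the head assigns them the same logits up to floating-point error. What that displacement means semantically is exactly what the linear algebra alone cannot say, which is the gap the abstract describes.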