SCoCCA: Multi-modal Sparse Concept Decomposition via Canonical Correlation Analysis

Abstract

Interpreting the internal reasoning of vision-language models is essential for deploying AI in safety-critical domains. Concept-based explainability provides a human-aligned lens by representing a model's behavior through semantically meaningful components. However, existing methods are largely restricted to images and overlook cross-modal interactions. Text-image embeddings, such as those produced by CLIP, suffer from a modality gap: visual and textual features follow distinct distributions, which limits interpretability. Canonical Correlation Analysis (CCA) offers a principled way to align features from different distributions, but it has not been leveraged for multi-modal concept-level analysis. We show that the objectives of CCA and InfoNCE are closely related, such that optimizing CCA implicitly optimizes InfoNCE. This provides a simple, training-free mechanism to enhance cross-modal alignment without affecting the pre-trained InfoNCE objective. Motivated by this observation, we couple concept-based explainability with CCA and introduce Concept CCA (CoCCA), a framework that aligns cross-modal embeddings while enabling interpretable concept decomposition. We further extend this framework and propose Sparse Concept CCA (SCoCCA), which enforces sparsity to produce more disentangled and discriminative concepts, facilitating improved activation, ablation, and semantic manipulation. Our approach generalizes concept-based explanations to multi-modal embeddings and achieves state-of-the-art performance in concept discovery, as evidenced by reconstruction and manipulation tasks such as concept ablation.
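To make the training-free alignment idea concrete, the following is a minimal sketch of textbook linear CCA applied to paired embeddings. It is not the CoCCA or SCoCCA method itself, whose details are not given in this abstract. The function name `fit_cca`, the regularization value, and the synthetic data (standing in for CLIP image and text embeddings, with a constant offset imitating a modality gap) are all assumptions made for illustration.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-3):
    """Classical linear CCA via whitening and SVD (illustrative helper).

    X: (n, dx) and Y: (n, dy) hold paired samples row by row.
    Returns the two means, the two projection matrices, and the top-k
    (regularized) canonical correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]

    # Regularized covariance and cross-covariance estimates.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return mx, my, A, B, s[:k]


# Synthetic stand-in for paired image/text embeddings (an assumption):
# both views share a latent Z but are shifted apart by a constant offset.
rng = np.random.default_rng(0)
n, d, k = 2000, 16, 4
Z = rng.normal(size=(n, k))
X = Z @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d)) + 2.0
Y = Z @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d)) - 2.0

mx, my, A, B, corr = fit_cca(X, Y, k)
Px, Py = (X - mx) @ A, (Y - my) @ B  # both views in the shared canonical space

gap_before = np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0))
gap_after = np.linalg.norm(Px.mean(axis=0) - Py.mean(axis=0))
print("canonical correlations:", corr)
print("mean gap before / after:", gap_before, gap_after)
```

In this toy setting the projections are fitted in closed form from the paired data, with no gradient-based retraining of the encoder that produced the embeddings, which is the sense in which such an alignment step is training-free.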
