SCoCCA: Multi-modal Sparse Concept Decomposition via Canonical Correlation Analysis

Abstract

Interpreting the internal reasoning of vision-language models is essential for deploying AI in safety-critical domains. Concept-based explainability provides a human-aligned lens by representing a model's behavior through semantically meaningful components. However, existing methods are largely restricted to images and overlook cross-modal interactions. Text-image embeddings, such as those produced by CLIP, suffer from a modality gap: visual and textual features follow distinct distributions, which limits interpretability. Canonical Correlation Analysis (CCA) offers a principled way to align features from different distributions, but it has not been leveraged for multi-modal concept-level analysis. We show that the objectives of CCA and InfoNCE are closely related, such that optimizing CCA implicitly optimizes InfoNCE. This provides a simple, training-free mechanism to enhance cross-modal alignment without affecting the pre-trained InfoNCE objective. Motivated by this observation, we couple concept-based explainability with CCA and introduce Concept CCA (CoCCA), a framework that aligns cross-modal embeddings while enabling interpretable concept decomposition. We further extend it to Sparse Concept CCA (SCoCCA), which enforces sparsity to produce more disentangled and discriminative concepts, facilitating improved activation, ablation, and semantic manipulation. Our approach generalizes concept-based explanations to multi-modal embeddings and achieves state-of-the-art performance in concept discovery, as evidenced by reconstruction and manipulation tasks such as concept ablation.
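
To make the alignment step concrete, the following is a minimal sketch of plain linear CCA on paired embeddings, computed by whitening each view and taking an SVD of the cross-covariance. It is not the CoCCA or SCoCCA procedure itself: the concept decomposition and the sparsity term are omitted, the function name `fit_cca` and the ridge term `eps` are hypothetical, and the data are synthetic stand-ins for image and text embeddings rather than CLIP outputs.

```python
import numpy as np

def fit_cca(X, Y, k, eps=1e-6):
    """Plain linear CCA via whitening + SVD (illustrative sketch).

    X: (n, dx) and Y: (n, dy) are paired samples from two views.
    Returns projections A (dx, k), B (dy, k), the top-k canonical
    correlations, and the per-view means used for centering.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my  # centering removes the mean offset between views

    # Covariances, with a small ridge (assumed value) for numerical stability.
    Sxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    A = Kx @ U[:, :k]
    B = Ky @ Vt.T[:, :k]
    return A, B, s[:k], mx, my

# Synthetic paired "image" and "text" features sharing a 3-dimensional latent.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 3))
X = z @ rng.normal(size=(3, 16)) + 0.1 * rng.normal(size=(n, 16)) + 2.0
Y = z @ rng.normal(size=(3, 12)) + 0.1 * rng.normal(size=(n, 12)) - 2.0

A, B, corrs, mx, my = fit_cca(X, Y, k=3)
Px = (X - mx) @ A  # view 1 projected into the shared space
Py = (Y - my) @ B  # view 2 projected into the shared space
print("canonical correlations:", np.round(corrs, 3))
```

In this toy setting the two views differ by a constant offset and by their mixing matrices, and the projected coordinates `Px` and `Py` are maximally correlated dimension by dimension. How the paper builds concepts and sparsity on top of such a shared space is described only at a high level in the abstract and is not reproduced here.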

