

SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models

March 19, 2026
Authors: Quentin Guimard, Federico Bartsch, Simone Caldarella, Rahaf Aljundi, Elisa Ricci, Massimiliano Mancini
cs.AI

Abstract

Models that bridge vision and language, such as CLIP, are key components of multimodal AI, yet their large-scale, uncurated training data introduce severe social and spurious biases. Existing post-hoc debiasing methods often operate directly in the dense CLIP embedding space, where bias and task-relevant information are highly entangled. This entanglement limits their ability to remove bias without degrading semantic fidelity. In this work, we propose Sparse Embedding Modulation (SEM), a post-hoc, zero-shot debiasing framework that operates in a Sparse Autoencoder (SAE) latent space. By decomposing CLIP text embeddings into disentangled features, SEM identifies and modulates bias-relevant neurons while preserving query-relevant ones. This enables more precise, non-linear interventions. Across four benchmark datasets and two CLIP backbones, SEM achieves substantial fairness gains in retrieval and zero-shot classification. Our results demonstrate that sparse latent representations provide an effective foundation for post-hoc debiasing of vision-language models.
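The core intervention described above — encode a dense CLIP embedding into a sparse latent space, suppress bias-relevant neurons while keeping query-relevant ones, then decode — can be illustrated with a minimal sketch. This is not the authors' implementation: the SAE weights here are random stand-ins (a real SAE would be trained on CLIP embeddings), and `bias_idx` is a hypothetical set of bias-relevant latent indices that SEM would identify automatically.

```python
# Hypothetical sketch of SAE-based embedding modulation (not the SEM code).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent = 512, 4096  # CLIP embedding dim, SAE latent dim (assumed)

# Stand-in SAE parameters; in practice these come from a trained SAE.
W_enc = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
b_enc = np.zeros(d_latent)
W_dec = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
b_dec = np.zeros(d_model)

def sae_encode(x):
    """ReLU encoder: dense embedding -> sparse latent activations."""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def sae_decode(z):
    """Linear decoder: sparse latents -> reconstructed dense embedding."""
    return z @ W_dec + b_dec

def modulate(text_emb, bias_idx, scale=0.0):
    """Suppress assumed bias-relevant latents; leave the rest untouched."""
    z = sae_encode(text_emb)
    z[..., bias_idx] *= scale  # scale=0 zeroes the bias neurons out
    return sae_decode(z)

text_emb = rng.standard_normal(d_model)       # stand-in CLIP text embedding
bias_idx = np.array([3, 17, 42])              # hypothetical bias neurons
debiased = modulate(text_emb, bias_idx)
print(debiased.shape)                          # (512,)
```

Because the intervention acts on individual sparse neurons after a non-linear encoding, it is more selective than subtracting a single bias direction in the dense embedding space, which is the key contrast the abstract draws with prior post-hoc methods.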