SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models

Abstract

Models that bridge vision and language, such as CLIP, are key components of multimodal AI, yet their large-scale, uncurated training data introduce severe social and spurious biases. Existing post-hoc debiasing methods often operate directly in the dense CLIP embedding space, where bias and task-relevant information are highly entangled. This entanglement limits their ability to remove bias without degrading semantic fidelity. In this work, we propose Sparse Embedding Modulation (SEM), a post-hoc, zero-shot debiasing framework that operates in a Sparse Autoencoder (SAE) latent space. By decomposing CLIP text embeddings into disentangled features, SEM identifies and modulates bias-relevant neurons while preserving query-relevant ones. This enables more precise, non-linear interventions. Across four benchmark datasets and two CLIP backbones, SEM achieves substantial fairness gains in retrieval and zero-shot classification. Our results demonstrate that sparse latent representations provide an effective foundation for post-hoc debiasing of vision-language models.
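The abstract gives no formulas, so the following is only a minimal sketch of the general idea (encode a text embedding into a sparse latent space, adjust the latents that look bias-related, leave the latents that matter for the query alone, decode). Everything concrete in it is an assumption made for illustration: random matrices stand in for a trained SAE and for CLIP text embeddings, the bias score is a simple difference of group means, and the helper names `sae_encode`, `sae_decode`, and `modulate` are hypothetical. It should not be read as the scoring or modulation rule used by SEM.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc, b_dec):
    # ReLU encoder: the non-linearity is what makes the latent code sparse.
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

def sae_decode(z, W_dec, b_dec):
    # Linear decoder back to the embedding space.
    return z @ W_dec + b_dec

def modulate(z_query, z_group_a, z_group_b, n_bias=8, n_keep=8):
    # Assumed heuristic: a latent is "bias-relevant" if its mean activation
    # differs strongly between two sets of attribute prompts.
    mean_a = z_group_a.mean(axis=0)
    mean_b = z_group_b.mean(axis=0)
    bias_idx = np.argsort(np.abs(mean_a - mean_b))[-n_bias:]
    # Assumed heuristic: the most active latents of the query are
    # "query-relevant" and are protected from modification.
    keep_idx = np.argsort(z_query)[-n_keep:]
    target = np.setdiff1d(bias_idx, keep_idx)
    z_new = z_query.copy()
    # Move the targeted latents to a value halfway between the two groups.
    z_new[target] = 0.5 * (mean_a[target] + mean_b[target])
    return z_new, target

# Toy stand-ins (assumptions): 16-dim "embeddings", 64 SAE latents.
rng = np.random.default_rng(0)
d, m = 16, 64
W_enc = rng.normal(size=(d, m)) / np.sqrt(d)
W_dec = rng.normal(size=(m, d)) / np.sqrt(m)
b_enc, b_dec = np.zeros(m), np.zeros(d)

query = rng.normal(size=d)          # stand-in for a query text embedding
group_a = rng.normal(size=(5, d))   # stand-in for prompts naming attribute A
group_b = rng.normal(size=(5, d))   # stand-in for prompts naming attribute B

z_q = sae_encode(query, W_enc, b_enc, b_dec)
z_a = sae_encode(group_a, W_enc, b_enc, b_dec)
z_b = sae_encode(group_b, W_enc, b_enc, b_dec)

z_mod, target = modulate(z_q, z_a, z_b)
debiased = sae_decode(z_mod, W_dec, b_dec)
debiased = debiased / np.linalg.norm(debiased)  # unit norm for cosine similarity
```

In this sketch the edit is made per latent rather than by projecting out a direction in the dense embedding space, which is the contrast the abstract draws with existing post-hoc methods.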
