MineTheGap: Automatic Mining of Biases in Text-to-Image Models
December 15, 2025
Authors: Noa Cohen, Nurit Spingarn-Eliezer, Inbar Huberman-Spiegelglas, Tomer Michaeli
cs.AI
Abstract
Text-to-Image (TTI) models generate images from text prompts, which often leave aspects of the desired image ambiguous. When faced with these ambiguities, TTI models have been shown to exhibit biases in their interpretations. These biases can have societal impacts, e.g., when a model depicts only one race for a stated occupation. They can also degrade user experience by producing redundant images within a generated set instead of spanning diverse possibilities. Here, we introduce MineTheGap, a method for automatically mining prompts that cause a TTI model to generate biased outputs. Our method goes beyond merely detecting bias for a given prompt: it uses a genetic algorithm to iteratively refine a pool of prompts, seeking those that expose biases. This optimization is driven by a novel bias score that ranks biases by severity, which we validate on a dataset with known biases. For a given prompt, the score is obtained by comparing the distribution of generated images to the distribution of LLM-generated texts that constitute variations on the prompt. Code and examples are available on the project's webpage.
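To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the embedding space, the distribution comparison, or the genetic operators, so everything here is an assumption for illustration: `coverage_gap` is a hypothetical stand-in for the bias score (it measures how poorly the image embeddings cover the text-variation embeddings), and `mine_prompts` is a plain elitist genetic loop driven by caller-supplied `score` and `mutate` functions. In a real pipeline, `score` would presumably generate images with the TTI model, obtain prompt variations from an LLM, and embed both in a shared space.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coverage_gap(image_embs: Sequence[Vector], text_embs: Sequence[Vector]) -> float:
    """Hypothetical bias score (an assumption, not the paper's formula).

    For each text variation, take 1 minus its best cosine similarity to any
    generated image, then average. 0.0 means every variation is matched by
    some image; larger values mean plausible variations are never depicted.
    """
    gaps = [1.0 - max(cosine(t, i) for i in image_embs) for t in text_embs]
    return sum(gaps) / len(gaps)


def mine_prompts(
    seed_pool: Sequence[str],
    score: Callable[[str], float],
    mutate: Callable[[str, random.Random], str],
    generations: int = 5,
    keep: int = 4,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Elitist genetic loop: keep the highest-scoring prompts, refill by mutation."""
    rng = rng or random.Random(0)
    pool_size = len(seed_pool)
    pool = list(seed_pool)
    for _ in range(generations):
        unique = list(dict.fromkeys(pool))  # de-duplicate, preserving order
        survivors = sorted(unique, key=score, reverse=True)[:keep]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(pool_size - len(survivors))
        ]
        pool = survivors + children
    # Return the final pool ranked from most to least biased under `score`.
    return sorted(dict.fromkeys(pool), key=score, reverse=True)


# Toy check of the score: two variations, but the images match only one of them.
biased = coverage_gap(image_embs=[(1.0, 0.0)], text_embs=[(1.0, 0.0), (0.0, 1.0)])
diverse = coverage_gap(
    image_embs=[(1.0, 0.0), (0.0, 1.0)], text_embs=[(1.0, 0.0), (0.0, 1.0)]
)
print(biased, diverse)  # 0.5 0.0
```

Because the survivors are carried over unchanged each generation, the best score in the pool never decreases under this sketch; a crossover operator or a different selection rule could be swapped in without changing the loop's structure.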