LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images

Abstract

We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE). Our method leverages recent progress in large language modeling and text-based image editing to augment an IID test set with a suite of diverse, realistic, and challenging test images, without altering model weights. We benchmark a diverse set of pretrained models on the generated data and observe significant and consistent performance drops. We further analyze model sensitivity across different types of edits and demonstrate the method's applicability to surfacing previously unknown class-level model biases in ImageNet.
