LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images

Abstract

We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE). Our method leverages recent progress in large language modeling and text-based image editing to augment an IID test set with a suite of diverse, realistic, and challenging test images without altering model weights. We benchmark the performance of a diverse set of pretrained models on our generated data and observe significant and consistent performance drops. We further analyze model sensitivity across different types of edits, and demonstrate the method's applicability to surfacing previously unknown class-level model biases in ImageNet.
