LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images

Abstract

We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE). Our method leverages recent progress in large language modeling and text-based image editing to augment an IID test set with a suite of diverse, realistic, and challenging test images without altering model weights. We benchmark the performance of a diverse set of pretrained models on our generated data and observe significant and consistent performance drops. We further analyze model sensitivity across different types of edits, and demonstrate the method's applicability in surfacing previously unknown class-level model biases in ImageNet.
