RePOPE: Impact of Annotation Errors on the POPE Benchmark

Abstract

Since data annotation is costly, benchmark datasets often incorporate labels from established image datasets. In this work, we assess the impact of label errors in MSCOCO on the frequently used object hallucination benchmark POPE. We re-annotate the benchmark images and identify an imbalance in annotation errors across different subsets. Evaluating multiple models on the revised labels, which we denote as RePOPE, we observe notable shifts in model rankings, highlighting the impact of label quality. Code and data are available at https://github.com/YanNeu/RePOPE.
