RePOPE: Impact of Annotation Errors on the POPE Benchmark

Abstract

Since data annotation is costly, benchmark datasets often incorporate labels from established image datasets. In this work, we assess the impact of label errors in MSCOCO on the frequently used object hallucination benchmark POPE. We re-annotate the benchmark images and identify an imbalance in annotation errors across different subsets. Evaluating multiple models on the revised labels, which we denote as RePOPE, we observe notable shifts in model rankings, highlighting the impact of label quality. Code and data are available at https://github.com/YanNeu/RePOPE.
