FACET: Fairness in Computer Vision Evaluation Benchmark

Abstract

Computer vision models have known performance disparities across attributes such as gender and skin tone. This means that during tasks such as classification and detection, model performance differs for certain classes depending on the demographics of the people in the image. These disparities have been shown to exist, but until now there has been no unified approach to measuring them for common use cases of computer vision models. We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks: image classification, object detection, and segmentation. For every image in FACET, we hired expert reviewers to manually annotate person-related attributes such as perceived skin tone and hair type, manually draw bounding boxes, and label fine-grained person-related classes such as disk jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art vision models and present a deeper understanding of potential performance disparities and challenges across sensitive demographic attributes. With the exhaustive annotations collected, we probe models using single demographic attributes as well as multiple attributes using an intersectional approach (e.g. hair color and perceived skin tone). Our results show that classification, detection, segmentation, and visual grounding models exhibit performance disparities across demographic attributes and intersections of attributes. These harms suggest that not all people represented in datasets receive fair and equitable treatment in these vision tasks. We hope current and future results using our benchmark will contribute to fairer, more robust vision models. FACET is publicly available at https://facet.metademolab.com/
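
To make the idea of probing a model by a single attribute and by an intersection of attributes concrete, here is a minimal sketch. It is not the benchmark's own evaluation code: the choice of per-group recall as the metric, the record layout, the field names (`skin_tone`, `hair_color`, `correct`), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def recall_by_group(records, group_keys):
    """Fraction of correctly handled people per group.

    `records` is a list of dicts, one per annotated person, with a boolean
    `correct` field and one field per attribute (hypothetical layout).
    A group is the tuple of values for `group_keys`, so passing two keys
    yields intersectional groups.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        group = tuple(record[key] for key in group_keys)
        totals[group] += 1
        hits[group] += int(record["correct"])
    return {group: hits[group] / totals[group] for group in totals}


def max_disparity(recalls):
    """Gap between the best-served and worst-served group."""
    if len(recalls) < 2:
        return 0.0
    return max(recalls.values()) - min(recalls.values())


# Toy, made-up records; real attribute labels would come from the annotations.
records = [
    {"skin_tone": "lighter", "hair_color": "dark", "correct": True},
    {"skin_tone": "lighter", "hair_color": "dark", "correct": True},
    {"skin_tone": "lighter", "hair_color": "light", "correct": True},
    {"skin_tone": "lighter", "hair_color": "light", "correct": False},
    {"skin_tone": "darker", "hair_color": "dark", "correct": True},
    {"skin_tone": "darker", "hair_color": "dark", "correct": False},
    {"skin_tone": "darker", "hair_color": "light", "correct": False},
    {"skin_tone": "darker", "hair_color": "light", "correct": False},
]

single = recall_by_group(records, ["skin_tone"])
intersectional = recall_by_group(records, ["skin_tone", "hair_color"])

print(single, max_disparity(single))
print(intersectional, max_disparity(intersectional))
```

In this toy data the gap across the four intersectional groups is larger than the gap across skin tone alone, which is the kind of effect an intersectional analysis is meant to surface.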
