

FACET: Fairness in Computer Vision Evaluation Benchmark

August 31, 2023
作者: Laura Gustafson, Chloe Rolland, Nikhila Ravi, Quentin Duval, Aaron Adcock, Cheng-Yang Fu, Melissa Hall, Candace Ross
cs.AI

Abstract

Computer vision models have known performance disparities across attributes such as gender and skin tone. This means that during tasks such as classification and detection, model performance differs for certain classes based on the demographics of the people in the image. These disparities have been shown to exist, but until now there has not been a unified approach to measuring these differences for common use cases of computer vision models. We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks: image classification, object detection, and segmentation. For every image in FACET, we hired expert reviewers to manually annotate person-related attributes such as perceived skin tone and hair type, manually draw bounding boxes, and label fine-grained person-related classes such as disc jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art vision models and present a deeper understanding of potential performance disparities and challenges across sensitive demographic attributes. With the exhaustive annotations collected, we probe models using single demographic attributes as well as multiple attributes with an intersectional approach (e.g., hair color and perceived skin tone). Our results show that classification, detection, segmentation, and visual grounding models exhibit performance disparities across demographic attributes and intersections of attributes. These harms suggest that not all people represented in datasets receive fair and equitable treatment in these vision tasks. We hope current and future results using our benchmark will contribute to fairer, more robust vision models. FACET is available publicly at https://facet.metademolab.com/
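
The disparity analysis the abstract describes can be illustrated with a short sketch. This is a minimal, hypothetical example, not the authors' released evaluation code: the record layout and the attribute field names (perceived_skin_tone, hair_color, detected) are assumptions made for illustration. It computes per-group recall for a detection-style task and reports the gap between the best- and worst-performing groups, first for a single demographic attribute and then for an intersection of two attributes.

```python
from collections import defaultdict

def recall_by_group(records, group_keys):
    """Compute recall per demographic group.

    records: list of dicts with annotation fields plus a boolean
             'detected' flag (True if the model found this person).
    group_keys: tuple of attribute names to group by, e.g.
                ('perceived_skin_tone',) for a single attribute or
                ('hair_color', 'perceived_skin_tone') for an intersection.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = tuple(r[k] for k in group_keys)
        totals[group] += 1
        hits[group] += int(r["detected"])
    return {g: hits[g] / totals[g] for g in totals}

def disparity_gap(recalls):
    """Gap between the best- and worst-performing groups."""
    values = recalls.values()
    return max(values) - min(values)

# Toy annotations in a FACET-like shape (hypothetical field names and values).
records = [
    {"perceived_skin_tone": "lighter", "hair_color": "blonde", "detected": True},
    {"perceived_skin_tone": "lighter", "hair_color": "black", "detected": True},
    {"perceived_skin_tone": "darker", "hair_color": "black", "detected": False},
    {"perceived_skin_tone": "darker", "hair_color": "black", "detected": True},
]

single = recall_by_group(records, ("perceived_skin_tone",))
intersectional = recall_by_group(records, ("hair_color", "perceived_skin_tone"))
print("per-group recall:", single)
print("disparity gap:", disparity_gap(single))
print("intersectional recall:", intersectional)
```

In the paper's actual evaluation, the per-person outcome would come from matching model predictions against the expert-drawn ground-truth boxes and class labels; the boolean flag here simply stands in for that matching step.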