FACET: Fairness in Computer Vision Evaluation Benchmark

Abstract

Computer vision models have known performance disparities across attributes such as gender and skin tone. In tasks such as classification and detection, this means that model performance on certain classes differs depending on the demographics of the people in the image. These disparities have been shown to exist, but until now there has been no unified approach to measuring them for common use cases of computer vision models. We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks: image classification, object detection, and segmentation. For every image in FACET, we hired expert reviewers to manually annotate person-related attributes such as perceived skin tone and hair type, manually draw bounding boxes, and label fine-grained person-related classes such as disk jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art vision models and present a deeper understanding of potential performance disparities and challenges across sensitive demographic attributes. With the exhaustive annotations collected, we probe models using single demographic attributes as well as multiple attributes in an intersectional approach (e.g., hair color and perceived skin tone). Our results show that classification, detection, segmentation, and visual grounding models exhibit performance disparities across demographic attributes and intersections of attributes. These harms suggest that not all people represented in datasets receive fair and equitable treatment in these vision tasks. We hope current and future results using our benchmark will contribute to fairer, more robust vision models. FACET is publicly available at https://facet.metademolab.com/
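To make the idea of a performance disparity concrete, the sketch below computes per-group recall for one class and the gap between the best- and worst-served groups, first for a single attribute and then for an intersection of two. It is only an illustration: the record fields, the helper names (`recall_by_group`, `max_disparity`), the choice of recall as the metric, and the toy data are all assumptions made here, not FACET's actual annotation schema, metric definition, or results.

```python
from collections import defaultdict


def recall_by_group(records, target_class, group_keys):
    """Per-group recall for one class.

    records: iterable of dicts with 'label', 'prediction', and attribute
             fields (hypothetical schema, chosen for this example only).
    group_keys: attribute names; passing more than one yields
                intersectional groups such as (skin tone, hair color).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        if r["label"] != target_class:
            continue  # recall only considers true instances of the class
        group = tuple(r[k] for k in group_keys)
        totals[group] += 1
        hits[group] += int(r["prediction"] == target_class)
    return {g: hits[g] / totals[g] for g in totals}


def max_disparity(recalls):
    """Gap between the best and worst per-group recall."""
    return max(recalls.values()) - min(recalls.values())


# Made-up toy data, not taken from FACET.
records = [
    {"label": "guitarist", "prediction": "guitarist", "skin_tone": "lighter", "hair_color": "black"},
    {"label": "guitarist", "prediction": "guitarist", "skin_tone": "lighter", "hair_color": "blond"},
    {"label": "guitarist", "prediction": "guitarist", "skin_tone": "darker", "hair_color": "black"},
    {"label": "guitarist", "prediction": "drummer", "skin_tone": "darker", "hair_color": "black"},
    {"label": "guitarist", "prediction": "guitarist", "skin_tone": "lighter", "hair_color": "black"},
    {"label": "guitarist", "prediction": "drummer", "skin_tone": "darker", "hair_color": "blond"},
    {"label": "drummer", "prediction": "drummer", "skin_tone": "darker", "hair_color": "black"},
]

single = recall_by_group(records, "guitarist", ["skin_tone"])
intersectional = recall_by_group(records, "guitarist", ["skin_tone", "hair_color"])

print("single attribute:", single, "gap:", max_disparity(single))
print("intersectional:", intersectional, "gap:", max_disparity(intersectional))
```

In this toy data the intersectional breakdown exposes a larger gap than the single-attribute one, which is the kind of effect an intersectional probe is meant to surface. Very small groups make such estimates noisy, so a real analysis would also need to report group sizes.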
