FACET: Fairness in Computer Vision Evaluation Benchmark

August 31, 2023
Authors: Laura Gustafson, Chloe Rolland, Nikhila Ravi, Quentin Duval, Aaron Adcock, Cheng-Yang Fu, Melissa Hall, Candace Ross
cs.AI

Abstract

Computer vision models have known performance disparities across attributes such as gender and skin tone. This means that during tasks such as classification and detection, model performance differs for certain classes based on the demographics of the people in the image. These disparities have been shown to exist, but until now there has not been a unified approach to measure them for common use cases of computer vision models. We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks: image classification, object detection, and segmentation. For every image in FACET, we hired expert reviewers to manually annotate person-related attributes such as perceived skin tone and hair type, draw bounding boxes, and label fine-grained person-related classes such as disc jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art vision models and present a deeper understanding of potential performance disparities and challenges across sensitive demographic attributes. With the exhaustive annotations collected, we probe models using single demographic attributes as well as multiple attributes with an intersectional approach (e.g., hair color and perceived skin tone). Our results show that classification, detection, segmentation, and visual grounding models exhibit performance disparities across demographic attributes and intersections of attributes. These harms suggest that not all people represented in datasets receive fair and equitable treatment in these vision tasks. We hope current and future results using our benchmark will contribute to fairer, more robust vision models. FACET is available publicly at https://facet.metademolab.com/