
Thinking Like an Annotator: Generation of Dataset Labeling Instructions

June 24, 2023
作者: Nadine Chang, Francesco Ferroni, Michael J. Tarr, Martial Hebert, Deva Ramanan
cs.AI

Abstract

Large-scale datasets are essential to modern-day deep learning. Advocates argue that understanding these methods requires dataset transparency (e.g. "dataset curation, motivation, composition, collection process, etc..."). However, almost no one has suggested the release of the detailed definitions and visual category examples provided to annotators - information critical to understanding the structure of the annotations present in each dataset. These labels are at the heart of public datasets, yet few datasets include the instructions that were used to generate them. We introduce a new task, Labeling Instruction Generation, to address missing publicly available labeling instructions. In Labeling Instruction Generation, we take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples. We introduce a framework that requires no model training to solve this task and includes a newly created rapid retrieval system that leverages a large, pre-trained vision and language model. This framework acts as a proxy to human annotators and can help to both generate a final labeling instruction set and evaluate its quality. Our framework generates multiple diverse visual and text representations of dataset categories. The optimized instruction set outperforms our strongest baseline across 5 folds by 7.06 mAP for NuImages and 12.9 mAP for COCO.
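To make the retrieval idea concrete, here is a minimal sketch of one plausible step: ranking a category's images by similarity to the category's text embedding and keeping the top-k as "visually representative" examples. This is an illustrative assumption, not the paper's actual pipeline; the toy vectors below stand in for features that would in practice come from a large pre-trained vision-language model, and the function name `representative_examples` is hypothetical.

```python
import math

def representative_examples(image_embs, text_emb, k=3):
    # Rank candidate images by cosine similarity to the category's
    # text embedding; return indices of the k most similar images.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    sims = [cos(e, text_emb) for e in image_embs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

# Toy 2-D embeddings standing in for vision-language model features.
category_text = [1.0, 0.0]
images = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(representative_examples(images, category_text, k=2))  # → [0, 2]
```

In a real system the embeddings would be high-dimensional and the candidate pool large, which is where the paper's rapid retrieval system comes in; the ranking principle sketched here is the same.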