
Thinking Like an Annotator: Generation of Dataset Labeling Instructions

June 24, 2023
Authors: Nadine Chang, Francesco Ferroni, Michael J. Tarr, Martial Hebert, Deva Ramanan
cs.AI

Abstract

Large-scale datasets are essential to modern-day deep learning. Advocates argue that understanding these methods requires dataset transparency (e.g., "dataset curation, motivation, composition, collection process, etc."). However, almost no one has suggested releasing the detailed definitions and visual category examples provided to annotators - information critical to understanding the structure of the annotations present in each dataset. These labels are at the heart of public datasets, yet few datasets include the instructions that were used to generate them. We introduce a new task, Labeling Instruction Generation, to address the lack of publicly available labeling instructions. In Labeling Instruction Generation, we take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each example. We introduce a framework that requires no model training to solve this task and includes a newly created rapid retrieval system that leverages a large, pre-trained vision-and-language model. This framework acts as a proxy for human annotators, helping both to generate a final labeling instruction set and to evaluate its quality. Our framework generates multiple diverse visual and text representations of dataset categories. The optimized instruction set outperforms our strongest baseline across 5 folds by 7.06 mAP on NuImages and 12.9 mAP on COCO.