Thinking Like an Annotator: Generation of Dataset Labeling Instructions
June 24, 2023
Authors: Nadine Chang, Francesco Ferroni, Michael J. Tarr, Martial Hebert, Deva Ramanan
cs.AI
Abstract
Large-scale datasets are essential to modern-day deep learning. Advocates
argue that understanding these methods requires dataset transparency (e.g.
"dataset curation, motivation, composition, collection process, etc...").
However, almost no one has suggested the release of the detailed definitions
and visual category examples provided to annotators - information critical to
understanding the structure of the annotations present in each dataset. These
labels are at the heart of public datasets, yet few datasets include the
instructions that were used to generate them. We introduce a new task, Labeling
Instruction Generation, to address the lack of publicly available labeling
instructions. In Labeling Instruction Generation, we take a reasonably
annotated dataset and: 1) generate a set of examples that are visually
representative of each category in the dataset; 2) provide a text label that
corresponds to each of the examples. We introduce a framework that requires no
model training to solve this task and includes a newly created rapid retrieval
system that leverages a large, pre-trained vision and language model. This
framework acts as a proxy for human annotators, helping both to generate a
final labeling instruction set and to evaluate its quality. Our framework
generates multiple diverse visual and text representations of dataset
categories. The optimized instruction set outperforms our strongest baseline
across 5 folds by 7.06 mAP on NuImages and 12.9 mAP on COCO.
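The abstract leaves the mechanics of the rapid retrieval system unspecified. As a rough illustration only, the sketch below uses an off-the-shelf CLIP model to retrieve visually representative image crops for a category and pair each with its text label, mirroring steps 1) and 2) of Labeling Instruction Generation; the checkpoint, prompt template, and helper names are our assumptions, not the paper's implementation.

```python
# Illustrative sketch only: retrieving representative examples per
# category with a pre-trained vision-and-language model (CLIP here).
# The model choice, prompt, and function names are assumptions, not
# the paper's actual framework.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(paths):
    """Encode candidate dataset crops into normalized embeddings."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def retrieve_instruction_examples(category, image_feats, paths, k=5):
    """Return k (image, text label) pairs for one dataset category:
    step 1) visually representative examples, step 2) their labels."""
    inputs = processor(text=[f"a photo of a {category}"], return_tensors="pt")
    with torch.no_grad():
        text_feat = model.get_text_features(**inputs)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scores = (image_feats @ text_feat.T).squeeze(-1)  # cosine similarity
    top = scores.topk(min(k, len(paths))).indices.tolist()
    return [(paths[i], category) for i in top]
```

Consistent with the abstract's claim that no model training is required, nothing in this sketch is fine-tuned: the pre-trained embeddings alone rank candidate examples for each category.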