Thinking Like an Annotator: Generation of Dataset Labeling Instructions

Abstract

Large-scale datasets are essential to modern deep learning. Advocates argue that understanding these methods requires dataset transparency (e.g. "dataset curation, motivation, composition, collection process, etc..."). However, almost no one has suggested releasing the detailed definitions and visual category examples provided to annotators, information that is critical to understanding the structure of the annotations present in each dataset. These labels are at the heart of public datasets, yet few datasets include the instructions that were used to generate them. We introduce a new task, Labeling Instruction Generation, to address the lack of publicly available labeling instructions. In Labeling Instruction Generation, we take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples. We introduce a framework that requires no model training to solve this task and includes a newly created rapid retrieval system that leverages a large, pre-trained vision and language model. This framework acts as a proxy for human annotators and can help both to generate a final labeling instruction set and to evaluate its quality. Our framework generates multiple diverse visual and text representations of dataset categories. The optimized instruction set outperforms our strongest baseline across 5 folds by 7.06 mAP for NuImages and 12.9 mAP for COCO.
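To make the two outputs of the task concrete, the following is a minimal sketch, not the framework described above. It assumes each annotated example already has an embedding from some pre-trained vision and language model (here replaced by hand-made 2-D toy vectors). It picks, for each category, the examples closest to the category centroid and pairs each with the category name as its text label. The function names (`cosine`, `centroid`, `select_representatives`), the centroid-based selection rule, and the toy data are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_representatives(dataset, k=1):
    """Pick the k examples per category closest to that category's centroid.

    dataset: {category: [(example_id, embedding), ...]}
    Returns {category: [(example_id, text_label), ...]}, where the text
    label is simply the category name (an assumption of this sketch).
    """
    instructions = {}
    for category, items in dataset.items():
        center = centroid([embedding for _, embedding in items])
        ranked = sorted(items, key=lambda item: cosine(item[1], center), reverse=True)
        instructions[category] = [(example_id, category) for example_id, _ in ranked[:k]]
    return instructions


# Hypothetical toy data standing in for real image embeddings.
toy_dataset = {
    "car": [("img_01", [1.0, 0.0]), ("img_02", [0.9, 0.1]), ("img_03", [0.2, 0.8])],
    "pedestrian": [("img_04", [0.0, 1.0]), ("img_05", [0.1, 0.9]), ("img_06", [0.2, 0.8])],
}

result = select_representatives(toy_dataset, k=1)
print(result)
```

Selecting by distance to the centroid is only one simple notion of "visually representative"; it ignores diversity among the chosen examples, which the abstract names as a property of the actual framework.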
