YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Abstract

Today's deep learning methods focus on designing the most appropriate objective functions so that a model's predictions are as close as possible to the ground truth. At the same time, an appropriate architecture that can acquire enough information for prediction has to be designed. Existing methods ignore the fact that a large amount of information is lost when input data undergoes layer-by-layer feature extraction and spatial transformation. This paper delves into the important issue of data loss when data is transmitted through deep networks, namely the information bottleneck and reversible functions. We propose the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. PGI can provide complete input information for the target task to calculate the objective function, so that reliable gradient information can be obtained to update network weights. In addition, we design a new lightweight network architecture based on gradient path planning, the Generalized Efficient Layer Aggregation Network (GELAN). GELAN's architecture confirms that PGI achieves superior results on lightweight models. We verified the proposed GELAN and PGI on object detection using the MS COCO dataset. The results show that GELAN, using only conventional convolution operators, achieves better parameter utilization than state-of-the-art methods developed based on depth-wise convolution. PGI can be used for a variety of models, from lightweight to large. Because it can be used to obtain complete information, train-from-scratch models can achieve better results than state-of-the-art models pre-trained on large datasets. The comparison results are shown in Figure 1. The source code is available at: https://github.com/WongKinYiu/yolov9.
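
The abstract contrasts conventional convolution with depth-wise convolution in terms of parameter utilization. As a rough illustration only, and not something taken from the paper, the sketch below counts the weights in a single conventional k x k convolution and in a depth-wise separable replacement (a depth-wise k x k convolution followed by a 1 x 1 point-wise convolution). The function names and channel sizes are hypothetical, and biases are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a conventional k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depth-wise k x k convolution plus a 1 x 1 point-wise convolution (bias omitted)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes, assumed for this example only.
    c_in, c_out, k = 64, 128, 3
    print("conventional:", conv_params(c_in, c_out, k))
    print("depth-wise separable:", depthwise_separable_params(c_in, c_out, k))
```

Per layer, the depth-wise variant needs far fewer weights, which is why such designs are popular for lightweight models. The claim in the abstract concerns how well a whole network uses its parameters, so this per-layer count is only background for reading that comparison.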
