YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
February 21, 2024
Authors: Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
cs.AI
Abstract
Today's deep learning methods focus on how to design the most appropriate objective functions so that the prediction results of the model can be closest to the ground truth. Meanwhile, an appropriate architecture that facilitates the acquisition of enough information for prediction has to be designed. Existing methods ignore the fact that when input data undergoes layer-by-layer feature extraction and spatial transformation, a large amount of information is lost. This paper delves into the important issues of data loss when data is transmitted through deep networks, namely the information bottleneck and reversible functions. We propose the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. PGI can provide complete input information for the target task to calculate the objective function, so that reliable gradient information can be obtained to update network weights. In addition, we design a new lightweight network architecture, the Generalized Efficient Layer Aggregation Network (GELAN), based on gradient path planning. GELAN's architecture confirms that PGI achieves superior results on lightweight models. We verify the proposed GELAN and PGI on MS COCO dataset-based object detection. The results show that GELAN uses only conventional convolution operators yet achieves better parameter utilization than state-of-the-art methods developed with depth-wise convolution. PGI can be used for a variety of models, from lightweight to large. It can be used to obtain complete information, so that models trained from scratch can achieve better results than state-of-the-art models pre-trained on large datasets; the comparison results are shown in Figure 1. The source code is available at: https://github.com/WongKinYiu/yolov9.
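The information-bottleneck and reversible-function arguments in the abstract can be stated compactly. The following is a minimal sketch of those two conditions, assuming notation chosen here for illustration: f_θ and g_φ denote successive network stages, and r_ψ denotes a transform whose inverse is v_ζ; it paraphrases the idea rather than quoting the paper's equations verbatim.

```latex
% Information bottleneck: mutual information with the input X can only
% shrink as data passes through successive network stages f and g.
I(X, X) \;\ge\; I\!\left(X, f_{\theta}(X)\right) \;\ge\; I\!\left(X, g_{\phi}(f_{\theta}(X))\right)

% Reversible function: if r_psi admits an inverse v_zeta, no information
% is lost, so complete input information remains available for computing
% the objective and hence reliable gradients.
X = v_{\zeta}\!\left(r_{\psi}(X)\right)
\quad\Longrightarrow\quad
I(X, X) = I\!\left(X, r_{\psi}(X)\right) = I\!\left(X, v_{\zeta}(r_{\psi}(X))\right)
```

PGI builds on the second condition: by routing an auxiliary, information-preserving path to the loss, the gradients that update the main network are computed from complete input information rather than from a bottlenecked representation.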
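To make the GELAN description more concrete, here is a minimal sketch of a GELAN-style block built only from conventional convolutions: a CSP-style channel split followed by ELAN-style aggregation of every intermediate output. The class names, depth, and channel widths (ConvBNAct, GELANBlock, depth=2, c_mid = c_in // 2) are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a GELAN-style block: CSP split + ELAN aggregation with plain convs.
# Widths, depth, and naming are assumptions for illustration only.
import torch
import torch.nn as nn


class ConvBNAct(nn.Module):
    """Conventional convolution -> batch norm -> SiLU activation."""
    def __init__(self, c_in: int, c_out: int, k: int = 3) -> None:
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))


class GELANBlock(nn.Module):
    """Split the input, run one branch through a chain of conv blocks,
    keep every intermediate output, then fuse everything with a 1x1 conv."""
    def __init__(self, c_in: int, c_out: int, depth: int = 2) -> None:
        super().__init__()
        c_mid = c_in // 2
        self.split = ConvBNAct(c_in, 2 * c_mid, k=1)
        self.stages = nn.ModuleList(ConvBNAct(c_mid, c_mid) for _ in range(depth))
        # Fuse both split branches plus every stage output.
        self.fuse = ConvBNAct((2 + depth) * c_mid, c_out, k=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = self.split(x).chunk(2, dim=1)   # CSP-style split
        feats = [a, b]
        for stage in self.stages:              # ELAN-style aggregation
            feats.append(stage(feats[-1]))
        return self.fuse(torch.cat(feats, dim=1))


if __name__ == "__main__":
    block = GELANBlock(c_in=64, c_out=64)
    y = block(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56])
```

Concatenating every intermediate feature map, rather than only the last one, is what gives each stage its own short gradient path back to the loss, which is the "gradient path planning" property the abstract attributes to GELAN.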