YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

February 21, 2024
Authors: Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
cs.AI

Abstract

Today's deep learning methods focus on designing the most appropriate objective functions so that the model's predictions are as close as possible to the ground truth. At the same time, an architecture that can acquire enough information for prediction has to be designed. Existing methods ignore the fact that a large amount of information is lost when input data undergoes layer-by-layer feature extraction and spatial transformation. This paper examines the important issues of data loss when data is transmitted through deep networks, namely the information bottleneck and reversible functions. We propose the concept of programmable gradient information (PGI) to cope with the various changes that deep networks require in order to achieve multiple objectives. PGI can provide complete input information for the target task when calculating the objective function, so that reliable gradient information can be obtained to update the network weights. In addition, we design a new lightweight network architecture based on gradient path planning, the Generalized Efficient Layer Aggregation Network (GELAN). GELAN's architecture confirms that PGI achieves superior results on lightweight models. We verify the proposed GELAN and PGI on object detection using the MS COCO dataset. The results show that GELAN, using only conventional convolution operators, achieves better parameter utilization than state-of-the-art methods based on depth-wise convolution. PGI can be used for a variety of models, from lightweight to large. It can be used to obtain complete information, so that models trained from scratch achieve better results than state-of-the-art models pre-trained on large datasets. The comparison results are shown in Figure 1. The source code is available at: https://github.com/WongKinYiu/yolov9.
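The information loss that the abstract describes can be illustrated with a small numerical sketch. The example below is not taken from the paper or its repository. It is a toy illustration of the standard data processing inequality, under the assumption that each "layer" is a deterministic function on a discrete input. The functions `lossy`, `lossier`, and `reversible` and the helper `mutual_information` are hypothetical names chosen for this illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits, treating the paired samples
    as the empirical joint distribution of X and Y."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Toy input: X is uniform over 8 symbols, so I(X;X) = H(X) = 3 bits.
xs = list(range(8))


def lossy(x):
    """Hypothetical non-invertible 'layer': merges pairs of inputs."""
    return x // 2


def lossier(h):
    """A second hypothetical lossy 'layer' stacked on the first."""
    return h // 2


def reversible(x):
    """Hypothetical invertible 'layer': a permutation of the 8 symbols."""
    return (x + 3) % 8


i_xx = mutual_information(xs, xs)
i_layer1 = mutual_information(xs, [lossy(x) for x in xs])
i_layer2 = mutual_information(xs, [lossier(lossy(x)) for x in xs])
i_rev = mutual_information(xs, [reversible(x) for x in xs])

print(f"I(X; X)           = {i_xx:.1f} bits")
print(f"I(X; f(X))        = {i_layer1:.1f} bits")
print(f"I(X; g(f(X)))     = {i_layer2:.1f} bits")
print(f"I(X; r(X)), r inv = {i_rev:.1f} bits")
```

In this toy setting each non-invertible layer discards one bit about the input (3, then 2, then 1 bit), while the invertible mapping preserves all 3 bits. That is the intuition behind pairing the information bottleneck with reversible functions. How PGI exploits this in practice is described in the paper itself, not in this sketch.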