YOLOv9:使用可编程梯度信息学习想要学习的内容
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
February 21, 2024
作者: Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
cs.AI
摘要
当今的深度学习方法侧重于如何设计最合适的目标函数,以使模型的预测结果与基本事实最接近。同时,必须设计一个能够为预测获取足够信息的适当架构。现有方法忽略了一个事实,即当输入数据经过逐层特征提取和空间转换时,将会丢失大量信息。本文将深入探讨数据在通过深度网络传输时出现的重要问题,即信息瓶颈和可逆函数。我们提出了可编程梯度信息(PGI)的概念,以应对深度网络实现多个目标所需的各种变化。PGI可以为目标任务提供完整的输入信息,以计算目标函数,从而获得可靠的梯度信息来更新网络权重。此外,基于梯度路径规划设计了一种新的轻量级网络架构——广义高效层聚合网络(GELAN)。GELAN的架构证实了PGI在轻量级模型上取得了优越的结果。我们在基于 MS COCO 数据集的目标检测上验证了提出的 GELAN 和 PGI。结果显示,GELAN 仅使用传统卷积算子即可实现比基于深度卷积开发的最先进方法更好的参数利用率。PGI可用于各种模型,从轻量级到大型模型。它可用于获取完整信息,使得从头开始训练的模型可以获得比使用大型数据集预训练的最先进模型更好的结果,比较结果见图1。源代码位于:https://github.com/WongKinYiu/yolov9。
English
Today's deep learning methods focus on how to design the most appropriate
objective functions so that the prediction results of the model can be closest
to the ground truth. Meanwhile, an appropriate architecture that can facilitate
acquisition of enough information for prediction has to be designed. Existing
methods ignore a fact that when input data undergoes layer-by-layer feature
extraction and spatial transformation, large amount of information will be
lost. This paper will delve into the important issues of data loss when data is
transmitted through deep networks, namely information bottleneck and reversible
functions. We proposed the concept of programmable gradient information (PGI)
to cope with the various changes required by deep networks to achieve multiple
objectives. PGI can provide complete input information for the target task to
calculate objective function, so that reliable gradient information can be
obtained to update network weights. In addition, a new lightweight network
architecture -- Generalized Efficient Layer Aggregation Network (GELAN), based
on gradient path planning is designed. GELAN's architecture confirms that PGI
has gained superior results on lightweight models. We verified the proposed
GELAN and PGI on MS COCO dataset based object detection. The results show that
GELAN only uses conventional convolution operators to achieve better parameter
utilization than the state-of-the-art methods developed based on depth-wise
convolution. PGI can be used for variety of models from lightweight to large.
It can be used to obtain complete information, so that train-from-scratch
models can achieve better results than state-of-the-art models pre-trained
using large datasets, the comparison results are shown in Figure 1. The source
codes are at: https://github.com/WongKinYiu/yolov9.Summary
AI-Generated Summary