YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Abstract

Today's deep learning methods focus on designing the most appropriate objective functions so that the model's predictions are as close as possible to the ground truth. At the same time, an appropriate architecture must be designed so that enough information can be acquired for prediction. Existing methods overlook the fact that a large amount of information is lost as input data undergoes layer-by-layer feature extraction and spatial transformation. This paper examines the important issues of data loss as data passes through deep networks, namely the information bottleneck and reversible functions. We propose the concept of programmable gradient information (PGI) to cope with the various changes that deep networks require in order to achieve multiple objectives. PGI provides complete input information for the target task when the objective function is calculated, so that reliable gradient information can be obtained to update the network weights. In addition, we design a new lightweight network architecture based on gradient path planning, the Generalized Efficient Layer Aggregation Network (GELAN). GELAN's architecture confirms that PGI achieves superior results on lightweight models. We validate the proposed GELAN and PGI on object detection using the MS COCO dataset. The results show that GELAN, using only conventional convolution operators, achieves better parameter utilization than state-of-the-art methods built on depth-wise convolution. PGI applies to a variety of models, from lightweight to large. Because it can be used to obtain complete information, train-from-scratch models can achieve better results than state-of-the-art models pre-trained on large datasets. The comparison results are shown in Figure 1. The source code is available at https://github.com/WongKinYiu/yolov9.
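
The two issues named in the abstract, the information bottleneck and reversible functions, can be stated compactly. The block below is a sketch in notation chosen here for illustration: the symbols $X$, $f_\theta$, $g_\phi$, $r_\psi$, and $v_\zeta$ are assumptions and do not appear in the abstract. The first line is the standard data processing inequality: each successive transformation can only preserve or reduce the mutual information with the input. The second line expresses reversibility: if a layer $r_\psi$ has an inverse $v_\zeta$, the input can be recovered exactly, so no information about $X$ is lost at that layer.

```latex
% Information bottleneck (data processing inequality):
% X is the input; f_\theta and g_\phi are two consecutive layers
% with parameters \theta and \phi (illustrative notation).
I(X, X) \;\ge\; I\bigl(X, f_\theta(X)\bigr) \;\ge\; I\bigl(X, g_\phi(f_\theta(X))\bigr)

% Reversible function: r_\psi has an inverse v_\zeta, so the input
% is fully recoverable and the first inequality holds with equality.
X = v_\zeta\bigl(r_\psi(X)\bigr)
\quad\Longrightarrow\quad
I(X, X) = I\bigl(X, r_\psi(X)\bigr)
```

Read this way, the motivation for PGI is that gradients computed from features that have already passed through lossy layers may rest on incomplete information about the input, whereas a path that preserves that information can yield more reliable gradients.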
