Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks

Abstract


Residual connections are pivotal for deep neural networks, enabling greater depth by mitigating vanishing gradients. However, in standard residual updates, the module's output is directly added to the input stream. This can lead to updates that predominantly reinforce or modulate the existing stream direction, potentially underutilizing the module's capacity for learning entirely novel features. In this work, we introduce Orthogonal Residual Update: we decompose the module's output relative to the input stream and add only the component orthogonal to this stream. This design aims to guide modules to contribute primarily new representational directions, fostering richer feature learning while promoting more efficient training. We demonstrate that our orthogonal update strategy improves generalization accuracy and training stability across diverse architectures (ResNetV2, Vision Transformers) and datasets (CIFARs, TinyImageNet, ImageNet-1k), achieving, for instance, a +4.3%p top-1 accuracy gain for ViT-B on ImageNet-1k.
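The abstract describes the update only in words. The sketch below is minimal and rests on several assumptions. It works on a single feature vector x (the input stream) and a module output f(x). It treats the "parallel" part as the usual vector projection onto x, with a small stabilizer eps. Under those assumptions, the standard update x + f(x) becomes x + f_perp, where f_perp = f(x) - (<x, f(x)> / (<x, x> + eps)) x. The abstract does not specify eps, the axis the projection is taken over, or how batches and tokens are handled. The names `orthogonal_component`, `orthogonal_residual_update` and `eps` are hypothetical and illustrative; this is not the authors' implementation.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def orthogonal_component(x: Vector, f_x: Vector, eps: float = 1e-6) -> Vector:
    """Return the part of the module output f_x that is orthogonal to the stream x.

    eps is an assumed (hypothetical) stabilizer that keeps the division finite
    when x is at or near zero.
    """
    if len(x) != len(f_x):
        raise ValueError("x and f_x must have the same length")
    # Coefficient of the projection of f_x onto the stream direction x.
    coef = dot(x, f_x) / (dot(x, x) + eps)
    # Subtract the parallel component; what remains points in a new direction.
    return [fi - coef * xi for xi, fi in zip(x, f_x)]


def orthogonal_residual_update(x: Vector, f_x: Vector, eps: float = 1e-6) -> Vector:
    """Orthogonal residual update (sketch): x + f_perp instead of x + f_x."""
    f_perp = orthogonal_component(x, f_x, eps)
    return [xi + pi for xi, pi in zip(x, f_perp)]


def standard_residual_update(x: Vector, f_x: Vector) -> Vector:
    """Standard residual update, for comparison: x + f_x."""
    return [xi + fi for xi, fi in zip(x, f_x)]


if __name__ == "__main__":
    x, f_x = [1.0, 0.0], [2.0, 3.0]
    print(standard_residual_update(x, f_x))             # [3.0, 3.0]
    print(orthogonal_residual_update(x, f_x, eps=0.0))  # [1.0, 3.0]
```

In the usage example, the standard update triples the component along x (1.0 becomes 3.0). The orthogonal update leaves that component unchanged and adds only the new direction (0.0 becomes 3.0). This mirrors the abstract's point that the module contributes new representational directions rather than reinforcing the existing one.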
