Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
May 17, 2025
Authors: Giyeong Oh, Woohyun Cho, Siyeol Kim, Suhwan Choi, Younjae Yu
cs.AI
Abstract
Residual connections are pivotal for deep neural networks, enabling greater
depth by mitigating vanishing gradients. However, in standard residual updates,
the module's output is directly added to the input stream. This can lead to
updates that predominantly reinforce or modulate the existing stream direction,
potentially underutilizing the module's capacity for learning entirely novel
features. In this work, we introduce Orthogonal Residual Update: we decompose
the module's output relative to the input stream and add only the component
orthogonal to this stream. This design aims to guide modules to contribute
primarily new representational directions, fostering richer feature learning
while promoting more efficient training. We demonstrate that our orthogonal
update strategy improves generalization accuracy and training stability across
diverse architectures (ResNetV2, Vision Transformers) and datasets (CIFARs,
TinyImageNet, ImageNet-1k), achieving, for instance, a +4.3%p top-1 accuracy
gain for ViT-B on ImageNet-1k.
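The abstract describes the update only at a high level: decompose the module's output relative to the input stream and add back only the orthogonal component. Below is a minimal PyTorch sketch of that idea. The helper name orthogonal_residual_update, the choice to project along the last (feature) dimension, and the eps stabilizer are illustrative assumptions, not details taken from the paper.

```python
import torch

def orthogonal_residual_update(x: torch.Tensor, fx: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical sketch of an orthogonal residual update.

    x:  input stream, shape (..., d)
    fx: module output f(x), same shape as x

    Removes from fx its component parallel to x (a Gram-Schmidt step
    along the last dimension) and adds only the orthogonal remainder,
    so x + f(x) becomes x + f_perp(x).
    """
    dot = (fx * x).sum(dim=-1, keepdim=True)        # <f(x), x>
    norm_sq = (x * x).sum(dim=-1, keepdim=True)     # <x, x>
    parallel = (dot / (norm_sq + eps)) * x          # projection of f(x) onto x
    orthogonal = fx - parallel                      # component orthogonal to x
    return x + orthogonal

# Illustrative usage: a pre-norm Transformer block could swap the standard
# update `x = x + attn(ln(x))` for
#     x = orthogonal_residual_update(x, attn(ln(x)))
```

Compared with the standard update x + f(x), this variant discards the part of f(x) that merely rescales the existing stream direction, which is the mechanism the abstract credits for richer feature learning; any per-token versus per-sample projection granularity shown here is an assumption.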