Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks

May 17, 2025
Authors: Giyeong Oh, Woohyun Cho, Siyeol Kim, Suhwan Choi, Younjae Yu
cs.AI

Abstract

Residual connections are pivotal for deep neural networks, enabling greater depth by mitigating vanishing gradients. However, in standard residual updates, the module's output is directly added to the input stream. This can lead to updates that predominantly reinforce or modulate the existing stream direction, potentially underutilizing the module's capacity for learning entirely novel features. In this work, we introduce Orthogonal Residual Update: we decompose the module's output relative to the input stream and add only the component orthogonal to this stream. This design aims to guide modules to contribute primarily new representational directions, fostering richer feature learning while promoting more efficient training. We demonstrate that our orthogonal update strategy improves generalization accuracy and training stability across diverse architectures (ResNetV2, Vision Transformers) and datasets (CIFARs, TinyImageNet, ImageNet-1k), achieving, for instance, a +4.3 percentage-point top-1 accuracy gain for ViT-B on ImageNet-1k.
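
A minimal PyTorch sketch of the update described above, assuming the projection is taken along the feature (last) dimension; the function name, the `eps` stabilizer, and the toy shapes are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

def orthogonal_residual_update(x: torch.Tensor, module_out: torch.Tensor,
                               eps: float = 1e-6) -> torch.Tensor:
    """Add only the component of module_out that is orthogonal to the stream x.

    Both tensors are assumed to have shape (..., d); the decomposition is
    taken along the last (feature) dimension. eps is a hypothetical stabilizer
    against near-zero stream norms.
    """
    # Parallel component: proj_x(f(x)) = (<f(x), x> / <x, x>) * x
    dot = (module_out * x).sum(dim=-1, keepdim=True)
    norm_sq = (x * x).sum(dim=-1, keepdim=True)
    parallel = dot / (norm_sq + eps) * x
    # Add only the orthogonal part, replacing the usual update x + f(x).
    return x + (module_out - parallel)


# Toy usage with a stand-in sub-block in a Transformer-style residual branch.
x = torch.randn(8, 16, 512)            # (batch, tokens, features)
sub_block = nn.Linear(512, 512)        # placeholder for an attention/MLP module
y = orthogonal_residual_update(x, sub_block(x))
print(y.shape)                         # torch.Size([8, 16, 512])
```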

