Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks

Abstract

Residual connections are pivotal for deep neural networks, enabling greater depth by mitigating vanishing gradients. However, in a standard residual update, the module's output is added directly to the input stream. This can lead to updates that predominantly reinforce or modulate the existing stream direction, potentially underutilizing the module's capacity for learning entirely novel features. In this work, we introduce Orthogonal Residual Update: we decompose the module's output relative to the input stream and add only the component orthogonal to this stream. This design aims to guide modules to contribute primarily new representational directions, fostering richer feature learning while promoting more efficient training. We demonstrate that our orthogonal update strategy improves generalization accuracy and training stability across diverse architectures (ResNetV2, Vision Transformers) and datasets (CIFARs, TinyImageNet, ImageNet-1k), achieving, for instance, a +4.3%p top-1 accuracy gain for ViT-B on ImageNet-1k.
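The update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the stream and the module output are plain vectors at a single position, that "orthogonal component" means the module output minus its projection onto the stream, and that a small stabilizing constant `eps` is added to the denominator. The function name `orthogonal_residual_update` and the `eps` parameter are hypothetical.

```python
def orthogonal_residual_update(x, f_x, eps=1e-6):
    """Return x plus the component of f_x that is orthogonal to x.

    x   : input stream vector (list of floats)
    f_x : module output for the same position (list of floats)
    eps : assumed stabilizer that avoids division by zero when x is ~0
    """
    # Inner products <f_x, x> and <x, x>.
    dot_fx = sum(f * xi for f, xi in zip(f_x, x))
    dot_xx = sum(xi * xi for xi in x)

    # Coefficient of the projection of f_x onto the direction of x.
    scale = dot_fx / (dot_xx + eps)

    # Remove the parallel part, keeping only the orthogonal part of f_x.
    f_orth = [f - scale * xi for f, xi in zip(f_x, x)]

    # A standard residual update would return x + f_x; here only f_orth is added.
    return [xi + fo for xi, fo in zip(x, f_orth)]


# Usage: the stream points along the first axis, so only the second
# coordinate of the module output survives the decomposition.
print(orthogonal_residual_update([1.0, 0.0], [2.0, 3.0], eps=0.0))  # [1.0, 3.0]
```

In this toy case a standard residual update would give `[3.0, 3.0]`, rescaling the existing stream direction, whereas the orthogonal update leaves the component along the stream untouched and adds only the new direction.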
