Hyper-Connections

Abstract

We present hyper-connections, a simple yet effective method that can serve as an alternative to residual connections. The approach specifically addresses a common drawback of residual connection variants: the seesaw effect between gradient vanishing and representation collapse. In theory, hyper-connections allow the network to adjust the strength of connections between features at different depths and to rearrange layers dynamically. Our experiments focus on the pre-training of large language models, covering both dense and sparse models, where hyper-connections show significant performance improvements over residual connections. Additional experiments on vision tasks show similar improvements. We anticipate that this method will be broadly applicable and beneficial across a wide range of AI problems.
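The abstract does not give the mechanism, so the following is only an illustrative sketch of what "adjusting the strength of connections between features at different depths" could look like. It assumes the hidden state is kept as n parallel streams, and that each layer reads a weighted mix of those streams and writes its output back with separate weights. The function name, the parameter names (`in_weights`, `out_weights`, `mix`), and the plain-list vector representation are all assumptions made for illustration and are not taken from the source.

```python
from typing import Callable, List

Vector = List[float]


def hyper_connection_step(
    streams: List[Vector],
    layer: Callable[[Vector], Vector],
    in_weights: List[float],
    out_weights: List[float],
    mix: List[List[float]],
) -> List[Vector]:
    """One layer step over n parallel hidden streams (hypothetical formulation).

    streams     : n vectors of equal length (the assumed parallel hidden states)
    layer       : the wrapped layer, e.g. an attention or feed-forward block
    in_weights  : length n, how strongly each stream feeds the layer input
    out_weights : length n, how strongly the layer output is written to each stream
    mix         : n x n, mix[i][j] is the weight of old stream i in new stream j
    """
    n = len(streams)
    dim = len(streams[0])

    # Layer input: weighted sum of the current streams.
    layer_in = [
        sum(in_weights[i] * streams[i][d] for i in range(n)) for d in range(dim)
    ]
    layer_out = layer(layer_in)

    new_streams = []
    for j in range(n):
        # Carry-over path: weighted mix of the old streams.
        carried = [
            sum(mix[i][j] * streams[i][d] for i in range(n)) for d in range(dim)
        ]
        # Add the layer output, scaled per stream.
        new_streams.append(
            [carried[d] + out_weights[j] * layer_out[d] for d in range(dim)]
        )
    return new_streams


def double(v: Vector) -> Vector:
    """Toy stand-in for a real layer."""
    return [2.0 * x for x in v]


# One stream with all weights equal to 1 behaves like a residual connection.
residual_like = hyper_connection_step([[1.0, 2.0]], double, [1.0], [1.0], [[1.0]])

# Two streams: the layer reads only stream 0 and writes only to stream 1.
routed = hyper_connection_step(
    [[1.0], [10.0]], double, [1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]
)
```

With a single stream and every weight set to 1, this step reduces to an ordinary residual connection, x + f(x). Other weight choices let a layer read from one stream and write to another, which is one way to picture connection strengths that vary across depths.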
