Hyper-Connections
September 29, 2024
Authors: Defa Zhu, Hongzhi Huang, Zihao Huang, Yutao Zeng, Yunyao Mao, Banggu Wu, Qiyang Min, Xun Zhou
cs.AI
Abstract
We present hyper-connections, a simple yet effective method that can serve as
an alternative to residual connections. This approach specifically addresses
common drawbacks observed in residual connection variants, such as the seesaw
effect between gradient vanishing and representation collapse. Theoretically,
hyper-connections allow the network to adjust the strength of connections
between features at different depths and dynamically rearrange layers. We
conduct experiments focusing on the pre-training of large language models,
including dense and sparse models, where hyper-connections show significant
performance improvements over residual connections. Additional experiments
conducted on vision tasks also demonstrate similar improvements. We anticipate
that this method will be broadly applicable and beneficial across a wide range
of AI problems.
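
The abstract does not spell out the mechanism, so the following is only a minimal sketch of one way "adjusting the strength of connections between features at different depths" could look in code. It assumes the hidden state is kept as several parallel streams, and that each layer reads a weighted sum of those streams, mixes them, and writes its output back with per-stream weights. The names `hyper_connection_step`, `read_w`, `mix_w`, and `write_w` are hypothetical and are not taken from the paper. In a real model the weights would be learned; here they are fixed numbers for illustration.

```python
from typing import Callable, List

Vector = List[float]


def hyper_connection_step(
    streams: List[Vector],
    layer: Callable[[Vector], Vector],
    read_w: List[float],
    mix_w: List[List[float]],
    write_w: List[float],
) -> List[Vector]:
    """Apply one layer with weighted connections (illustrative assumption).

    streams: n parallel hidden vectors of equal length d
    read_w:  n weights combining the streams into the layer input
    mix_w:   n x n weights; mix_w[i][j] carries old stream i into new stream j
    write_w: n weights adding the layer output to each new stream
    """
    n, d = len(streams), len(streams[0])
    # Layer input: weighted sum over the n streams.
    layer_in = [sum(read_w[i] * streams[i][k] for i in range(n)) for k in range(d)]
    layer_out = layer(layer_in)
    new_streams = []
    for j in range(n):
        # Carry-over path: weighted mix of the old streams.
        mixed = [sum(mix_w[i][j] * streams[i][k] for i in range(n)) for k in range(d)]
        # Add the layer output, scaled by this stream's write weight.
        new_streams.append([mixed[k] + write_w[j] * layer_out[k] for k in range(d)])
    return new_streams


def residual_step(x: Vector, layer: Callable[[Vector], Vector]) -> Vector:
    """Standard residual connection: x + layer(x)."""
    return [a + b for a, b in zip(x, layer(x))]


def double(v: Vector) -> Vector:
    """Toy stand-in for a network layer."""
    return [2.0 * t for t in v]


x = [1.0, -2.0, 0.5]

# One stream with all weights equal to 1 behaves like a residual connection.
hc_out = hyper_connection_step([x], double, [1.0], [[1.0]], [1.0])
res_out = residual_step(x, double)

# Two streams: the layer output is written to stream 0 only.
two = hyper_connection_step(
    [x, x], double, [0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0]
)
```

With a single stream and unit weights the sketch matches an ordinary residual connection, which is the sense in which such a scheme could act as a drop-in alternative. Changing `mix_w` and `write_w` changes how strongly earlier features reach later layers.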