Hyper-Connections

September 29, 2024
Authors: Defa Zhu, Hongzhi Huang, Zihao Huang, Yutao Zeng, Yunyao Mao, Banggu Wu, Qiyang Min, Xun Zhou
cs.AI

Abstract

We present hyper-connections, a simple yet effective method that can serve as an alternative to residual connections. This approach specifically addresses common drawbacks observed in residual connection variants, such as the seesaw effect between gradient vanishing and representation collapse. Theoretically, hyper-connections allow the network to adjust the strength of connections between features at different depths and dynamically rearrange layers. We conduct experiments focusing on the pre-training of large language models, including dense and sparse models, where hyper-connections show significant performance improvements over residual connections. Additional experiments conducted on vision tasks also demonstrate similar improvements. We anticipate that this method will be broadly applicable and beneficial across a wide range of AI problems.
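
To make the idea of adjustable connection strengths concrete, here is a toy sketch in Python. It is not the authors' implementation, and the abstract does not specify the formulation. Keeping several parallel hidden streams and the names `hyper_step`, `alpha_in`, `alpha_res`, and `beta` are assumptions made for illustration. The weights are plain constants here, whereas a real model would presumably learn them.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def residual_step(h: Vector, layer: Layer) -> Vector:
    """Plain residual connection: output = h + layer(h), with fixed unit weights."""
    return [x + y for x, y in zip(h, layer(h))]


def hyper_step(
    streams: List[Vector],
    layer: Layer,
    alpha_in: List[float],
    alpha_res: List[List[float]],
    beta: List[float],
) -> List[Vector]:
    """Hypothetical weighted generalization of a residual step (illustrative only).

    streams   : n parallel hidden vectors, each of width d
    alpha_in  : n weights that mix the streams into the layer input
    alpha_res : n x n weights that mix the streams among themselves
    beta      : n weights that scale the layer output written to each stream
    """
    n, d = len(streams), len(streams[0])

    # Layer input is a weighted sum of all streams.
    layer_in = [sum(alpha_in[i] * streams[i][k] for i in range(n)) for k in range(d)]
    out = layer(layer_in)

    new_streams = []
    for j in range(n):
        # Carry-over path: stream j receives a weighted mix of the old streams.
        mixed = [sum(alpha_res[i][j] * streams[i][k] for i in range(n)) for k in range(d)]
        # Write path: add the layer output, scaled by beta[j].
        new_streams.append([m + beta[j] * o for m, o in zip(mixed, out)])
    return new_streams


if __name__ == "__main__":
    double: Layer = lambda v: [2.0 * x for x in v]
    h = [1.0, 2.0]
    print(residual_step(h, double))
    # With one stream and all weights set to 1, the sketch reduces to a residual step.
    print(hyper_step([h], double, [1.0], [[1.0]], [1.0]))
```

In this sketch, setting an entry of `beta` to zero lets a stream bypass the layer entirely, which is one simple way to picture a network adjusting how strongly features from different depths are connected.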

November 13, 2024