mHC: Manifold-Constrained Hyper-Connections
December 31, 2025
Authors: Zhenda Xie, Yixuan Wei, Huanqi Cao, Chenggang Zhao, Chengqi Deng, Jiashi Li, Damai Dai, Huazuo Gao, Jiang Chang, Liang Zhao, Shangyan Zhou, Zhean Xu, Zhengyan Zhang, Wangding Zeng, Shengding Hu, Yuqing Wang, Jingyang Yuan, Lean Wang, Wenfeng Liang
cs.AI
Abstract
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.
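To make the idea of constraining the residual mixing concrete, here is a minimal NumPy sketch. The abstract does not name the manifold or the projection method, so everything below is an assumption chosen for illustration: the residual stream is widened to `n_streams` parallel copies, the per-layer mixing matrix is pushed toward the set of doubly stochastic matrices, and the projection is approximated with Sinkhorn-style row and column normalization. The names `sinkhorn`, `n_streams`, `d_model`, and `n_layers` are hypothetical, and the layer function itself is omitted so that only the residual path is shown.

```python
import numpy as np


def sinkhorn(logits, n_iters=50):
    """Approximately map a square matrix to a doubly stochastic one.

    Assumption: this stands in for the manifold projection; the abstract
    does not specify which manifold or projection is actually used.
    """
    m = np.exp(logits - logits.max())  # strictly positive entries
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # normalize rows
        m = m / m.sum(axis=0, keepdims=True)  # normalize columns (done last)
    return m


rng = np.random.default_rng(0)
n_streams, d_model, n_layers = 4, 8, 64  # hypothetical sizes

# Widened residual stream: n_streams copies of a d_model-dimensional state.
x0 = rng.normal(size=(n_streams, d_model))

x = x0.copy()
for _ in range(n_layers):
    raw = rng.normal(size=(n_streams, n_streams))  # stand-in for learned weights
    mix = sinkhorn(raw)                            # constrained residual mixing
    x = mix @ x                                    # residual path only, no layer output

# Each column of `mix` sums to 1, so the sum over streams passes through
# every layer unchanged, which is an identity-like property of the
# aggregated residual signal.
stream_sum_preserved = np.allclose(x.sum(axis=0), x0.sum(axis=0))
```

Under these assumptions, the aggregated signal is carried through all 64 layers without amplification or decay, whereas an unconstrained mixing matrix carries no such guarantee. This is one way to read the abstract's claim that the constraint restores the identity mapping property; it is not a reproduction of the authors' implementation.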