VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
August 4, 2025
Authors: Qianli Ma, Yaowei Zheng, Zhelun Shi, Zhongkai Zhao, Bin Jia, Ziyue Huang, Zhiqi Lin, Youjie Li, Jiacheng Yang, Yanghua Peng, Zhi Zhang, Xin Liu
cs.AI
Abstract
Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge due to the heterogeneous model architectures required to process diverse modalities, necessitating sophisticated system design for efficient large-scale training. Existing frameworks typically entangle model definition with parallel logic, incurring limited scalability and substantial engineering overhead for end-to-end omni-modal training.

We present VeOmni, a modular and efficient training framework that accelerates the development of omni-modal LLMs. VeOmni introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism for omni-modal LLMs. VeOmni also features a flexible configuration interface that supports seamless integration of new modalities with minimal code change.

Using VeOmni, an omni-modal mixture-of-experts (MoE) model with 30B parameters can be trained at over 2,800 tokens/sec/GPU and scaled to 160K context lengths via 3D parallelism on 128 GPUs, showcasing its superior efficiency and scalability for training large omni-modal LLMs.