
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

August 4, 2025
Authors: Qianli Ma, Yaowei Zheng, Zhelun Shi, Zhongkai Zhao, Bin Jia, Ziyue Huang, Zhiqi Lin, Youjie Li, Jiacheng Yang, Yanghua Peng, Zhi Zhang, Xin Liu
cs.AI

Abstract

Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge due to the heterogeneous model architectures required to process diverse modalities, necessitating sophisticated system design for efficient large-scale training. Existing frameworks typically entangle model definition with parallel logic, incurring limited scalability and substantial engineering overhead for end-to-end omni-modal training.

We present VeOmni, a modular and efficient training framework that accelerates the development of omni-modal LLMs. VeOmni introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism for omni-modal LLMs. VeOmni also features a flexible configuration interface that supports seamless integration of new modalities with minimal code changes.

Using VeOmni, an omni-modal mixture-of-experts (MoE) model with 30B parameters can be trained at over 2,800 tokens/sec/GPU and scaled to 160K context lengths via 3D parallelism on 128 GPUs, showcasing its superior efficiency and scalability for training large omni-modal LLMs.