VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

August 4, 2025
Authors: Qianli Ma, Yaowei Zheng, Zhelun Shi, Zhongkai Zhao, Bin Jia, Ziyue Huang, Zhiqi Lin, Youjie Li, Jiacheng Yang, Yanghua Peng, Zhi Zhang, Xin Liu
cs.AI

Abstract

Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge due to the heterogeneous model architectures required to process diverse modalities, necessitating sophisticated system design for efficient large-scale training. Existing frameworks typically entangle model definition with parallel logic, limiting scalability and incurring substantial engineering overhead for end-to-end omni-modal training. We present VeOmni, a modular and efficient training framework that accelerates the development of omni-modal LLMs. VeOmni introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism for omni-modal LLMs. VeOmni also features a flexible configuration interface that supports seamless integration of new modalities with minimal code change. Using VeOmni, an omni-modal mixture-of-experts (MoE) model with 30B parameters can be trained at over 2,800 tokens/sec/GPU throughput and scaled to 160K-token context lengths via 3D parallelism on 128 GPUs, showcasing its superior efficiency and scalability for training large omni-modal LLMs.
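
The core idea behind the model-centric recipes is that the model is written as plain PyTorch, with no communication or sharding logic inside, and a separate recipe applies the distributed strategy afterwards. The sketch below illustrates this pattern only, using standard PyTorch distributed APIs; `OmniModel`, `apply_recipe`, and the mesh axis names are hypothetical stand-ins, not VeOmni's actual interface.

```python
# Minimal sketch of a model-centric parallel recipe (hypothetical names;
# not VeOmni's real API). The model is pure PyTorch: no parallelism inside.
# Running it requires a distributed launch (e.g. torchrun) on 128 GPUs.
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


class OmniModel(nn.Module):
    """Plain model definition: modality encoders plus a shared backbone."""

    def __init__(self, encoders: dict[str, nn.Module], backbone: nn.Module):
        super().__init__()
        self.encoders = nn.ModuleDict(encoders)
        self.backbone = backbone

    def forward(self, inputs: dict[str, torch.Tensor]) -> torch.Tensor:
        # Encode each modality independently, fuse along the sequence
        # dimension, then run the shared backbone.
        embeds = [self.encoders[name](x) for name, x in inputs.items()]
        return self.backbone(torch.cat(embeds, dim=1))


def apply_recipe(model: nn.Module) -> nn.Module:
    """Hypothetical recipe: the 3D-parallel layout (data x tensor x context,
    8 x 4 x 4 = 128 ranks) is chosen entirely outside the model definition.
    Only the data-parallel axis is realized here, via FSDP, as a stand-in
    for a full recipe that would also shard the tensor and context axes."""
    mesh = init_device_mesh("cuda", (8, 4, 4), mesh_dim_names=("dp", "tp", "cp"))
    return FSDP(model, device_mesh=mesh["dp"])
```

Because the recipe receives an ordinary `nn.Module`, the same model code can be reused under different parallel layouts, which is the decoupling the abstract describes.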
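
The abstract also mentions a configuration interface that lets a new modality be integrated with minimal code change. One common way to realize this is a registry pattern, sketched below; the decorator, registry, and config keys are hypothetical illustrations, not VeOmni's documented interface.

```python
# Hypothetical sketch of a modality registry: a new encoder is added by
# registering a class under a name and referencing that name in a config,
# with no change to the training loop. All names here are illustrative.
import torch.nn as nn

ENCODER_REGISTRY: dict[str, type[nn.Module]] = {}


def register_encoder(name: str):
    """Decorator that makes an encoder class discoverable by name."""
    def decorate(cls: type[nn.Module]) -> type[nn.Module]:
        ENCODER_REGISTRY[name] = cls
        return cls
    return decorate


@register_encoder("audio")
class AudioEncoder(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(128, dim)  # e.g. 128 mel bins -> model dim

    def forward(self, x):
        return self.proj(x)


def build_encoders(config: dict[str, dict]) -> nn.ModuleDict:
    """Instantiate every modality listed in the config by registry lookup."""
    return nn.ModuleDict(
        {name: ENCODER_REGISTRY[name](**kwargs) for name, kwargs in config.items()}
    )


# Adding a modality then amounts to one registered class plus a config entry:
encoders = build_encoders({"audio": {"dim": 1024}})
```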