LongCat-Flash-Omni Technical Report

Abstract

We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters that excels at real-time audio-visual interaction. Using a curriculum-inspired progressive training strategy that moves from simpler to increasingly complex modality sequence modeling tasks, LongCat-Flash-Omni attains comprehensive multimodal capabilities while maintaining strong unimodal capability. Building on LongCat-Flash, which adopts a high-performance Shortcut-connected Mixture-of-Experts (MoE) architecture with zero-computation experts, LongCat-Flash-Omni integrates efficient multimodal perception and speech reconstruction modules. Despite its 560B total parameters (27B activated), the model achieves low-latency real-time audio-visual interaction. For training infrastructure, we developed a modality-decoupled parallelism scheme designed to manage the data and model heterogeneity inherent in large-scale multimodal training; it sustains over 90% of the throughput achieved by text-only training. Extensive evaluations show that LongCat-Flash-Omni achieves state-of-the-art performance on omni-modal benchmarks among open-source models and delivers highly competitive results across a wide range of modality-specific tasks, including text, image, and video understanding, as well as audio understanding and generation. We provide a comprehensive overview of the model architecture design, training procedures, and data strategies, and we open-source the model to foster future research and development in the community.
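
To illustrate how a Mixture-of-Experts layer with zero-computation experts can activate only part of its capacity per token, the following is a minimal sketch and not the LongCat-Flash implementation. It assumes that a zero-computation expert simply returns its input unchanged, that routing is top-k over softmax scores, and that the selected weights are renormalized; the names `moe_forward`, `make_ffn_expert`, and `identity_expert` are hypothetical, and the shortcut connection is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_ffn_expert(scale: float) -> Expert:
    """Toy stand-in for a feed-forward expert (assumption: a plain scaling)."""
    return lambda x: [scale * v for v in x]


def identity_expert(x: Vector) -> Vector:
    """Zero-computation expert (assumption): returns the input unchanged."""
    return list(x)


def moe_forward(
    x: Vector,
    router_scores: Sequence[float],
    experts: Sequence[Expert],
    top_k: int,
) -> Tuple[Vector, int]:
    """Route one token to its top-k experts and mix their outputs.

    Returns the mixed output and the number of selected experts that
    actually perform computation (i.e. are not identity experts).
    """
    probs = softmax(router_scores)
    # Indices of the top-k experts by routing probability.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the selected experts
    out = [0.0] * len(x)
    compute_experts_used = 0
    for i in chosen:
        if experts[i] is not identity_expert:
            compute_experts_used += 1
        y = experts[i](x)
        weight = probs[i] / norm
        for d in range(len(x)):
            out[d] += weight * y[d]
    return out, compute_experts_used


if __name__ == "__main__":
    experts = [make_ffn_expert(2.0), make_ffn_expert(3.0), identity_expert, identity_expert]
    # A token whose router strongly prefers the two zero-computation experts.
    easy_out, easy_cost = moe_forward([1.0, -1.0], [0.0, 0.0, 5.0, 5.0], experts, top_k=2)
    # A token whose router strongly prefers the two feed-forward experts.
    hard_out, hard_cost = moe_forward([1.0, -1.0], [5.0, 5.0, 0.0, 0.0], experts, top_k=2)
    print(easy_out, easy_cost)
    print(hard_out, hard_cost)
```

In this sketch, a token routed entirely to identity experts passes through the layer unchanged and incurs no expert computation, while a token routed to feed-forward experts pays the full cost. This is one way a model's activated parameter count can be much smaller than its total parameter count.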