LongCat-Image Technical Report
December 8, 2025
Authors: Meituan LongCat Team, Hanghang Ma, Haoxian Tan, Jiale Huang, Junqiang Wu, Jun-Yan He, Lishuai Gao, Songlin Xiao, Xiaoming Wei, Xiaoqi Ma, Xunliang Cai, Yayong Guan, Jie Hu
cs.AI
Abstract
We introduce LongCat-Image, a pioneering open-source, bilingual (Chinese-English) foundation model for image generation, designed to address core challenges that current leading models face in multilingual text rendering, photorealism, deployment efficiency, and developer accessibility. 1) We achieve this through rigorous data curation across the pre-training, mid-training, and supervised fine-tuning (SFT) stages, complemented by the coordinated use of curated reward models during the reinforcement learning (RL) phase. This strategy establishes the model as a new state of the art (SOTA), delivering superior text rendering and remarkable photorealism while significantly enhancing aesthetic quality. 2) Notably, it sets a new industry standard for Chinese character rendering: by supporting even complex and rare characters, it outperforms major open-source and commercial solutions in both coverage and accuracy. 3) The model achieves remarkable efficiency through its compact design. Its core diffusion model has only 6B parameters, significantly fewer than the nearly 20B or larger Mixture-of-Experts (MoE) architectures common in the field. This keeps VRAM usage low and inference fast, significantly reducing deployment costs. Beyond generation, LongCat-Image also excels at image editing, achieving SOTA results on standard benchmarks with better editing consistency than other open-source works. 4) To fully empower the community, we have established the most comprehensive open-source ecosystem to date. We are releasing not only multiple model versions for text-to-image generation and image editing, including checkpoints from the mid-training and post-training stages, but also the complete training toolchain. We believe that the openness of LongCat-Image will provide robust support for developers and researchers, pushing the frontiers of visual content creation.