Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
November 19, 2025
Authors: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko, Viacheslav Vasilev, Alexey Letunovskiy, Maria Kovaleva, Nikolai Vaulin, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Nikita Kiselev, Alexander Varlamov, Dmitrii Mikhailov, Vladimir Polovnikov, Andrey Shutkin, Ilya Vasiliev, Julia Agafonova, Anastasiia Kargapoltseva, Anna Dmitrienko, Anastasia Maltseva, Anna Averchenkova, Olga Kim, Tatiana Nikulina, Denis Dimitrov
cs.AI
Abstract
This report introduces Kandinsky 5.0, a family of state-of-the-art foundation models for high-resolution image and 10-second video synthesis. The framework comprises three core model line-ups: Kandinsky 5.0 Image Lite, a line-up of 6B-parameter image generation models; Kandinsky 5.0 Video Lite, fast and lightweight 2B-parameter text-to-video and image-to-video models; and Kandinsky 5.0 Video Pro, 19B-parameter models that achieve superior video generation quality. We provide a comprehensive review of the data curation lifecycle (collection, processing, filtering, and clustering) for the multi-stage training pipeline, which involves extensive pre-training and incorporates quality-enhancement techniques such as supervised fine-tuning (SFT) and reinforcement learning (RL)-based post-training. We also present novel architectural, training, and inference optimizations that enable Kandinsky 5.0 to achieve high generation speeds and state-of-the-art performance across various tasks, as demonstrated by human evaluation. As a large-scale, publicly available generative framework, Kandinsky 5.0 leverages the full potential of its pre-training and subsequent stages and can be adapted for a wide range of generative applications. We hope that this report, together with the release of our open-source code and training checkpoints, will substantially advance the development and accessibility of high-quality generative models for the research community.