
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

November 19, 2025
Authors: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko, Viacheslav Vasilev, Alexey Letunovskiy, Maria Kovaleva, Nikolai Vaulin, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Nikita Kiselev, Alexander Varlamov, Dmitrii Mikhailov, Vladimir Polovnikov, Andrey Shutkin, Ilya Vasiliev, Julia Agafonova, Anastasiia Kargapoltseva, Anna Dmitrienko, Anastasia Maltseva, Anna Averchenkova, Olga Kim, Tatiana Nikulina, Denis Dimitrov
cs.AI

Abstract

This report introduces Kandinsky 5.0, a family of state-of-the-art foundation models for high-resolution image and 10-second video synthesis. The framework comprises three core model line-ups: Kandinsky 5.0 Image Lite, a line-up of 6B-parameter image generation models; Kandinsky 5.0 Video Lite, a line-up of fast, lightweight 2B-parameter text-to-video and image-to-video models; and Kandinsky 5.0 Video Pro, a line-up of 19B-parameter models that achieve superior video generation quality. We provide a comprehensive review of the data curation lifecycle (collection, processing, filtering, and clustering) for the multi-stage training pipeline, which involves extensive pre-training and incorporates quality-enhancement techniques such as supervised fine-tuning (SFT) and reinforcement learning (RL)-based post-training. We also present novel architectural, training, and inference optimizations that enable Kandinsky 5.0 to achieve high generation speeds and state-of-the-art performance across various tasks, as demonstrated by human evaluation. As a large-scale, publicly available generative framework, Kandinsky 5.0 leverages the full potential of its pre-training and subsequent stages and can be adapted to a wide range of generative applications. We hope that this report, together with the release of our open-source code and training checkpoints, will substantially advance the development and accessibility of high-quality generative models for the research community.