

YaART: Yet Another ART Rendering Technology

April 8, 2024
Authors: Sergey Kastryulin, Artem Konev, Alexander Shishenya, Eugene Lyapustin, Artem Khurshudov, Alexander Tselousov, Nikita Vinokurov, Denis Kuznedelev, Alexander Markovich, Grigoriy Livshits, Alexey Kirillov, Anastasiia Tabisheva, Liubov Chubarova, Marina Kaminskaia, Alexander Ustyuzhanin, Artemii Shvetsov, Daniil Shlenskii, Valerii Startsev, Dmitrii Kornilov, Mikhail Romanov, Artem Babenko, Sergei Ovcharenko, Valentin Khrulkov
cs.AI

Abstract

In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we especially focus on the choices of the model and training dataset sizes, the aspects that were not systematically investigated for text-to-image cascaded diffusion models before. In particular, we comprehensively analyze how these choices affect both the efficiency of the training process and the quality of the generated images, which are highly important in practice. Furthermore, we demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, establishing a more efficient scenario of diffusion models training. From the quality perspective, YaART is consistently preferred by users over many existing state-of-the-art models.
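The abstract refers to YaART as a cascaded text-to-image diffusion model. As a rough structural sketch of what a cascade of this kind looks like (a low-resolution base generator followed by super-resolution stages), the code below is an illustration only: it is not YaART's implementation, the function names and stage resolutions (64 → 256 → 1024) are assumptions, and the "denoisers" are placeholder stubs rather than trained networks.

```python
# Minimal structural sketch of a cascaded text-to-image diffusion pipeline.
# NOT YaART's actual code: names, resolutions, and update rules are hypothetical.
from typing import Optional

import torch
import torch.nn.functional as F


def encode_text(prompt: str, dim: int = 512) -> torch.Tensor:
    # Stand-in for a real text encoder (e.g. a transformer); returns a
    # deterministic pseudo-embedding so the sketch is self-contained.
    g = torch.Generator().manual_seed(abs(hash(prompt)) % (2**31))
    return torch.randn(1, dim, generator=g)


def diffusion_stage(cond: torch.Tensor, resolution: int,
                    low_res: Optional[torch.Tensor] = None,
                    steps: int = 50) -> torch.Tensor:
    # One cascade stage: start from Gaussian noise and iteratively "denoise".
    # In a real model, `cond` would enter a trained U-Net via cross-attention;
    # here the update is a placeholder so the sketch runs on its own.
    x = torch.randn(1, 3, resolution, resolution)
    if low_res is not None:
        # Super-resolution stages condition on the previous stage's output.
        x = x + F.interpolate(low_res, size=resolution,
                              mode="bilinear", align_corners=False)
    for _ in range(steps):
        x = x - 0.01 * x  # placeholder update instead of a learned denoiser
    return x.clamp(-1, 1)


def generate(prompt: str) -> torch.Tensor:
    cond = encode_text(prompt)
    base = diffusion_stage(cond, resolution=64)                   # base image
    mid = diffusion_stage(cond, resolution=256, low_res=base)     # upsampler
    return diffusion_stage(cond, resolution=1024, low_res=mid)    # upsampler


if __name__ == "__main__":
    image = generate("a photo of a corgi wearing sunglasses")
    print(image.shape)  # torch.Size([1, 3, 1024, 1024])
```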
