YaART: Yet Another ART Rendering Technology

Abstract

In the rapidly progressing field of generative models, the development of efficient, high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we focus especially on the choice of model size and training dataset size, aspects that had not previously been investigated systematically for text-to-image cascaded diffusion models. In particular, we comprehensively analyze how these choices affect both the efficiency of the training process and the quality of the generated images, both of which are highly important in practice. Furthermore, we demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, establishing a more efficient scenario for training diffusion models. In terms of quality, YaART is consistently preferred by users over many existing state-of-the-art models.
