Wuerstchen: Efficient Pretraining of Text-to-Image Models
June 1, 2023
Authors: Pablo Pernias, Dominic Rampas, Marc Aubreville
cs.AI
Abstract
We introduce Wuerstchen, a novel technique for text-to-image synthesis that
unites competitive performance with unprecedented cost-effectiveness and ease
of training on constrained hardware. Building on recent advancements in machine
learning, our approach, which utilizes latent diffusion strategies at strong
latent image compression rates, significantly reduces the computational burden
typically associated with state-of-the-art models while preserving, if not
enhancing, the quality of generated images. Wuerstchen achieves notable speed
improvements at inference time, thereby rendering real-time applications more
viable. One of the key advantages of our method lies in its modest training
requirements of only 9,200 GPU hours, slashing the usual costs significantly
without compromising the end performance. In a comparison against the
state of the art, we found the approach to be highly competitive. This
paper opens the door to a new line of research that prioritizes both
performance and computational accessibility, hence democratizing the use of
sophisticated AI technologies. Through Wuerstchen, we demonstrate a compelling
stride forward in the realm of text-to-image synthesis, offering an innovative
path to explore in future research.