Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation

August 14, 2023
Authors: Alexander Martin, Haitian Zheng, Jie An, Jiebo Luo
cs.AI

Abstract

With a strong understanding of the target domain from natural language, we produce promising results in translating across large domain gaps and bringing skeletons back to life. In this work, we use text-guided latent diffusion models for zero-shot image-to-image translation (I2I) across large domain gaps (longI2I), where large amounts of new visual features and new geometry need to be generated to enter the target domain. Being able to perform translations across large domain gaps has a wide variety of real-world applications in criminology, astrology, environmental conservation, and paleontology. In this work, we introduce a new task Skull2Animal for translating between skulls and living animals. On this task, we find that unguided Generative Adversarial Networks (GANs) are not capable of translating across large domain gaps. Instead of these traditional I2I methods, we explore the use of guided diffusion and image editing models and provide a new benchmark model, Revive-2I, capable of performing zero-shot I2I via text-prompting latent diffusion models. We find that guidance is necessary for longI2I because, to bridge the large domain gap, prior knowledge about the target domain is needed. In addition, we find that prompting provides the best and most scalable information about the target domain as classifier-guided diffusion models require retraining for specific use cases and lack stronger constraints on the target domain because of the wide variety of images they are trained on.