ChatPaper.aiChatPaper

侏羅紀世界重製:透過零樣本長距離圖像到圖像翻譯將古代化石重現生機

Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation

August 14, 2023
作者: Alexander Martin, Haitian Zheng, Jie An, Jiebo Luo
cs.AI

摘要

憑藉對自然語言目標領域的深刻理解,我們在跨越龐大領域差距並使骨架重生的翻譯方面取得了令人期待的成果。在這項工作中,我們使用了以文本引導的潛在擴散模型,用於零樣本圖像到圖像的翻譯(I2I),跨越了龐大的領域差距(longI2I),需要生成大量新的視覺特徵和新的幾何形狀以進入目標領域。能夠在龐大領域差距上進行翻譯在刑事學、占星學、環境保護和古生物學等現實世界應用中具有廣泛的應用。在這項工作中,我們引入了一個新任務Skull2Animal,用於在頭顱骨和活體動物之間進行翻譯。在這個任務中,我們發現未經引導的生成對抗網絡(GANs)無法跨越龐大的領域差距進行翻譯。我們探索了引導擴散和圖像編輯模型的應用,提出了一個新的基準模型Revive-2I,能夠通過文本提示潛在擴散模型執行零樣本I2I。我們發現,在longI2I中引導是必要的,因為為了彌合龐大的領域差距,需要有關目標領域的先前知識。此外,我們發現提示提供了有關目標領域的最佳和最具擴展性的信息,因為分類器引導的擴散模型需要重新訓練以應對特定用例,並且由於它們訓練的圖像種類繁多,對目標領域的約束力較弱。
English
With a strong understanding of the target domain from natural language, we produce promising results in translating across large domain gaps and bringing skeletons back to life. In this work, we use text-guided latent diffusion models for zero-shot image-to-image translation (I2I) across large domain gaps (longI2I), where large amounts of new visual features and new geometry need to be generated to enter the target domain. Being able to perform translations across large domain gaps has a wide variety of real-world applications in criminology, astrology, environmental conservation, and paleontology. In this work, we introduce a new task Skull2Animal for translating between skulls and living animals. On this task, we find that unguided Generative Adversarial Networks (GANs) are not capable of translating across large domain gaps. Instead of these traditional I2I methods, we explore the use of guided diffusion and image editing models and provide a new benchmark model, Revive-2I, capable of performing zero-shot I2I via text-prompting latent diffusion models. We find that guidance is necessary for longI2I because, to bridge the large domain gap, prior knowledge about the target domain is needed. In addition, we find that prompting provides the best and most scalable information about the target domain as classifier-guided diffusion models require retraining for specific use cases and lack stronger constraints on the target domain because of the wide variety of images they are trained on.
PDF71December 15, 2024