Generative Powers of Ten
December 4, 2023
Authors: Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steve Seitz, Ira Kemelmacher, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, Aleksander Holynski
cs.AI
Abstract
We present a method that uses a text-to-image model to generate consistent
content across multiple image scales, enabling extreme semantic zooms into a
scene, e.g., ranging from a wide-angle landscape view of a forest to a macro
shot of an insect sitting on one of the tree branches. We achieve this through
a joint multi-scale diffusion sampling approach that encourages consistency
across different scales while preserving the integrity of each individual
sampling process. Since each generated scale is guided by a different text
prompt, our method enables deeper levels of zoom than traditional
super-resolution methods that may struggle to create new contextual structure
at vastly different scales. We compare our method qualitatively with
alternative techniques in image super-resolution and outpainting, and show that
our method is most effective at generating consistent multi-scale content.
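The core idea described above, running one diffusion sampling process per zoom level and reconciling the processes at every denoising step, can be sketched roughly as follows. Everything in this sketch is an illustrative assumption rather than the paper's actual implementation: fake_denoise is a stand-in for a real text-conditioned diffusion model, enforce_consistency uses simple crop/downsample averaging instead of the paper's blending, and the zoom factor, resolutions, and prompts are placeholders.

# Minimal conceptual sketch of joint multi-scale diffusion sampling.
# All model and blending details here are hypothetical stand-ins.
import numpy as np

ZOOM = 2      # assumed relative zoom factor between consecutive scales
SIZE = 64     # image resolution at every scale (placeholder)
STEPS = 50    # number of denoising steps (placeholder)

def fake_denoise(x, prompt, t):
    """Hypothetical single denoising update; a real implementation would
    call a text-to-image diffusion model conditioned on `prompt` here."""
    rng = np.random.default_rng(hash((prompt, t)) % (2**32))
    return x - 0.02 * (x - rng.normal(size=x.shape))

def downsample(img, factor):
    """Average-pool the image by an integer factor."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def center_crop(img, factor):
    """Return the central 1/factor crop of the image."""
    h, w, _ = img.shape
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

def enforce_consistency(samples):
    """Blend each coarser scale's center region with the downsampled
    next-finer scale so that overlapping content agrees across zoom levels."""
    for i in range(len(samples) - 1):
        coarse, fine = samples[i], samples[i + 1]
        shared = 0.5 * (center_crop(coarse, ZOOM) + downsample(fine, ZOOM))
        h, w, _ = coarse.shape
        ch, cw = h // ZOOM, w // ZOOM
        top, left = (h - ch) // 2, (w - cw) // 2
        coarse[top:top + ch, left:left + cw] = shared
    return samples

# One prompt per zoom level, echoing the forest-to-insect example above.
prompts = [
    "wide-angle landscape view of a forest",
    "a single tree trunk with branches",
    "macro shot of an insect on a branch",
]

samples = [np.random.normal(size=(SIZE, SIZE, 3)) for _ in prompts]
for t in range(STEPS, 0, -1):
    # Each scale runs its own sampling process, guided by its own prompt.
    samples = [fake_denoise(x, p, t) for x, p in zip(samples, prompts)]
    # Joint step: reconcile the overlapping content across scales.
    samples = enforce_consistency(samples)

The point of the sketch is only the loop structure: per-scale denoising preserves the integrity of each sampling process, while the shared consistency step after every update is what makes the final zoom stack agree across scales.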