Generative Powers of Ten
December 4, 2023
Authors: Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steve Seitz, Ira Kemelmacher, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, Aleksander Holynski
cs.AI
Abstract
We present a method that uses a text-to-image model to generate consistent
content across multiple image scales, enabling extreme semantic zooms into a
scene, e.g., ranging from a wide-angle landscape view of a forest to a macro
shot of an insect sitting on one of the tree branches. We achieve this through
a joint multi-scale diffusion sampling approach that encourages consistency
across different scales while preserving the integrity of each individual
sampling process. Since each generated scale is guided by a different text
prompt, our method enables deeper levels of zoom than traditional
super-resolution methods that may struggle to create new contextual structure
at vastly different scales. We compare our method qualitatively with
alternative techniques in image super-resolution and outpainting, and show that
our method is most effective at generating consistent multi-scale content.
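To make the high-level description above more concrete, here is a minimal, illustrative sketch of what a joint multi-scale sampling loop with a per-step consistency constraint between zoom levels could look like. This is not the authors' published algorithm: the toy denoise_step function, the fixed zoom factor of 2 between consecutive scales, the box-filter downsampling, and the simple averaging used to reconcile each coarse scale's center with the next finer scale are all assumptions made purely for illustration.

```python
# Illustrative sketch only: a joint multi-scale sampling loop in which each
# zoom level is denoised under its own text prompt, followed by a consistency
# step that reconciles overlapping content between neighboring scales.
import numpy as np

ZOOM = 2      # assumed relative zoom factor between consecutive scales
SIZE = 64     # toy image side length at every scale
NUM_STEPS = 50

def denoise_step(x, prompt, t):
    """Stand-in for one reverse-diffusion step of a text-to-image model
    conditioned on `prompt`; a real implementation would call the model's
    noise predictor here."""
    rng = np.random.default_rng(hash(prompt) % (2 ** 32) + t)
    return x - 0.02 * x + 0.01 * rng.standard_normal(x.shape)

def center_crop(x, factor):
    """Central crop covering 1/factor of the image in each dimension."""
    h, w = x.shape[:2]
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return x[top:top + ch, left:left + cw]

def downsample(x, factor):
    """Box-filter downsampling by an integer factor."""
    h, w, c = x.shape
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def upsample(x, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

# One prompt per zoom level, from the widest view to the most zoomed-in one.
prompts = ["wide-angle forest landscape", "forest floor and tree trunks",
           "close-up of a tree branch", "macro shot of an insect on the branch"]
latents = [np.random.standard_normal((SIZE, SIZE, 3)) for _ in prompts]

for t in range(NUM_STEPS):
    # 1) Independent denoising: each scale follows its own text prompt.
    latents = [denoise_step(x, p, t) for x, p in zip(latents, prompts)]

    # 2) Consistency step: the center of each coarser scale and the
    #    downsampled next-finer scale are pulled toward a shared estimate.
    for i in range(len(latents) - 1):
        coarse, fine = latents[i], latents[i + 1]
        shared = 0.5 * (center_crop(coarse, ZOOM) + downsample(fine, ZOOM))
        h, w = shared.shape[:2]
        top, left = (SIZE - h) // 2, (SIZE - w) // 2
        coarse[top:top + h, left:left + w] = shared   # write back into coarse scale
        latents[i + 1] = upsample(shared, ZOOM)       # and into the finer scale
```

The point this sketch tries to capture is the one stated in the abstract: every zoom level keeps its own prompt-conditioned sampling trajectory, while the content shared between neighboring levels is repeatedly reconciled during sampling rather than fixed after the fact.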