Generative Powers of Ten

Abstract

We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.
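The abstract does not spell out how consistency across scales is encouraged, so the following is only a toy sketch under stated assumptions, not the paper's sampling procedure. It assumes a stack of square images in which each level depicts the central region of the previous level magnified by an integer factor `p`, and it makes the levels agree by writing a downsampled copy of each finer level into the center of the coarser one. The diffusion model and the per-scale text prompts are omitted entirely, and the names `downsample`, `center_slice`, and `enforce_consistency` are hypothetical.

```python
import numpy as np


def downsample(img: np.ndarray, p: int) -> np.ndarray:
    """Average-pool a square (N, N) image by an integer factor p."""
    n = img.shape[0]
    return img.reshape(n // p, p, n // p, p).mean(axis=(1, 3))


def center_slice(n: int, p: int) -> slice:
    """Index range of the central n // p pixels along one axis."""
    size = n // p
    start = (n - size) // 2
    return slice(start, start + size)


def enforce_consistency(stack, p=2):
    """Make each level's center agree with the next, finer level.

    Assumption (not from the abstract): level i + 1 shows the central
    1/p region of level i at p times the magnification.
    """
    out = [level.copy() for level in stack]
    # Walk from the finest pair to the coarsest so that fine detail
    # propagates outward through every coarser level.
    for i in range(len(out) - 2, -1, -1):
        c = center_slice(out[i].shape[0], p)
        out[i][c, c] = downsample(out[i + 1], p)
    return out


rng = np.random.default_rng(0)
stack = [rng.random((8, 8)) for _ in range(3)]  # three zoom levels
consistent = enforce_consistency(stack, p=2)
```

In an actual diffusion sampler, a step of this kind would presumably be interleaved with denoising updates rather than applied once to finished images; that integration is beyond what the abstract describes.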
