SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Abstract

In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent images from a single-view image. Using pretrained large-scale 2D diffusion models, the recent work Zero123 demonstrates the ability to generate plausible novel views from a single-view image of an object. However, maintaining consistency in geometry and colors across the generated images remains a challenge. To address this issue, we propose a synchronized multiview diffusion model that models the joint probability distribution of multiview images, enabling the generation of multiview-consistent images in a single reverse process. SyncDreamer synchronizes the intermediate states of all the generated images at every step of the reverse process through a 3D-aware feature attention mechanism that correlates the corresponding features across different views. Experiments show that SyncDreamer generates images with high consistency across different views, making it well-suited for various 3D generation tasks such as novel view synthesis, text-to-3D, and image-to-3D.
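
The joint modeling described above can be sketched in standard DDPM-style notation. The symbols are assumptions introduced for illustration and do not appear in the abstract: $y$ is the input view, $x_t^{(n)}$ is the noisy state of the $n$-th target view at step $t$, $N$ is the number of views, $T$ is the number of diffusion steps, $\mu_\theta^{(n)}$ is a learned mean, and $\sigma_t$ is a fixed noise scale. The point of the sketch is that each view's denoising step conditions on the current states of all views, which is one way to read "synchronizing the intermediate states".

```latex
\begin{align}
p_\theta\big(x_{0:T}^{(1:N)} \mid y\big)
  &= p\big(x_T^{(1:N)}\big)\,
     \prod_{t=1}^{T} p_\theta\big(x_{t-1}^{(1:N)} \mid x_t^{(1:N)}, y\big), \\
p_\theta\big(x_{t-1}^{(1:N)} \mid x_t^{(1:N)}, y\big)
  &= \prod_{n=1}^{N} \mathcal{N}\big(x_{t-1}^{(n)};\;
     \mu_\theta^{(n)}\big(x_t^{(1:N)}, y, t\big),\; \sigma_t^2 I\big).
\end{align}
```

Under the same assumed notation, sampling each view independently would replace $x_t^{(1:N)}$ inside the mean with $x_t^{(n)}$ alone, so nothing would tie the views together during the reverse process.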
