WonderZoom: Multi-Scale 3D World Generation

Abstract

We present WonderZoom, a novel approach to generating 3D scenes with content across multiple spatial scales from a single image. Existing 3D world generation models remain limited to single-scale synthesis and cannot produce coherent scene content at varying granularities. The fundamental challenge is the lack of a scale-aware 3D representation capable of generating and rendering content of widely differing spatial sizes. WonderZoom addresses this through two key innovations: (1) scale-adaptive Gaussian surfels for generating and rendering multi-scale 3D scenes in real time, and (2) a progressive detail synthesizer that iteratively generates finer-scale 3D content. Our approach enables users to "zoom into" a 3D region and auto-regressively synthesize previously non-existent fine details, from landscapes down to microscopic features. Experiments demonstrate that WonderZoom significantly outperforms state-of-the-art video and 3D models in both quality and alignment, enabling multi-scale 3D world creation from a single image. Video results and an interactive viewer of generated multi-scale 3D worlds are available at https://wonderzoom.github.io/.
