SPATIALGEN: Layout-guided 3D Indoor Scene Generation

Abstract

Creating high-fidelity 3D models of indoor environments is essential for applications in design, virtual reality, and robotics. However, manual 3D modeling remains time-consuming and labor-intensive. While recent advances in generative AI have enabled automated scene synthesis, existing methods often struggle to balance visual quality, diversity, semantic consistency, and user control. A major bottleneck is the lack of a large-scale, high-quality dataset tailored to this task. To address this gap, we introduce a comprehensive synthetic dataset featuring 12,328 structured, annotated scenes with 57,440 rooms and 4.7M photorealistic 2D renderings. Leveraging this dataset, we present SpatialGen, a novel multi-view, multi-modal diffusion model that generates realistic and semantically consistent 3D indoor scenes. Given a 3D layout and a reference image (derived from a text prompt), our model synthesizes appearance (color images), geometry (scene coordinate maps), and semantics (semantic segmentation maps) from arbitrary viewpoints while preserving spatial consistency across modalities. In our experiments, SpatialGen consistently produces better results than previous methods. We are open-sourcing our data and models to empower the community and advance the field of indoor scene understanding and generation.
