SPATIALGEN: Layout-guided 3D Indoor Scene Generation

Abstract

Creating high-fidelity 3D models of indoor environments is essential for applications in design, virtual reality, and robotics. However, manual 3D modeling remains time-consuming and labor-intensive. While recent advances in generative AI have enabled automated scene synthesis, existing methods often struggle to balance visual quality, diversity, semantic consistency, and user control. A major bottleneck is the lack of a large-scale, high-quality dataset tailored to this task. To address this gap, we introduce a comprehensive synthetic dataset featuring 12,328 scenes with structured annotations, covering 57,440 rooms, along with 4.7M photorealistic 2D renderings. Leveraging this dataset, we present SpatialGen, a novel multi-view, multi-modal diffusion model that generates realistic and semantically consistent 3D indoor scenes. Given a 3D layout and a reference image (derived from a text prompt), our model synthesizes appearance (color images), geometry (scene coordinate maps), and semantics (semantic segmentation maps) from arbitrary viewpoints while preserving spatial consistency across modalities. In our experiments, SpatialGen consistently outperforms previous methods. We open-source our data and models to empower the community and advance the field of indoor scene understanding and generation.
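The abstract describes geometry as a scene coordinate map. As a minimal sketch, assuming the common convention that such a map stores, for every pixel, the 3D world-space point that pixel sees, it can be derived from a z-depth map, pinhole intrinsics K, and a camera-to-world pose. The function name `scene_coordinate_map`, the pixel-center offset of 0.5, and the pose convention are illustrative assumptions, not taken from SpatialGen's released code.

```python
import numpy as np

def scene_coordinate_map(depth, K, cam_to_world):
    """Lift a depth map to a per-pixel map of 3D world coordinates.

    Hypothetical helper for illustration (not SpatialGen's API).
    depth:        (H, W) z-depth along the optical axis (pinhole camera assumed)
    K:            (3, 3) camera intrinsics
    cam_to_world: (4, 4) rigid camera-to-world transform
    returns:      (H, W, 3) world-space XYZ for every pixel
    """
    H, W = depth.shape
    # Pixel centers in homogeneous image coordinates (u, v, 1); the +0.5 offset is an assumed convention.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # (H, W, 3)
    # Back-project: X_cam = depth * K^{-1} [u, v, 1]^T (row-vector form uses K^{-T}).
    rays = pix @ np.linalg.inv(K).T                       # (H, W, 3), z component == 1
    pts_cam = rays * depth[..., None]
    # Camera space -> world space: X_world = R X_cam + t.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t

# Usage: a flat 4x4 depth map at 2 m, camera shifted 1 m along world z.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 2.0)
pose = np.eye(4)
pose[:3, 3] = [0.0, 0.0, 1.0]
scm = scene_coordinate_map(depth, K, pose)
print(scm.shape)  # (4, 4, 3)
```

Because every pixel carries a world-space coordinate rather than a view-dependent depth, maps predicted from different viewpoints can be compared directly in one shared frame, which is one plausible reason such a representation suits multi-view consistency.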
