I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners

Abstract

Generalization remains the central challenge for interactive 3D scene generation. Existing learning-based approaches ground spatial understanding in limited scene datasets, restricting generalization to new layouts. We instead reprogram a pre-trained 3D instance generator to act as a scene-level learner, replacing dataset-bounded supervision with model-centric spatial supervision. This reprogramming unlocks the generator's transferable spatial knowledge, enabling generalization to unseen layouts and novel object compositions. Remarkably, spatial reasoning emerges even when the training scenes consist of randomly composed objects. This demonstrates that the generator's transferable scene prior provides a rich learning signal for inferring proximity, support, and symmetry from purely geometric cues. Replacing the widely used canonical space, we instantiate this insight with a view-centric formulation of the scene space, yielding a fully feed-forward, generalizable scene generator that learns spatial relations directly from the instance model. Quantitative and qualitative results show that a 3D instance generator is an implicit spatial learner and reasoner, pointing toward foundation models for interactive 3D scene understanding and generation. Project page: https://luling06.github.io/I-Scene-project/
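The abstract contrasts a canonical scene space with a view-centric one but does not spell out the parameterization. As a rough illustration only, the sketch below re-expresses object centers from a shared world (canonical) frame in the frame of a look-at camera, so coordinates are tied to the observing view rather than to a fixed dataset convention. The camera convention (+z forward, +y up), the hypothetical helpers `camera_axes` and `to_view_centric`, and the restriction to object centers (no rotation or scale) are all assumptions, not the paper's implementation.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(_dot(v, v))
    if n < 1e-9:
        raise ValueError("cannot normalize a (near-)zero vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def camera_axes(eye: Vec3, target: Vec3,
                up: Vec3 = (0.0, 1.0, 0.0)) -> Tuple[Vec3, Vec3, Vec3]:
    """Hypothetical helper: right/up/forward axes of a look-at camera.

    Assumed convention: +z points from the camera toward the target,
    +x to the right, +y up (right-handed, right x up = forward).
    Raises ValueError if the view direction is parallel to `up`.
    """
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(up, forward))
    true_up = _cross(forward, right)  # unit length: forward and right are orthonormal
    return right, true_up, forward


def to_view_centric(centers: Dict[str, Vec3], eye: Vec3,
                    target: Vec3) -> Dict[str, Vec3]:
    """Hypothetical helper: map world-frame object centers into the camera frame.

    This is a rigid transform, so inter-object distances are preserved;
    only the reference frame changes from the world to the viewer.
    """
    right, up, forward = camera_axes(eye, target)
    out: Dict[str, Vec3] = {}
    for name, p in centers.items():
        d = _sub(p, eye)  # offset of the object from the camera center
        out[name] = (_dot(d, right), _dot(d, up), _dot(d, forward))
    return out


if __name__ == "__main__":
    # Hypothetical toy layout: a lamp 1 unit above a table, in world coordinates.
    scene = {"table": (0.0, 0.0, 0.0), "lamp": (0.0, 1.0, 0.0)}
    front = to_view_centric(scene, eye=(0.0, 0.5, -4.0), target=(0.0, 0.5, 0.0))
    side = to_view_centric(scene, eye=(4.0, 0.5, 0.0), target=(0.0, 0.5, 0.0))
    print(front)
    print(side)
```

In this toy usage, a front view and a side view of the same layout yield the same view-frame coordinates, with the lamp 1 unit above the table along the view's y-axis. Because the mapping is rigid, distances between objects are unchanged; what moves is the reference frame, from a fixed world convention to the observing camera.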
