HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
April 17, 2025
Authors: Wenqi Dong, Bangbang Yang, Zesong Yang, Yuan Li, Tao Hu, Hujun Bao, Yuewen Ma, Zhaopeng Cui
cs.AI
Abstract
Scene-level 3D generation represents a critical frontier in multimedia and
computer graphics, yet existing approaches either suffer from limited object
categories or lack editing flexibility for interactive applications. In this
paper, we present HiScene, a novel hierarchical framework that bridges the gap
between 2D image generation and 3D object generation and delivers high-fidelity
scenes with compositional identities and aesthetic scene content. Our key
insight is treating scenes as hierarchical "objects" under isometric views,
where a room functions as a complex object that can be further decomposed into
manipulatable items. This hierarchical approach enables us to generate 3D
content that aligns with 2D representations while maintaining compositional
structure. To ensure completeness and spatial alignment of each decomposed
instance, we develop a video-diffusion-based amodal completion technique that
effectively handles occlusions and shadows between objects, and introduce shape
prior injection to ensure spatial coherence within the scene. Experimental
results demonstrate that our method produces more natural object arrangements
and complete object instances suitable for interactive applications, while
maintaining physical plausibility and alignment with user inputs.
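
To make the "room as a complex object decomposed into manipulatable items" idea concrete, here is a minimal sketch of a hierarchical scene tree with a textbook isometric projection. It is an illustration only and not the authors' implementation: the names `SceneNode`, `flatten`, and `isometric_project` are hypothetical, and the generative components described in the abstract (video-diffusion-based amodal completion, shape prior injection) are not modeled.

```python
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SceneNode:
    """Hypothetical node in a scene hierarchy (a room, or an item inside it)."""
    name: str
    position: Vec3 = (0.0, 0.0, 0.0)  # offset relative to the parent node
    children: List["SceneNode"] = field(default_factory=list)

    def add(self, child: "SceneNode") -> "SceneNode":
        """Attach a child item and return it so calls can be chained."""
        self.children.append(child)
        return child

    def flatten(self, origin: Vec3 = (0.0, 0.0, 0.0)) -> Iterator[Tuple[str, Vec3]]:
        """Yield (name, world_position) for this node and all descendants."""
        world = tuple(o + p for o, p in zip(origin, self.position))
        yield self.name, world
        for child in self.children:
            yield from child.flatten(world)


def isometric_project(point: Vec3) -> Tuple[float, float]:
    """Orthographic isometric projection with z up.

    Assumption: the conventional 30-degree axis layout, in which the three
    world axes appear 120 degrees apart on screen. This is a textbook
    formula, not a camera model taken from the paper.
    """
    x, y, z = point
    angle = math.radians(30.0)
    u = (x - y) * math.cos(angle)
    v = z - (x + y) * math.sin(angle)
    return u, v


# The room is the top-level "object"; items are nested beneath it.
room = SceneNode("room")
table = room.add(SceneNode("table", position=(1.0, 2.0, 0.0)))
lamp = table.add(SceneNode("lamp", position=(0.0, 0.0, 0.8)))

for name, world in room.flatten():
    print(name, world, isometric_project(world))
```

Because each item stores its position relative to its parent, moving the table also moves the lamp on top of it. That is the sort of per-item manipulation a compositional scene structure allows.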