InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
September 13, 2025
Authors: Weipeng Zhong, Peizhou Cao, Yichen Jin, Li Luo, Wenzhe Cai, Jingli Lin, Hanqing Wang, Zhaoyang Lyu, Tai Wang, Bo Dai, Xudong Xu, Jiangmiao Pang
cs.AI
Abstract
The advancement of Embodied AI heavily relies on large-scale, simulatable 3D
scene datasets characterized by scene diversity and realistic layouts. However,
existing datasets typically suffer from limitations in data scale or diversity,
sanitized layouts lacking small items, and severe object collisions. To address
these shortcomings, we introduce InternScenes, a novel large-scale
simulatable indoor scene dataset of approximately 40,000 diverse scenes built
by integrating three disparate scene sources (real-world scans, procedurally
generated scenes, and designer-created scenes). It contains 1.96M 3D objects
and covers 15 common scene types and 288 object classes. We deliberately
preserve the many small items in these scenes, which yields realistic and
complex layouts with an average of 41.5 objects per region. Our comprehensive data processing
pipeline ensures simulatability by creating real-to-sim replicas for real-world
scans, enhances interactivity by incorporating interactive objects into these
scenes, and resolves object collisions through physics simulation. We demonstrate
the value of InternScenes with two benchmark applications: scene layout
generation and point-goal navigation. Both reveal new challenges posed by
complex, realistic layouts. More importantly, InternScenes paves the way for
scaling up model training for both tasks, making generation and navigation
in such complex scenes possible. We commit to open-sourcing the
data, models, and benchmarks to benefit the whole community.