CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
February 18, 2025
Authors: Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Lan Xu, Wei Yang, Jiayuan Gu, Jingyi Yu
cs.AI
Abstract
Recovering high-quality 3D scenes from a single RGB image is a challenging
task in computer graphics. Current methods often struggle with domain-specific
limitations or low-quality object generation. To address these issues, we propose CAST
(Component-Aligned 3D Scene Reconstruction from a Single RGB Image), a novel
method for 3D scene reconstruction and recovery. CAST first extracts
object-level 2D segmentation and relative depth information from the input
image, then uses a GPT-based model to analyze inter-object spatial
relationships. This captures how objects relate to one another within the
scene and makes the reconstruction more coherent. CAST then
employs an occlusion-aware large-scale 3D generation model to independently
generate each object's full geometry, using masked autoencoder (MAE) and point
cloud conditioning to mitigate the effects of occlusion and partial object
information so that the result aligns accurately with the source image's
geometry and texture. To align each
object with the scene, the alignment generation model computes the necessary
transformations, allowing the generated meshes to be accurately placed and
integrated into the scene's point cloud. Finally, CAST incorporates a
physics-aware correction step that leverages a fine-grained relation graph to
generate a constraint graph. This graph guides the optimization of object
poses, ensuring physical consistency and spatial coherence. By using signed
distance fields (SDFs), the model addresses issues such as
occlusions, object penetration, and floating objects, ensuring that the
generated scene accurately reflects real-world physical interactions. CAST can
be leveraged in robotics, enabling efficient real-to-simulation workflows and
providing realistic, scalable simulation environments for robotic systems.
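
The abstract names signed distance fields as the tool for detecting penetration and floating objects but gives no formulation. The sketch below is only an illustration of that idea under simplifying assumptions, not the authors' optimization: the support surface is a horizontal plane, the object is a set of sample points, and the correction is a single vertical translation. The names `plane_sdf`, `classify_contact`, `snap_to_support`, and the tolerance `tol` are hypothetical.

```python
import numpy as np


def plane_sdf(points, height=0.0):
    """Signed distance to the plane z = height (positive above, negative below)."""
    return points[:, 2] - height


def classify_contact(min_dist, tol=1e-3):
    """Label the relation implied by the smallest signed distance."""
    if min_dist < -tol:
        return "penetrating"
    if min_dist > tol:
        return "floating"
    return "in_contact"


def snap_to_support(points, sdf, tol=1e-3):
    """Translate the object along z so its lowest point rests on the plane.

    Assumption: the SDF gradient is +z everywhere, which holds for a
    horizontal plane but not for a general support surface.
    """
    min_dist = float(sdf(points).min())
    state = classify_contact(min_dist, tol)
    corrected = points.copy()
    corrected[:, 2] -= min_dist
    return corrected, state, min_dist


# Eight corners of a unit cube resting on the plane z = 0.
cube = np.array(
    [[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
)
floating_cube = cube + np.array([0.0, 0.0, 0.25])  # hovers above the plane
sunk_cube = cube - np.array([0.0, 0.0, 0.10])      # intersects the plane

for name, pts in (("floating", floating_cube), ("sunk", sunk_cube)):
    fixed, state, dist = snap_to_support(pts, plane_sdf)
    print(name, state, round(dist, 3), round(float(plane_sdf(fixed).min()), 3))
```

A negative minimum distance signals penetration and a positive one signals a gap, so one signed quantity covers both failure modes. A full system would presumably optimize complete object poses against the SDFs of neighboring objects, guided by the constraint graph, rather than apply one translation.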