CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image

February 18, 2025
Authors: Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Lan Xu, Wei Yang, Jiayuan Gu, Jingyi Yu
cs.AI

Abstract

Recovering high-quality 3D scenes from a single RGB image is a challenging task in computer graphics. Current methods often struggle with domain-specific limitations or low-quality object generation. To address these limitations, we propose CAST (Component-Aligned 3D Scene Reconstruction from a Single RGB Image), a novel method for 3D scene reconstruction and recovery. CAST first extracts object-level 2D segmentation and relative depth information from the input image, then uses a GPT-based model to analyze inter-object spatial relationships. This analysis captures how objects relate to each other within the scene and makes the reconstruction more coherent. CAST then employs an occlusion-aware large-scale 3D generation model to independently generate each object's full geometry, using MAE (masked autoencoder) and point cloud conditioning to mitigate the effects of occlusions and partial object information, so that the result aligns accurately with the source image's geometry and texture. To align each object with the scene, an alignment generation model computes the necessary transformations, allowing the generated meshes to be accurately placed and integrated into the scene's point cloud. Finally, CAST incorporates a physics-aware correction step that leverages a fine-grained relation graph to generate a constraint graph, which guides the optimization of object poses toward physical consistency and spatial coherence. By utilizing Signed Distance Fields (SDF), the model effectively addresses issues such as occlusions, object penetration, and floating objects, so that the generated scene accurately reflects real-world physical interactions. CAST can be leveraged in robotics, enabling efficient real-to-simulation workflows and providing realistic, scalable simulation environments for robotic systems.
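
To make the role of signed distance fields more concrete, the following is a minimal sketch of how an SDF can flag penetration and floating between two objects. It is an illustration under stated assumptions, not the CAST implementation: both objects are spheres with an analytic SDF, and the helper names (sphere_sdf, sample_sphere_surface, contact_report) and the tolerance value are hypothetical.

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=1) - radius


def sample_sphere_surface(center, radius, n=2000, seed=0):
    """Draw n points uniformly on a sphere's surface (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return center + radius * directions


def contact_report(sdf_a, surface_b, tol=1e-2):
    """Classify object B relative to object A from A's SDF at B's surface samples.

    Returns (state, penetration_depth, gap). The tolerance is an assumed value.
    """
    d = sdf_a(surface_b)
    penetration = float(np.clip(-d, 0.0, None).max())  # deepest sample inside A
    gap = float(max(d.min(), 0.0))                     # closest sample outside A
    if penetration > tol:
        state = "penetrating"
    elif gap > tol:
        state = "floating"
    else:
        state = "contact"
    return state, penetration, gap


def sdf_a(points):
    """Object A: unit sphere at the origin."""
    return sphere_sdf(points, np.zeros(3), 1.0)


# Object B: sphere of radius 0.5 placed at three heights above A.
for z in (1.2, 1.5, 2.0):
    surface_b = sample_sphere_surface(np.array([0.0, 0.0, z]), 0.5)
    print(z, contact_report(sdf_a, surface_b))
```

In a pose-optimization setting, the penetration depth and the gap could serve as penalty terms to be driven toward zero for object pairs that a constraint graph marks as being in contact; how CAST actually formulates these terms is not specified in the abstract.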
