MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction
March 19, 2026
Authors: Haitian Li, Haozhe Xie, Junxiang Xu, Beichen Wen, Fangzhou Hong, Ziwei Liu
cs.AI
Abstract
Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression unstable. Existing methods address this challenge through multi-view supervision, retrieval-based assembly, or auxiliary video generation, but often sacrifice scalability or efficiency. We present MonoArt, a unified framework grounded in progressive structural reasoning. Rather than predicting articulation directly from image features, MonoArt progressively transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture. This structured reasoning process enables stable and interpretable articulation inference without external motion templates or multi-stage pipelines. Extensive experiments on PartNet-Mobility demonstrate that MonoArt achieves state-of-the-art performance in both reconstruction accuracy and inference speed. The framework further generalizes to robotic manipulation and articulated scene reconstruction.
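The staged design described above can be sketched as a pipeline in which each stage conditions only on the previous stage's output, so articulation is inferred from recovered structure rather than raw image features. The sketch below is a minimal illustration of that control flow only; all data structures, field names, and the stub stage bodies are hypothetical and are not MonoArt's actual representations or networks.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical intermediate representations mirroring the three stages
# named in the abstract; the fields are illustrative placeholders.

@dataclass
class CanonicalGeometry:
    vertices: List[Tuple[float, float, float]]  # surface in a canonical (rest) pose

@dataclass
class PartRepresentation:
    part_id: int
    vertex_ids: List[int]  # vertices assigned to this part

@dataclass
class MotionEmbedding:
    part_id: int
    joint_type: str              # e.g. "revolute" or "prismatic"
    axis: Tuple[float, float, float]  # articulation axis direction

def infer_geometry(image_features: List[float]) -> CanonicalGeometry:
    # Stage 1 (stub): lift image features to canonical geometry.
    return CanonicalGeometry(vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])

def segment_parts(geom: CanonicalGeometry) -> List[PartRepresentation]:
    # Stage 2 (stub): group canonical geometry into structured parts.
    return [PartRepresentation(part_id=i, vertex_ids=[i])
            for i in range(len(geom.vertices))]

def embed_motion(parts: List[PartRepresentation]) -> List[MotionEmbedding]:
    # Stage 3 (stub): predict per-part motion parameters;
    # part 0 is treated as the static base here.
    return [MotionEmbedding(p.part_id, "revolute", (0.0, 0.0, 1.0))
            for p in parts if p.part_id > 0]

def reconstruct(image_features: List[float]) -> List[MotionEmbedding]:
    # Progressive reasoning: geometry -> parts -> motion, one pass,
    # no external motion templates or separate pipeline stages.
    geom = infer_geometry(image_features)
    parts = segment_parts(geom)
    return embed_motion(parts)
```

The point of the structure is that `embed_motion` never sees `image_features` directly, which is one way to realize the decoupling of motion cues from raw visual evidence that the abstract argues stabilizes articulation inference.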