
MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

March 19, 2026
Authors: Haitian Li, Haozhe Xie, Junxiang Xu, Beichen Wen, Fangzhou Hong, Ziwei Liu
cs.AI

Abstract

Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression unstable. Existing methods address this challenge through multi-view supervision, retrieval-based assembly, or auxiliary video generation, often sacrificing scalability or efficiency. We present MonoArt, a unified framework grounded in progressive structural reasoning. Rather than predicting articulation directly from image features, MonoArt progressively transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture. This structured reasoning process enables stable and interpretable articulation inference without external motion templates or multi-stage pipelines. Extensive experiments on PartNet-Mobility demonstrate that MonoArt achieves state-of-the-art performance in both reconstruction accuracy and inference speed. The framework further generalizes to robotic manipulation and articulated scene reconstruction.
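The staged inference the abstract describes (image features → canonical geometry → structured parts → motion-aware embedding → articulation) can be sketched as a hypothetical forward pass. All module names, feature dimensions, and the 7-value joint parameterization below are illustrative assumptions, not details from the paper; the learned networks are stood in for by random linear maps.

```python
import numpy as np

rng = np.random.default_rng(0)

def placeholder_module(in_dim, out_dim):
    """Stand-in for a learned network stage (weights are random here)."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.01
    return lambda x: x @ W

# Hypothetical stages of the progressive reasoning chain:
encode_geometry = placeholder_module(512, 256)  # image features -> canonical geometry code
encode_parts    = placeholder_module(256, 128)  # geometry code  -> structured part representation
encode_motion   = placeholder_module(128, 64)   # part features  -> motion-aware embedding
predict_joint   = placeholder_module(64, 7)     # embedding -> joint params (e.g. axis, pivot, angle)

def progressive_forward(image_features):
    """Each stage conditions on the previous one, rather than
    regressing articulation directly from raw image features."""
    geometry = encode_geometry(image_features)
    parts = encode_parts(geometry)
    motion = encode_motion(parts)
    return predict_joint(motion)

feats = rng.standard_normal((1, 512))
joint_params = progressive_forward(feats)
print(joint_params.shape)  # (1, 7)
```

The point of the structure is that articulation is read off a motion-aware embedding distilled through intermediate representations, which is the abstract's stated remedy for the instability of direct regression.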