
PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

March 6, 2026
Authors: Xiang Zhang, Sohyun Yoo, Hongrui Wu, Chuan Li, Jianwen Xie, Zhuowen Tu
cs.AI

Abstract

We introduce PixARMesh, a method to autoregressively reconstruct complete 3D indoor scene meshes directly from a single RGB image. Unlike prior methods that rely on implicit signed distance fields and post-hoc layout optimization, PixARMesh jointly predicts object layout and geometry within a unified model, producing coherent and artist-ready meshes in a single forward pass. Building on recent advances in mesh generative models, we augment a point-cloud encoder with pixel-aligned image features and global scene context via cross-attention, enabling accurate spatial reasoning from a single image. Scenes are generated autoregressively from a unified token stream containing context, pose, and mesh, yielding compact meshes with high-fidelity geometry. Experiments on synthetic and real-world datasets show that PixARMesh achieves state-of-the-art reconstruction quality while producing lightweight, high-quality meshes ready for downstream applications.
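The abstract names two mechanisms concrete enough to sketch: cross-attention that injects pixel-aligned image features and a global scene embedding into the point-cloud encoder, and a unified token stream of context, pose, and mesh tokens decoded autoregressively. The PyTorch sketch below illustrates one plausible reading; every module name, dimension, and the token layout are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (PyTorch) of the fusion described in the abstract: point-cloud
# tokens cross-attend to pixel-aligned image features and a global scene-context
# vector. All names, dimensions, and the token layout are assumptions.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Point tokens (queries) attend to image features + scene context (keys/values)."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, point_tokens, image_feats, scene_context):
        # point_tokens:  (B, N_pts, D) tokens from the point-cloud encoder
        # image_feats:   (B, N_pix, D) pixel-aligned features from the image backbone
        # scene_context: (B, 1, D)     global scene embedding
        kv = self.norm_kv(torch.cat([image_feats, scene_context], dim=1))
        fused, _ = self.attn(self.norm_q(point_tokens), kv, kv)
        return point_tokens + fused  # residual connection


def build_token_stream(context_tokens, objects):
    """One plausible layout for the "unified token stream": global context first,
    then (pose, mesh) token pairs per object, decoded left to right.

    objects: list of (pose_tokens, mesh_tokens) pairs, each of shape (B, T_i, D).
    """
    stream = [context_tokens]
    for pose_tokens, mesh_tokens in objects:
        stream += [pose_tokens, mesh_tokens]
    return torch.cat(stream, dim=1)  # (B, T_total, D)
```

Under this reading, interleaving pose before mesh lets the autoregressive decoder condition each object's geometry on its predicted placement, which is one way the "joint layout and geometry" claim could be realized.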