

PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

March 6, 2026
Authors: Xiang Zhang, Sohyun Yoo, Hongrui Wu, Chuan Li, Jianwen Xie, Zhuowen Tu
cs.AI

Abstract

We introduce PixARMesh, a method to autoregressively reconstruct complete 3D indoor scene meshes directly from a single RGB image. Unlike prior methods that rely on implicit signed distance fields and post-hoc layout optimization, PixARMesh jointly predicts object layout and geometry within a unified model, producing coherent and artist-ready meshes in a single forward pass. Building on recent advances in mesh generative models, we augment a point-cloud encoder with pixel-aligned image features and global scene context via cross-attention, enabling accurate spatial reasoning from a single image. Scenes are generated autoregressively from a unified token stream containing context, pose, and mesh, yielding compact meshes with high-fidelity geometry. Experiments on synthetic and real-world datasets show that PixARMesh achieves state-of-the-art reconstruction quality while producing lightweight, high-quality meshes ready for downstream applications.
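The abstract describes augmenting a point-cloud encoder with pixel-aligned image features and global scene context via cross-attention. A minimal sketch of that fusion pattern, in PyTorch, is shown below; the module name, dimensions, and the use of a single global context token are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CrossAttnFusion(nn.Module):
    """Hypothetical sketch: point-cloud tokens (queries) attend to
    pixel-aligned image features plus one global scene-context token."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, point_tokens, pixel_feats, scene_ctx):
        # Keys/values: pixel-aligned features concatenated with the
        # global context token, so each point can attend to both.
        kv = torch.cat([pixel_feats, scene_ctx], dim=1)
        fused, _ = self.attn(point_tokens, kv, kv)
        # Residual connection plus layer norm, as is standard.
        return self.norm(point_tokens + fused)

# Toy shapes: batch 2, 128 point tokens, 256 pixel features, width 64.
pts = torch.randn(2, 128, 64)
pix = torch.randn(2, 256, 64)
ctx = torch.randn(2, 1, 64)
out = CrossAttnFusion()(pts, pix, ctx)
```

The output keeps the point-token shape, so the fused features can feed directly into the autoregressive decoder that emits the context, pose, and mesh token stream.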