DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies
January 5, 2026
Authors: Renke Wang, Zhenyu Zhang, Ying Tai, Jian Yang
cs.AI
Abstract
Human mesh recovery from multi-view images faces a fundamental challenge: real-world datasets contain imperfect ground-truth annotations that bias model training, while synthetic data with precise supervision suffers from a domain gap. In this paper, we propose DiffProxy, a novel framework that generates multi-view consistent human proxies for mesh recovery. Central to DiffProxy is the use of diffusion-based generative priors to bridge the gap between synthetic training and real-world generalization. Its key innovations include: (1) a multi-conditional mechanism for generating multi-view consistent, pixel-aligned human proxies; (2) a hand refinement module that incorporates flexible visual prompts to enhance local details; and (3) an uncertainty-aware test-time scaling method that increases robustness to challenging cases during optimization. These designs ensure that the mesh recovery process benefits from both the precise ground truth of synthetic data and the generative strengths of the diffusion-based pipeline. Trained entirely on synthetic data, DiffProxy achieves state-of-the-art performance across five real-world benchmarks, demonstrating strong zero-shot generalization, particularly in challenging scenarios with occlusions and partial views. Project page: https://wrk226.github.io/DiffProxy.html
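The abstract does not specify how the uncertainty-aware test-time scaling is implemented. One plausible reading is that the stochastic diffusion pipeline is sampled several times, the per-location variance across samples serves as an uncertainty estimate, and low-confidence regions are down-weighted during mesh optimization. The sketch below illustrates only that general idea; `sample_proxies` and `fit_mesh` are hypothetical callables standing in for the paper's proxy generator and mesh-fitting stage, not its actual API.

```python
import numpy as np

def uncertainty_weighted_fit(sample_proxies, fit_mesh, images, n_samples=8):
    """Illustrative test-time scaling under assumed interfaces.

    sample_proxies(images) -> one stochastic dense-proxy prediction
                              (e.g., an array of shape [views, H, W, C])
    fit_mesh(target, weights) -> mesh parameters fitted to a proxy target
                                 under per-location confidence weights
    """
    # Draw multiple proxy predictions for the same multi-view input;
    # each call re-samples the diffusion model's noise.
    proxies = np.stack([sample_proxies(images) for _ in range(n_samples)])

    # The sample mean is the fitting target; the per-location variance
    # (averaged over channels) acts as an uncertainty estimate.
    target = proxies.mean(axis=0)
    variance = proxies.var(axis=0).mean(axis=-1)

    # Convert variance to confidence weights: consistent regions dominate
    # the objective, while ambiguous regions are down-weighted.
    weights = 1.0 / (1.0 + variance)

    return fit_mesh(target, weights)
```

In this reading, "scaling" comes from trading extra inference-time samples for robustness: hard cases (occlusions, partial views) produce high inter-sample variance and thus contribute less to the optimization.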