UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
September 29, 2025
Authors: Zeyu Cai, Ziyang Li, Xiaoben Li, Boqian Li, Zeyu Wang, Zhenyu Zhang, Yuliang Xiu
cs.AI
Abstract
We present UP2You, the first tuning-free solution for reconstructing
high-fidelity 3D clothed portraits from extremely unconstrained in-the-wild 2D
photos. Unlike previous approaches that require "clean" inputs (e.g., full-body
images with minimal occlusions, or well-calibrated cross-view captures), UP2You
directly processes raw, unstructured photographs, which may vary significantly
in pose, viewpoint, cropping, and occlusion. Instead of compressing data into
tokens for slow online text-to-3D optimization, we introduce a data rectifier
paradigm that efficiently converts unconstrained inputs into clean, orthogonal
multi-view images in a single forward pass within seconds, simplifying the 3D
reconstruction. Central to UP2You is a pose-correlated feature aggregation
module (PCFA), which selectively fuses information from multiple reference
images with respect to target poses, enabling better identity preservation and
a nearly constant memory footprint as more observations are added. We also introduce a
perceiver-based multi-reference shape predictor, removing the need for
pre-captured body templates. Extensive experiments on 4D-Dress, PuzzleIOI, and
in-the-wild captures demonstrate that UP2You consistently surpasses previous
methods in both geometric accuracy (15% lower Chamfer distance and 18% lower
P2S error on PuzzleIOI) and texture fidelity (21% higher PSNR and 46% lower
LPIPS on 4D-Dress). UP2You is efficient (1.5 minutes per person) and versatile
(supporting arbitrary pose control and training-free multi-garment 3D virtual
try-on), making it practical for
real-world scenarios where humans are casually captured. Both models and code
will be released to facilitate future research on this underexplored task.
Project Page: https://zcai0612.github.io/UP2You
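
The abstract attributes the nearly constant memory footprint to PCFA's pose-conditioned fusion of reference images. Below is a minimal PyTorch sketch of one way such a module could behave: a pooled target-pose query scores each reference, and references are folded in one at a time with an online softmax, so adding photos changes only the loop length, never the size of any tensor held in memory. The class name, tensor shapes, and scoring scheme are illustrative assumptions, not the paper's released architecture.

```python
import math
import torch
import torch.nn as nn

class PoseCorrelatedAggregation(nn.Module):
    """Illustrative stand-in for PCFA (an assumption, not the paper's code):
    fuse N reference feature maps into one, weighted by each reference's
    correlation with the target pose. References are processed one at a
    time with a streaming softmax, keeping peak memory nearly constant in N."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)  # target-pose query projection
        self.to_k = nn.Linear(dim, dim)  # reference key projection
        self.to_v = nn.Linear(dim, dim)  # reference value projection

    def forward(self, pose_feat: torch.Tensor, ref_feats: torch.Tensor) -> torch.Tensor:
        # pose_feat: (B, T, D) target-pose tokens
        # ref_feats: (B, N, T, D) tokens from N unconstrained reference photos
        B, N, T, D = ref_feats.shape
        q = self.to_q(pose_feat).mean(dim=1)            # (B, D) pooled pose query
        # Running accumulators for an online softmax over references.
        m = torch.full((B, 1, 1), float("-inf"), device=ref_feats.device)
        s = torch.zeros(B, 1, 1, device=ref_feats.device)
        acc = torch.zeros(B, T, D, device=ref_feats.device)
        for i in range(N):  # only one reference resident at a time
            k = self.to_k(ref_feats[:, i]).mean(dim=1)  # (B, D)
            v = self.to_v(ref_feats[:, i])              # (B, T, D)
            # Scalar pose-reference correlation per batch element.
            score = ((q * k).sum(-1) / math.sqrt(D)).view(B, 1, 1)
            m_new = torch.maximum(m, score)
            scale = torch.exp(m - m_new)                # rescale earlier terms
            w = torch.exp(score - m_new)                # weight of this reference
            acc = acc * scale + w * v
            s = s * scale + w
            m = m_new
        return acc / s                                  # (B, T, D) fused features
```

In this streaming formulation the fused output and all accumulators have fixed shapes, which matches the abstract's claim of a nearly constant memory footprint with more observations; the real module may of course realize the same property differently.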
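The perceiver-based multi-reference shape predictor is likewise only named in the abstract. The sketch below assumes a standard Perceiver-style design: a small set of learned latents cross-attends over tokens from all reference photos and a linear head regresses body-shape coefficients (e.g., SMPL-style betas), so no pre-captured body template is required. All names, dimensions, and the choice of output parameterization are hypothetical.

```python
import torch
import torch.nn as nn

class PerceiverShapePredictor(nn.Module):
    """Assumed Perceiver-style shape regressor: learned latents attend over
    the concatenated reference tokens, then a linear head predicts shape
    coefficients. The latent bottleneck keeps cost linear in the number of
    reference tokens regardless of how many photos are supplied."""

    def __init__(self, dim: int = 256, n_latents: int = 32, n_betas: int = 10):
        super().__init__()
        self.latents = nn.Parameter(0.02 * torch.randn(n_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, n_betas)  # e.g., SMPL-style shape betas

    def forward(self, ref_feats: torch.Tensor) -> torch.Tensor:
        # ref_feats: (B, N, T, D) tokens from N reference photos
        B, N, T, D = ref_feats.shape
        ctx = ref_feats.reshape(B, N * T, D)       # all references as one context
        lat = self.latents.unsqueeze(0).expand(B, -1, -1)
        lat, _ = self.cross_attn(lat, ctx, ctx)    # latents query the references
        lat = self.norm(lat)
        return self.head(lat.mean(dim=1))          # (B, n_betas) shape estimate

# Example with hypothetical shapes: 5 photos, 196 tokens each, 256-dim features.
feats = torch.randn(2, 5, 196, 256)
betas = PerceiverShapePredictor()(feats)           # -> (2, 10)
```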