3D Congealing: 3D-Aware Image Alignment in the Wild
April 2, 2024
Authors: Yunzhi Zhang, Zizhang Li, Amit Raj, Andreas Engelhardt, Yuanzhen Li, Tingbo Hou, Jiajun Wu, Varun Jampani
cs.AI
Abstract
We propose 3D Congealing, a novel problem of 3D-aware alignment for 2D images
capturing semantically similar objects. Given a collection of unlabeled
Internet images, our goal is to associate the shared semantic parts from the
inputs and aggregate the knowledge from 2D images to a shared 3D canonical
space. We introduce a general framework that tackles the task without assuming
shape templates, poses, or any camera parameters. At its core is a canonical 3D
representation that encapsulates geometric and semantic information. The
framework optimizes for the canonical representation together with the pose for
each input image, and a per-image coordinate map that warps 2D pixel
coordinates to the 3D canonical frame to account for the shape matching. The
optimization procedure fuses prior knowledge from a pre-trained image
generative model and semantic information from input images. The former
provides strong knowledge guidance for this under-constrained task, while the
latter provides the necessary information to mitigate the training data bias
from the pre-trained model. Our framework can be used for various tasks such as
correspondence matching, pose estimation, and image editing, achieving strong
results on real-world image datasets under challenging illumination conditions
and on in-the-wild online image collections.
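Because every pixel of every input image is mapped into one shared 3D canonical frame, correspondence matching between two images can be reduced to comparing canonical coordinates. The sketch below illustrates only that idea, under simplifying assumptions that the abstract does not state: each image's coordinate map has already been evaluated into a lookup table from pixel to canonical 3D point, and matches are found by brute-force nearest-neighbour search. The names `match_via_canonical`, `coords_a`, `coords_b` and `max_dist` are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Dict, Optional, Tuple

Pixel = Tuple[int, int]                # (row, col) in an input image
Point3 = Tuple[float, float, float]    # point in the shared canonical 3D frame


def match_via_canonical(
    coords_a: Dict[Pixel, Point3],
    coords_b: Dict[Pixel, Point3],
    max_dist: float = math.inf,
) -> Dict[Pixel, Pixel]:
    """Hypothetical helper: match pixels of image A to pixels of image B.

    Assumes coords_a / coords_b are precomputed per-image coordinate maps
    (pixel -> canonical 3D point). For each pixel in A, returns the pixel
    in B whose canonical coordinate is closest in Euclidean distance.
    Candidates farther than max_dist are left unmatched.
    """
    matches: Dict[Pixel, Pixel] = {}
    for pixel_a, point_a in coords_a.items():
        best_pixel: Optional[Pixel] = None
        best_dist = math.inf
        # Brute-force nearest-neighbour search over all pixels of image B.
        for pixel_b, point_b in coords_b.items():
            d = math.dist(point_a, point_b)
            if d < best_dist:
                best_pixel, best_dist = pixel_b, d
        # Keep the match only if it is close enough in canonical space.
        if best_pixel is not None and best_dist <= max_dist:
            matches[pixel_a] = best_pixel
    return matches


if __name__ == "__main__":
    # Toy canonical coordinates for two images of the same object category.
    coords_a = {(0, 0): (0.0, 0.0, 1.0), (5, 5): (1.0, 0.0, 0.0)}
    coords_b = {(10, 2): (0.95, 0.05, 0.0), (3, 7): (0.02, 0.0, 0.98)}
    print(match_via_canonical(coords_a, coords_b))
    # {(0, 0): (3, 7), (5, 5): (10, 2)}
```

For realistic image sizes, a spatial index such as a k-d tree would replace the quadratic loop; the `max_dist` threshold is one simple way to leave a pixel unmatched when its semantic part has no visible counterpart in the other image.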