pix2gestalt: Amodal Segmentation by Synthesizing Wholes
January 25, 2024
Authors: Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick
cs.AI
Abstract
We introduce pix2gestalt, a framework for zero-shot amodal segmentation,
which learns to estimate the shape and appearance of whole objects that are
only partially visible behind occlusions. By capitalizing on large-scale
diffusion models and transferring their representations to this task, we learn
a conditional diffusion model for reconstructing whole objects in challenging
zero-shot cases, including examples that break natural and physical priors,
such as art. As training data, we use a synthetically curated dataset
containing occluded objects paired with their whole counterparts. Experiments
show that our approach outperforms supervised baselines on established
benchmarks. Our model can furthermore be used to significantly improve the
performance of existing object recognition and 3D reconstruction methods in the
presence of occlusions.
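As a rough illustration of the interface such an amodal completion model exposes (the function names and the placeholder "completion" below are hypothetical sketches, not the authors' actual API or diffusion model): given an RGB image and a mask of the visible region of an occluded object, the model synthesizes the whole object, from which an amodal mask can be read off.

```python
import numpy as np

def amodal_complete(image, visible_mask, complete_fn):
    """Sketch of amodal completion: a conditional generative model
    (here a caller-supplied placeholder standing in for the learned
    diffusion model) synthesizes the whole object conditioned on the
    occluded image and the visible-region mask."""
    whole_rgba = complete_fn(image, visible_mask)
    # The amodal (whole-object) mask is the alpha channel, thresholded.
    amodal_mask = whole_rgba[..., 3] > 0.5
    return whole_rgba, amodal_mask

# Toy placeholder model: "complete" an object whose right half is
# occluded by mirroring its visible half left-to-right. This only
# stands in for the learned model to make the sketch runnable.
def mirror_completion(image, visible_mask):
    rgba = np.zeros(image.shape[:2] + (4,), dtype=np.float32)
    filled = visible_mask | visible_mask[:, ::-1]
    rgba[..., :3] = image
    rgba[..., 3] = filled.astype(np.float32)
    return rgba

image = np.zeros((8, 8, 3), dtype=np.float32)
visible = np.zeros((8, 8), dtype=bool)
visible[2:6, 1:4] = True           # visible left part of the object
image[visible] = 1.0
whole, amodal = amodal_complete(image, visible, mirror_completion)
```

Here the recovered amodal mask covers both the visible region and the mirrored (previously occluded) region; in the actual system, the diffusion model replaces the mirroring heuristic.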