pix2gestalt: Amodal Segmentation by Synthesizing Wholes
January 25, 2024
Authors: Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick
cs.AI
Abstract
We introduce pix2gestalt, a framework for zero-shot amodal segmentation,
which learns to estimate the shape and appearance of whole objects that are
only partially visible behind occlusions. By capitalizing on large-scale
diffusion models and transferring their representations to this task, we learn
a conditional diffusion model for reconstructing whole objects in challenging
zero-shot cases, including examples that break natural and physical priors,
such as art. As training data, we use a synthetically curated dataset
containing occluded objects paired with their whole counterparts. Experiments
show that our approach outperforms supervised baselines on established
benchmarks. Our model can furthermore be used to significantly improve the
performance of existing object recognition and 3D reconstruction methods in the
presence of occlusions.
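The abstract describes training a conditional diffusion model: the denoiser is conditioned on the partially visible (occluded) view and learns to recover the whole object from synthetically paired data. The sketch below is a minimal toy illustration of that conditional denoising objective (epsilon-prediction MSE), not the authors' implementation; the linear "network", the noise level `sigma`, and all names are illustrative assumptions.

```python
# Toy sketch of a conditional denoising-diffusion training objective:
# noise the whole-object target, then predict that noise given the
# occluded view as conditioning. Purely illustrative; not pix2gestalt's code.
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(noisy_whole, occluded_cond, W):
    """Stand-in 'network': one linear map over the concatenated inputs.
    A real model would be a large image diffusion backbone."""
    x = np.concatenate([noisy_whole, occluded_cond])
    return W @ x  # predicts the noise that was added to the whole image

def diffusion_loss(whole, occluded, W, sigma=0.5):
    """One training step of the conditional objective (epsilon-prediction):
    corrupt the whole-object target with Gaussian noise, then score how
    well the denoiser recovers that noise given the occluded view."""
    eps = rng.normal(size=whole.shape)
    noisy = whole + sigma * eps
    eps_hat = toy_denoiser(noisy, occluded, W)
    return float(np.mean((eps_hat - eps) ** 2))

d = 8                          # toy "image" dimensionality
whole = rng.normal(size=d)     # ground-truth whole object
occluded = whole.copy()
occluded[d // 2:] = 0.0        # zero out half the signal: a crude occlusion
W = np.zeros((d, 2 * d))       # untrained weights

loss = diffusion_loss(whole, occluded, W)
print(f"toy conditional-diffusion loss: {loss:.3f}")
```

At inference time, the paper's model runs the reverse diffusion process conditioned on the occluded input to synthesize the whole object, from which the amodal mask can be read off.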