pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Abstract

We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and transferring their representations to this task, we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, including examples that break natural and physical priors, such as art. As training data, we use a synthetically curated dataset containing occluded objects paired with their whole counterparts. Experiments show that our approach outperforms supervised baselines on established benchmarks. Our model can furthermore be used to significantly improve the performance of existing object recognition and 3D reconstruction methods in the presence of occlusions.
