pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Abstract

We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and transferring their representations to this task, we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, including examples that break natural and physical priors, such as art. As training data, we use a synthetically curated dataset containing occluded objects paired with their whole counterparts. Experiments show that our approach outperforms supervised baselines on established benchmarks. Our model can furthermore be used to significantly improve the performance of existing object recognition and 3D reconstruction methods in the presence of occlusions.
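To make the idea of "occluded objects paired with their whole counterparts" concrete, the following is a minimal sketch of how one such training pair could be assembled by pasting an occluder over a whole object. It is an illustration under stated assumptions, not the authors' data pipeline: the function name `make_occlusion_pair`, the binary-mask compositing, and the toy 4x4 arrays are all hypothetical.

```python
import numpy as np


def make_occlusion_pair(whole_rgb, whole_mask, occluder_rgb, occluder_mask):
    """Paste an occluder over a whole object (hypothetical helper).

    Returns the occluded image, the mask of the object's still-visible
    pixels, and the whole-object image and mask that serve as the target.
    """
    occ = occluder_mask.astype(bool)
    obj = whole_mask.astype(bool)
    # Occluder pixels overwrite the underlying image wherever its mask is set.
    occluded_rgb = np.where(occ[..., None], occluder_rgb, whole_rgb)
    # Visible (modal) region: object pixels not covered by the occluder.
    visible_mask = obj & ~occ
    return occluded_rgb, visible_mask, whole_rgb, obj


# Toy data (assumed for illustration): a 2x4 object, half covered by an occluder.
H = W = 4
whole_rgb = np.full((H, W, 3), 200, dtype=np.uint8)
whole_mask = np.zeros((H, W), dtype=bool)
whole_mask[1:3, 0:4] = True          # object occupies 8 pixels
occluder_rgb = np.full((H, W, 3), 50, dtype=np.uint8)
occluder_mask = np.zeros((H, W), dtype=bool)
occluder_mask[:, 2:4] = True         # occluder covers the right half

occluded_rgb, visible_mask, target_rgb, target_mask = make_occlusion_pair(
    whole_rgb, whole_mask, occluder_rgb, occluder_mask
)

# Fraction of the whole object hidden by the occluder.
occlusion_ratio = 1.0 - visible_mask.sum() / target_mask.sum()
print(f"occlusion ratio: {occlusion_ratio:.2f}")
```

In this sketch, the occluded image together with the visible mask would play the role of the conditioning input, and the whole-object image and mask would play the role of the reconstruction target. How the actual dataset selects objects and occluders is not specified in the abstract.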
