Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation

Abstract

Significant progress has recently been made in applying large pre-trained models to downstream tasks in 3D vision, including creative applications such as text-to-shape generation. This motivates our investigation of how these pre-trained models can be used effectively to generate 3D shapes from sketches, which has largely remained an open challenge due to the scarcity of paired sketch-shape datasets and the varying levels of abstraction in sketches. We discover that conditioning a 3D generative model during training on features of synthetic renderings, obtained from a frozen large pre-trained vision model, enables us to effectively generate 3D shapes from sketches at inference time. This suggests that the features of large pre-trained vision models carry semantic signals that are resilient to domain shifts: although we train only on RGB renderings, the model generalizes to sketches at inference time. We conduct a comprehensive set of experiments investigating different design factors and demonstrate the effectiveness of our straightforward approach in generating multiple 3D shapes for each input sketch, regardless of its level of abstraction, without requiring any paired datasets during training.
