VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

Abstract

This paper introduces a pioneering 3D volumetric encoder designed for text-to-3D generation. To scale up the training data for the diffusion model, a lightweight network is developed that efficiently acquires feature volumes from multi-view images. A diffusion model with a 3D U-Net is then trained on these volumes for text-to-3D generation. The work also addresses two challenges: inaccurate object captions and high-dimensional feature volumes. Trained on the public Objaverse dataset, the proposed model shows promising results in producing diverse and recognizable samples from text prompts. Notably, it allows finer control over object part characteristics through textual cues and can combine multiple concepts within a single object, which fosters model creativity. By introducing an efficient, flexible, and scalable representation methodology, this research significantly contributes to the progress of 3D generation. Code is available at https://github.com/tzco/VolumeDiffusion.
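
To make the idea of training a diffusion model on feature volumes more concrete, the following is a minimal sketch of the standard DDPM forward (noising) step applied to a single feature volume. It is an illustration under stated assumptions, not the authors' implementation: the linear beta schedule, the volume shape (4 channels on a 16^3 grid), and the helper names `linear_alpha_bar` and `add_noise` are all hypothetical choices made for this example, and the released code may use a different schedule and noise design.

```python
import numpy as np


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 2e-2) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule (standard DDPM)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(volume: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(volume.shape)
    noisy = np.sqrt(alpha_bar[t]) * volume + np.sqrt(1.0 - alpha_bar[t]) * noise
    return noisy, noise


rng = np.random.default_rng(0)
# Hypothetical feature volume: (channels, depth, height, width); sizes are illustrative only.
x0 = rng.standard_normal((4, 16, 16, 16))
alpha_bar = linear_alpha_bar(num_steps=1000)
x_t, eps = add_noise(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

In a training loop, a denoising network such as a 3D U-Net would receive `x_t`, the timestep, and a text embedding, and would typically be trained to predict `eps`. That part is omitted here because the abstract does not specify it.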

