HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance

Abstract

Automatic text-to-3D synthesis has advanced considerably through the optimization of 3D models. Existing methods commonly rely on pre-trained text-to-image generative models, such as diffusion models, which score 2D renderings of Neural Radiance Fields (NeRFs); these scores are then used to optimize the NeRFs. However, because such methods have a limited understanding of 3D geometry, they often produce artifacts and inconsistencies across views. To address these limitations, we propose a reformulation of the optimization loss using the diffusion prior. Furthermore, we introduce a novel training approach that unlocks the potential of the diffusion prior. To improve the 3D geometry representation, we apply auxiliary depth supervision to NeRF-rendered images and regularize the density field of NeRFs. Extensive experiments demonstrate that our method outperforms prior work, achieving greater photo-realism and improved multi-view consistency.
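
The abstract describes existing methods as using a diffusion model to score 2D renderings and then optimizing the 3D model with those scores. The sketch below illustrates that general idea in the style of score distillation, not HiFA's reformulated loss, whose details the abstract does not give. Everything in it is an assumption made for illustration: `toy_noise_predictor` is a hypothetical stand-in for a pre-trained text-conditioned network, the weight `1 - alpha_bar` is just one common choice, and the "rendering" is a raw pixel grid updated directly, whereas a real pipeline would backpropagate the gradient through the renderer into the NeRF parameters.

```python
import numpy as np


def toy_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a pre-trained diffusion noise predictor.

    It behaves like a model that is certain the clean image equals `target`,
    so it attributes everything else in the noisy input to noise.
    """
    return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)


def score_distillation_grad(rendering, alpha_bar, target, rng):
    """Gradient w.r.t. the rendering: w(t) * (predicted noise - injected noise)."""
    eps = rng.standard_normal(rendering.shape)
    # Forward diffusion: mix the rendering with Gaussian noise.
    x_t = np.sqrt(alpha_bar) * rendering + np.sqrt(1.0 - alpha_bar) * eps
    eps_pred = toy_noise_predictor(x_t, alpha_bar, target)
    weight = 1.0 - alpha_bar  # assumed weighting; other choices are possible
    return weight * (eps_pred - eps)


def optimize_rendering(steps=200, lr=0.1, seed=0):
    """Run plain gradient descent on the pixels using the distilled score."""
    rng = np.random.default_rng(seed)
    target = np.full((4, 4), 0.5)   # what the toy "prior" considers plausible
    rendering = np.zeros((4, 4))    # stands in for a NeRF-rendered view
    for _ in range(steps):
        alpha_bar = rng.uniform(0.05, 0.95)  # random noise level per step
        grad = score_distillation_grad(rendering, alpha_bar, target, rng)
        rendering -= lr * grad
    return rendering, target


if __name__ == "__main__":
    rendering, target = optimize_rendering()
    print("max abs error:", np.abs(rendering - target).max())
```

In this toy setting the injected noise cancels out of the gradient, so the rendering moves steadily toward the image the stand-in prior prefers. With a real diffusion model the signal is far noisier and carries no explicit 3D information, which is consistent with the artifacts and multi-view inconsistencies the abstract attributes to existing methods.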
