FlexiDreamer: Single Image-to-3D Generation with FlexiCubes

April 1, 2024
Authors: Ruowen Zhao, Zhengyi Wang, Yikai Wang, Zihan Zhou, Jun Zhu
cs.AI

Abstract

3D content generation from text prompts or single images has recently made remarkable progress in quality and speed. One of its dominant paradigms involves generating consistent multi-view images followed by sparse-view reconstruction. However, because directly deforming a mesh representation to approach the target topology is challenging, most methods learn an implicit representation (such as NeRF) during sparse-view reconstruction and obtain the target mesh through post-processing extraction. Although the implicit representation can effectively model rich 3D information, its training typically requires a long convergence time. In addition, the post-extraction operation on the implicit field leads to undesirable visual artifacts. In this paper, we propose FlexiDreamer, a novel single image-to-3D generation framework that reconstructs the target mesh in an end-to-end manner. By leveraging a flexible gradient-based extraction method known as FlexiCubes, our method circumvents the defects introduced by post-processing and facilitates direct acquisition of the target mesh. Furthermore, we incorporate a multi-resolution hash grid encoding scheme that progressively activates the encoding levels of the implicit field in FlexiCubes to help capture geometric details during per-step optimization. Notably, FlexiDreamer recovers a dense 3D structure from a single-view image in approximately 1 minute on a single NVIDIA A100 GPU, outperforming previous methods by a large margin.
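
The abstract does not spell out how the encoding levels are activated, so the following is only a minimal sketch of one common coarse-to-fine scheme, not the authors' implementation. It assumes a linear schedule and a level-major feature layout; the function names and the parameters `num_levels`, `start_levels`, `steps_per_level`, and `feats_per_level` are all hypothetical and chosen for illustration.

```python
import numpy as np

def active_level_mask(step, num_levels=8, start_levels=2, steps_per_level=100):
    """Return a 0/1 mask over encoding levels, coarsest level first.

    Assumed schedule (not from the paper): start with `start_levels`
    coarse levels and unlock one more every `steps_per_level` steps.
    """
    n_active = min(num_levels, start_levels + step // steps_per_level)
    mask = np.zeros(num_levels, dtype=np.float32)
    mask[:n_active] = 1.0
    return mask

def mask_features(features, mask, feats_per_level=2):
    """Zero the channels of levels that are not yet active.

    features: array of shape (N, num_levels * feats_per_level), assumed
    to be laid out level by level (level 0 channels first).
    """
    # Expand the per-level mask to one entry per feature channel.
    channel_mask = np.repeat(mask, feats_per_level)
    return features * channel_mask
```

Under this assumed schedule, early optimization steps see only the coarse levels, so the implicit field first settles on the overall shape before finer levels contribute geometric detail.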
