HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D
December 26, 2023
Authors: Sangmin Woo, Byeongjun Park, Hyojun Go, Jin-Young Kim, Changick Kim
cs.AI
Abstract
Recent progress in single-image 3D generation highlights the importance of
multi-view coherency, leveraging 3D priors from large-scale diffusion models
pretrained on Internet-scale images. However, the aspect of novel-view
diversity remains underexplored within the research landscape due to the
ambiguity in converting a 2D image into 3D content, where numerous potential
shapes can emerge. Here, we aim to close this research gap by tackling
consistency and diversity simultaneously. Yet, striking a balance between
these two aspects poses a considerable challenge due to their inherent
trade-offs. This work introduces HarmonyView, a simple yet effective diffusion
sampling technique adept at decomposing two intricate aspects in single-image
3D generation: consistency and diversity. This approach paves the way for a
more nuanced exploration of the two critical dimensions within the sampling
process. Moreover, we propose a new evaluation metric based on CLIP image and
text encoders to comprehensively assess the diversity of the generated views,
which closely aligns with human evaluators' judgments. In experiments,
HarmonyView achieves a harmonious balance, demonstrating a win-win scenario in
both consistency and diversity.