FlexPainter: Flexible and Multi-View Consistent Texture Generation
June 3, 2025
Authors: Dongyu Yan, Leyi Wu, Jiantao Lin, Luozhou Wang, Tianshuo Xu, Zhifei Chen, Zhen Yang, Lie Xu, Shunsi Zhang, Yingcong Chen
cs.AI
Abstract
Texture map production is an important part of 3D modeling and determines the
rendering quality. Recently, diffusion-based methods have opened a new way for
texture generation. However, restricted control flexibility and limited prompt
modalities may prevent creators from producing desired results. Furthermore,
inconsistencies between generated multi-view images often lead to poor texture
generation quality. To address these issues, we introduce FlexPainter,
a novel texture generation pipeline that enables flexible multi-modal
conditional guidance and achieves highly consistent texture generation. A
shared conditional embedding space is constructed to perform flexible
aggregation between different input modalities. Utilizing this embedding space,
we present an image-based classifier-free guidance (CFG) method to decompose structural and style
information, achieving reference image-based stylization. Leveraging the 3D
knowledge within the image diffusion prior, we first generate multi-view images
simultaneously using a grid representation to enhance global understanding.
Meanwhile, we propose a view synchronization and adaptive weighting module
during diffusion sampling to further ensure local consistency. Finally, a
3D-aware texture completion model combined with a texture enhancement model is
used to generate seamless, high-resolution texture maps. Comprehensive
experiments demonstrate that our framework significantly outperforms
state-of-the-art methods in both flexibility and generation quality.
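
As a rough illustration of the image-based CFG idea, the sketch below composes an unconditional noise prediction with structure- and style-conditioned predictions using separate guidance weights. This is a generic compositional classifier-free guidance formulation, assuming a standard latent diffusion setup; the function name, weights, and exact decomposition are assumptions for illustration, not the paper's stated implementation.

```python
import torch

def composed_cfg(eps_uncond, eps_struct, eps_style, w_struct=3.0, w_style=5.0):
    """Combine an unconditional noise prediction with structure- and
    style-conditioned predictions using independent guidance weights.

    eps_*: noise predictions of shape (B, C, H, W) from the same diffusion
    model evaluated under different conditioning embeddings (hypothetical layout).
    """
    # Each condition pushes the prediction along its own guidance direction,
    # scaled independently, so structure and style can be controlled separately.
    return (eps_uncond
            + w_struct * (eps_struct - eps_uncond)
            + w_style * (eps_style - eps_uncond))

# Toy usage with random tensors standing in for real model outputs.
if __name__ == "__main__":
    shape = (1, 4, 64, 64)
    eps_u, eps_s, eps_y = (torch.randn(shape) for _ in range(3))
    guided = composed_cfg(eps_u, eps_s, eps_y)
    print(guided.shape)  # torch.Size([1, 4, 64, 64])
```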
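
The abstract also mentions a view synchronization and adaptive weighting module during diffusion sampling, without detailing the weights. Below is a minimal, self-contained sketch of one common choice for fusing per-view texture estimates: weighting each view's contribution to a texel by its visibility and by the cosine between the view direction and the surface normal, raised to a sharpness power. The tensor layout, function name, and weighting scheme are assumptions for illustration only.

```python
import torch

def adaptive_view_blend(view_texels, view_cosines, visibility, sharpness=4.0, eps=1e-8):
    """Fuse per-view texel estimates into a single texture.

    view_texels:  (V, H, W, 3) colors each view projects onto the UV map
    view_cosines: (V, H, W)    cosine between view direction and surface normal
    visibility:   (V, H, W)    1 where the texel is visible from the view, else 0
    """
    # Favor views that see the surface head-on; invisible texels get zero weight.
    w = visibility * torch.clamp(view_cosines, min=0.0) ** sharpness
    w = w / (w.sum(dim=0, keepdim=True) + eps)          # normalize over views
    return (w.unsqueeze(-1) * view_texels).sum(dim=0)   # (H, W, 3)
```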