StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation
August 15, 2025
Authors: Seungmi Lee, Kwan Yun, Junyong Noh
cs.AI
Abstract
We introduce StyleMM, a novel framework that can construct a stylized 3D
Morphable Model (3DMM) based on user-defined text descriptions specifying a
target style. Building upon a pre-trained mesh deformation network and a
texture generator for original 3DMM-based realistic human faces, our approach
fine-tunes these models using stylized facial images generated via text-guided
image-to-image (i2i) translation with a diffusion model, which serve as
stylization targets for the rendered mesh. To prevent undesired changes in
identity, facial alignment, or expressions during i2i translation, we introduce
a stylization method that explicitly preserves the facial attributes of the
source image. By maintaining these critical attributes during image
stylization, the proposed approach ensures consistent 3D style transfer across
the 3DMM parameter space through image-based training. Once trained, StyleMM
enables feed-forward generation of stylized face meshes with explicit control
over shape, expression, and texture parameters, producing meshes with
consistent vertex connectivity and animatability. Quantitative and qualitative
evaluations demonstrate that our approach outperforms state-of-the-art methods
in terms of identity-level facial diversity and stylization capability. The
code and videos are available at
[kwanyun.github.io/stylemm_page](kwanyun.github.io/stylemm_page).
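
To make the feed-forward generation step described above concrete, the following is a minimal, hypothetical sketch of how a fine-tuned mesh deformation network and texture generator could be driven by explicit shape, expression, and texture parameters. The module names, dimensions, and architectures are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of StyleMM-style feed-forward inference: sample 3DMM-like
# shape/expression/texture codes and produce a stylized mesh plus UV texture.
# All names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn


class MeshDeformationNet(nn.Module):
    """Placeholder network mapping shape/expression codes to per-vertex offsets."""

    def __init__(self, n_vertices=5023, shape_dim=100, expr_dim=50):
        super().__init__()
        self.n_vertices = n_vertices
        self.mlp = nn.Sequential(
            nn.Linear(shape_dim + expr_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_vertices * 3),
        )

    def forward(self, shape_code, expr_code):
        offsets = self.mlp(torch.cat([shape_code, expr_code], dim=-1))
        return offsets.view(-1, self.n_vertices, 3)


class TextureGenerator(nn.Module):
    """Placeholder generator mapping a texture code to a UV texture map."""

    def __init__(self, tex_dim=64, resolution=256):
        super().__init__()
        self.resolution = resolution
        self.fc = nn.Linear(tex_dim, 3 * resolution * resolution)

    def forward(self, tex_code):
        tex = torch.sigmoid(self.fc(tex_code))
        return tex.view(-1, 3, self.resolution, self.resolution)


# Feed-forward generation with explicit parameter control.
deform_net = MeshDeformationNet()
tex_gen = TextureGenerator()

shape = torch.randn(1, 100)  # identity-level shape parameters
expr = torch.randn(1, 50)    # expression parameters (animatable over time)
tex = torch.randn(1, 64)     # texture parameters

# A shared template mesh keeps vertex connectivity consistent across identities.
template_vertices = torch.zeros(1, 5023, 3)
stylized_vertices = template_vertices + deform_net(shape, expr)
stylized_texture = tex_gen(tex)
```

Because every identity deforms the same template, the output meshes share vertex connectivity, which is what allows expression parameters to be swapped or animated without re-topologizing, as the abstract notes.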