StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation
August 15, 2025
Authors: Seungmi Lee, Kwan Yun, Junyong Noh
cs.AI
Abstract
We introduce StyleMM, a novel framework that can construct a stylized 3D
Morphable Model (3DMM) based on user-defined text descriptions specifying a
target style. Building upon a pre-trained mesh deformation network and a
texture generator for original 3DMM-based realistic human faces, our approach
fine-tunes these models using stylized facial images generated via text-guided
image-to-image (i2i) translation with a diffusion model, which serve as
stylization targets for the rendered mesh. To prevent undesired changes in
identity, facial alignment, or expressions during i2i translation, we introduce
a stylization method that explicitly preserves the facial attributes of the
source image. By maintaining these critical attributes during image
stylization, the proposed approach ensures consistent 3D style transfer across
the 3DMM parameter space through image-based training. Once trained, StyleMM
enables feed-forward generation of stylized face meshes with explicit control
over shape, expression, and texture parameters, producing meshes with
consistent vertex connectivity and animatability. Quantitative and qualitative
evaluations demonstrate that our approach outperforms state-of-the-art methods
in terms of identity-level facial diversity and stylization capability. The
code and videos are available at
[kwanyun.github.io/stylemm_page](kwanyun.github.io/stylemm_page).
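To make the described pipeline concrete, the sketch below outlines the fine-tuning loop implied by the abstract: sample 3DMM parameters, deform and texture the mesh, render it, obtain an attribute-preserving stylized target via text-guided i2i translation, and update the networks with an image-space loss. This is a minimal, hypothetical Python/PyTorch sketch; the deformation network, texture generator, renderer, and i2i translator are stand-in placeholder modules with assumed names and dimensions, not the authors' released code.

```python
# Hypothetical sketch of the StyleMM-style fine-tuning loop described above.
# All modules, names, and dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

N_VERTS, SHAPE_DIM, EXPR_DIM, TEX_DIM = 5023, 100, 50, 128

class MeshDeformer(nn.Module):
    """Stand-in for the pre-trained mesh deformation network."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(SHAPE_DIM + EXPR_DIM, N_VERTS * 3)
    def forward(self, shape, expr):
        return self.fc(torch.cat([shape, expr], dim=-1)).view(-1, N_VERTS, 3)

class TextureGenerator(nn.Module):
    """Stand-in for the 3DMM facial texture generator."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(TEX_DIM, 3 * 64 * 64)
    def forward(self, tex):
        return self.fc(tex).view(-1, 3, 64, 64)

def render(verts, texture):
    # Placeholder: a real pipeline would rasterize the textured mesh with a
    # differentiable renderer; here the texture is modulated by the vertices
    # only so that both networks receive gradients.
    return texture * (1.0 + 0.01 * verts.mean(dim=(1, 2)).view(-1, 1, 1, 1))

@torch.no_grad()
def stylize(image, prompt):
    # Placeholder for text-guided, attribute-preserving i2i translation with a
    # diffusion model; the target is treated as fixed (no gradient).
    return image + 0.1 * torch.randn_like(image)

deformer, tex_gen = MeshDeformer(), TextureGenerator()
opt = torch.optim.Adam(
    list(deformer.parameters()) + list(tex_gen.parameters()), lr=1e-4
)

for step in range(1000):
    # Sample 3DMM shape, expression, and texture parameters.
    shape = torch.randn(4, SHAPE_DIM)
    expr = torch.randn(4, EXPR_DIM)
    tex = torch.randn(4, TEX_DIM)

    verts = deformer(shape, expr)
    texture = tex_gen(tex)
    rendered = render(verts, texture)

    # Stylized target from the (frozen) i2i translator, then image-space loss.
    target = stylize(rendered, "a portrait in the target style")
    loss = nn.functional.l1_loss(rendered, target)

    opt.zero_grad()
    loss.backward()
    opt.step()
```

After this image-based fine-tuning, generation is a single feed-forward pass: given shape, expression, and texture parameters, the deformer and texture generator directly emit a stylized mesh with fixed vertex connectivity, which is what enables consistent animation across identities.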