StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians
April 21, 2025
Authors: Cailin Zhuang, Yaoqi Hu, Xuanyang Zhang, Wei Cheng, Jiacheng Bao, Shengqi Liu, Yiying Yang, Xianfang Zeng, Gang Yu, Ming Li
cs.AI
Abstract
3D Gaussian Splatting (3DGS) excels in photorealistic scene reconstruction
but struggles with stylized scenarios (e.g., cartoons, games) due to fragmented
textures, semantic misalignment, and limited adaptability to abstract
aesthetics. We propose StyleMe3D, a holistic framework for 3DGS style transfer
that integrates multi-modal style conditioning, multi-level semantic alignment,
and perceptual quality enhancement. Our key insights include: (1) optimizing
only RGB attributes preserves geometric integrity during stylization; (2)
disentangling low-, medium-, and high-level semantics is critical for coherent
style transfer; (3) scalability across isolated objects and complex scenes is
essential for practical deployment. StyleMe3D introduces four novel components:
Dynamic Style Score Distillation (DSSD), leveraging Stable Diffusion's latent
space for semantic alignment; Contrastive Style Descriptor (CSD) for localized,
content-aware texture transfer; Simultaneously Optimized Scale (SOS) to
decouple style details and structural coherence; and 3D Gaussian Quality
Assessment (3DG-QA), a differentiable aesthetic prior trained on human-rated
data to suppress artifacts and enhance visual harmony. Evaluated on the NeRF
synthetic dataset (objects) and the tandt db dataset (scenes), StyleMe3D
outperforms state-of-the-art methods in preserving geometric details (e.g.,
carvings on sculptures) and ensuring stylistic consistency across scenes (e.g.,
coherent lighting in landscapes), while maintaining real-time rendering. This
work bridges photorealistic 3DGS and artistic stylization, unlocking
applications in gaming, virtual worlds, and digital art.