CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting
May 28, 2025
Authors: Kornel Howil, Joanna Waczyńska, Piotr Borycki, Tadeusz Dziarmaga, Marcin Mazur, Przemysław Spurek
cs.AI
Abstract
Gaussian Splatting (GS) has recently emerged as an efficient representation
for rendering 3D scenes from 2D images and has been extended to images, videos,
and dynamic 4D content. However, applying style transfer to GS-based
representations, especially beyond simple color changes, remains challenging.
In this work, we introduce CLIPGaussians, the first unified style transfer
framework that supports text- and image-guided stylization across multiple
modalities: 2D images, videos, 3D objects, and 4D scenes. Our method operates
directly on Gaussian primitives and integrates into existing GS pipelines as a
plug-in module, without requiring large generative models or retraining from
scratch. The CLIPGaussians approach enables joint optimization of color and
geometry in 3D and 4D settings, and achieves temporal coherence in videos,
while preserving the model size. We demonstrate superior style fidelity and
consistency across all tasks, validating CLIPGaussians as a universal and
efficient solution for multimodal style transfer.