CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting
May 28, 2025
Authors: Kornel Howil, Joanna Waczyńska, Piotr Borycki, Tadeusz Dziarmaga, Marcin Mazur, Przemysław Spurek
cs.AI
Abstract
Gaussian Splatting (GS) has recently emerged as an efficient representation
for rendering 3D scenes from 2D images and has been extended to images, videos,
and dynamic 4D content. However, applying style transfer to GS-based
representations, especially beyond simple color changes, remains challenging.
In this work, we introduce CLIPGaussians, the first unified style transfer
framework that supports text- and image-guided stylization across multiple
modalities: 2D images, videos, 3D objects, and 4D scenes. Our method operates
directly on Gaussian primitives and integrates into existing GS pipelines as a
plug-in module, without requiring large generative models or retraining from
scratch. The CLIPGaussians approach enables joint optimization of color and
geometry in 3D and 4D settings and achieves temporal coherence in videos,
while preserving the model size. We demonstrate superior style fidelity and
consistency across all tasks, validating CLIPGaussians as a universal and
efficient solution for multimodal style transfer.