GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces
December 3, 2025
Authors: Melis Ocal, Xiaoyan Xing, Yue Li, Ngo Anh Vien, Sezer Karaoglu, Theo Gevers
cs.AI
Abstract
3D stylization is central to game development, virtual reality, and digital arts, where the demand for diverse assets calls for scalable methods that support fast, high-fidelity manipulation. Existing text-to-3D stylization methods typically distill from 2D image editors, requiring time-intensive per-asset optimization and exhibiting multi-view inconsistency due to the limitations of current text-to-image models, which makes them impractical for large-scale production. In this paper, we introduce GaussianBlender, a pioneering feed-forward framework for text-driven 3D stylization that performs edits instantly at inference. Our method learns structured, disentangled latent spaces with controlled information sharing for geometry and appearance from spatially-grouped 3D Gaussians. A latent diffusion model then applies text-conditioned edits on these learned representations. Comprehensive evaluations show that GaussianBlender not only delivers instant, high-fidelity, geometry-preserving, multi-view consistent stylization, but also surpasses methods that require per-instance test-time optimization, unlocking practical, democratized 3D stylization at scale.
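To make the described pipeline concrete, below is a minimal PyTorch sketch of a feed-forward flow of the kind the abstract outlines: spatially-grouped Gaussians are encoded into separate geometry and appearance latents, a text-conditioned denoiser edits the appearance latent, and a decoder maps the pair back to Gaussian parameters. Every module name (DisentangledEncoder, TextConditionedDenoiser, GroupDecoder), dimension, and the toy denoising update are illustrative assumptions, not the paper's architecture.

# A minimal, self-contained sketch of a feed-forward stylization pipeline.
# All modules, shapes, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn


class DisentangledEncoder(nn.Module):
    """Maps each spatial group of Gaussians to a geometry latent and an
    appearance latent (assumed split; dimensions are placeholders)."""

    def __init__(self, gaussian_dim: int = 14, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(gaussian_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2 * latent_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, groups: torch.Tensor):
        # groups: (num_groups, gaussians_per_group, gaussian_dim)
        pooled = self.net(groups).mean(dim=1)            # pool within each group
        return pooled.split(self.latent_dim, dim=-1)     # (geometry_z, appearance_z)


class TextConditionedDenoiser(nn.Module):
    """Stand-in for the latent diffusion model: predicts a denoised appearance
    latent from the noisy latent, a text embedding, and a timestep."""

    def __init__(self, latent_dim: int = 64, text_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + text_dim + 1, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_t, text_emb, t):
        return self.net(torch.cat([z_t, text_emb, t.unsqueeze(-1)], dim=-1))


class GroupDecoder(nn.Module):
    """Decodes the kept geometry latent and the edited appearance latent back
    to per-group Gaussian parameters."""

    def __init__(self, gaussian_dim: int = 14, latent_dim: int = 64,
                 gaussians_per_group: int = 32):
        super().__init__()
        self.gaussians_per_group = gaussians_per_group
        self.gaussian_dim = gaussian_dim
        self.net = nn.Linear(2 * latent_dim, gaussians_per_group * gaussian_dim)

    def forward(self, geometry_z, appearance_z):
        out = self.net(torch.cat([geometry_z, appearance_z], dim=-1))
        return out.view(-1, self.gaussians_per_group, self.gaussian_dim)


@torch.no_grad()
def stylize(encoder, denoiser, decoder, groups, text_emb, steps: int = 10):
    """Feed-forward inference: edit only the appearance latent under the text
    condition and reuse the geometry latent, so geometry stays fixed."""
    geometry_z, appearance_z = encoder(groups)
    z = torch.randn_like(appearance_z)                   # start from noise
    for step in reversed(range(1, steps + 1)):
        t = torch.full((z.shape[0],), float(step))
        z = denoiser(z, text_emb, t)                     # toy update, not a DDPM step
    return decoder(geometry_z, z)


if __name__ == "__main__":
    enc, den, dec = DisentangledEncoder(), TextConditionedDenoiser(), GroupDecoder()
    groups = torch.randn(8, 32, 14)                      # 8 groups of 32 Gaussians
    text_emb = torch.randn(8, 512)                       # pretend text-encoder output
    print(stylize(enc, den, dec, groups, text_emb).shape)  # torch.Size([8, 32, 14])

In this sketch, geometry preservation comes from leaving the geometry latent untouched during the edit and only denoising the appearance latent; how GaussianBlender actually couples or shares information between the two latents is specified in the paper, not here.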