Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

December 9, 2025
Authors: Yuning Gong, Yifei Liu, Yifan Zhan, Muyao Niu, Xueying Li, Yuanjun Liao, Jiaming Chen, Yuanyuan Gao, Jiaqi Chen, Minming Chen, Li Zhou, Yuning Zhang, Wei Wang, Xiaoqing Hou, Huaxi Huang, Shixiang Tang, Le Ma, Dingwen Zhang, Xue Yang, Junchi Yan, Yanchi Zhang, Yinqiang Zheng, Xiao Sun, Zhihang Zhong
cs.AI

Abstract

Neural rendering, particularly 3D Gaussian Splatting (3DGS), has evolved rapidly and become a key component for building world models. However, existing viewer solutions remain fragmented, heavy, or constrained by legacy pipelines, resulting in high deployment friction and limited support for dynamic content and generative models. In this work, we present Visionary, an open, web-native platform for real-time rendering of Gaussian Splatting variants and meshes. Built on an efficient WebGPU renderer with per-frame ONNX inference, Visionary enables dynamic neural processing while maintaining a lightweight, "click-to-run" browser experience. It introduces a standardized Gaussian Generator contract, which not only supports standard 3DGS rendering but also allows plug-and-play algorithms to generate or update Gaussians each frame. The same inference path also makes it possible to apply feed-forward generative post-processing. The platform further offers a plug-in three.js library with a concise TypeScript API for seamless integration into existing web applications. Experiments show that, on identical 3DGS assets, Visionary achieves higher rendering efficiency than current web viewers, owing to GPU-based primitive sorting. It already supports multiple variants, including MLP-based 3DGS, 4DGS, neural avatars, and style transformation or enhancement networks. By unifying inference and rendering directly in the browser, Visionary significantly lowers the barrier to reproduction, comparison, and deployment of 3DGS-family methods, serving as a unified World Model Carrier for both reconstructive and generative paradigms.
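
To make the idea of a per-frame generator contract more concrete, here is a minimal TypeScript sketch. It is not Visionary's actual API: the names `GaussianFrame`, `GaussianGenerator`, `staticGenerator`, `driftGenerator`, and `sortBackToFront` are all hypothetical, the "dynamic" generator is a toy translation standing in for per-frame ONNX inference, and the depth sort runs on the CPU as a stand-in for the GPU-based primitive sorting the abstract mentions.

```typescript
// Hypothetical types for illustration only; not Visionary's real API.
type Vec3 = readonly [number, number, number];

interface GaussianFrame {
  count: number;
  positions: Float32Array; // xyz per Gaussian, length = 3 * count
  opacities: Float32Array; // one value per Gaussian, length = count
}

// Assumed shape of a generator contract: given a timestamp,
// return the Gaussians to draw for that frame.
interface GaussianGenerator {
  generate(timeSeconds: number): GaussianFrame;
}

// A static 3DGS asset: every frame returns the same Gaussians.
function staticGenerator(frame: GaussianFrame): GaussianGenerator {
  return { generate: () => frame };
}

// Toy dynamic generator standing in for per-frame model inference
// (e.g. a 4DGS deformation network): shifts every Gaussian along x.
function driftGenerator(base: GaussianFrame, speed: number): GaussianGenerator {
  return {
    generate(timeSeconds: number): GaussianFrame {
      const positions = new Float32Array(base.positions); // copy, base stays intact
      for (let i = 0; i < base.count; i++) {
        positions[3 * i] += speed * timeSeconds;
      }
      return { count: base.count, positions, opacities: base.opacities };
    },
  };
}

// CPU stand-in for the depth sort: returns Gaussian indices ordered
// from farthest to nearest along the view direction, the order
// needed for back-to-front alpha blending.
function sortBackToFront(frame: GaussianFrame, cameraPos: Vec3, viewDir: Vec3): Uint32Array {
  const depths = new Float32Array(frame.count);
  for (let i = 0; i < frame.count; i++) {
    const dx = frame.positions[3 * i] - cameraPos[0];
    const dy = frame.positions[3 * i + 1] - cameraPos[1];
    const dz = frame.positions[3 * i + 2] - cameraPos[2];
    depths[i] = dx * viewDir[0] + dy * viewDir[1] + dz * viewDir[2];
  }
  const order = new Uint32Array(frame.count).map((_, i) => i);
  order.sort((a, b) => depths[b] - depths[a]); // larger depth (farther) first
  return order;
}
```

In this sketch a render loop would call `generate(t)` once per frame and pass the result to `sortBackToFront` before drawing, so static assets and model-driven ones share the same code path. That uniformity is the point of a contract like the one the abstract describes, whatever its real signature looks like.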
December 11, 2025