**Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform**
December 9, 2025
Authors: Yuning Gong, Yifei Liu, Yifan Zhan, Muyao Niu, Xueying Li, Yuanjun Liao, Jiaming Chen, Yuanyuan Gao, Jiaqi Chen, Minming Chen, Li Zhou, Yuning Zhang, Wei Wang, Xiaoqing Hou, Huaxi Huang, Shixiang Tang, Le Ma, Dingwen Zhang, Xue Yang, Junchi Yan, Yanchi Zhang, Yinqiang Zheng, Xiao Sun, Zhihang Zhong
cs.AI
Abstract
Neural rendering, particularly 3D Gaussian Splatting (3DGS), has evolved rapidly and become a key component for building world models. However, existing viewer solutions remain fragmented, heavy, or constrained by legacy pipelines, resulting in high deployment friction and limited support for dynamic content and generative models. In this work, we present Visionary, an open, web-native platform for real-time rendering of various Gaussian Splatting variants and meshes. Built on an efficient WebGPU renderer with per-frame ONNX inference, Visionary enables dynamic neural processing while maintaining a lightweight, "click-to-run" browser experience. It introduces a standardized Gaussian Generator contract, which not only supports standard 3DGS rendering but also allows plug-and-play algorithms to generate or update Gaussians every frame. The same per-frame inference also enables feedforward generative post-processing. The platform further offers a three.js plug-in library with a concise TypeScript API for seamless integration into existing web applications. Experiments show that, on identical 3DGS assets, Visionary achieves higher rendering efficiency than current web viewers, owing to GPU-based primitive sorting. It already supports multiple variants, including MLP-based 3DGS, 4DGS, neural avatars, and style transformation or enhancement networks. By unifying inference and rendering directly in the browser, Visionary significantly lowers the barrier to reproducing, comparing, and deploying 3DGS-family methods, serving as a unified World Model Carrier for both reconstructive and generative paradigms.
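To make the idea of a per-frame "Gaussian Generator contract" and of depth sorting more concrete, here is a minimal TypeScript sketch. It is a hypothetical illustration under stated assumptions, not Visionary's published API: the names `GaussianSet`, `GaussianGenerator`, `StaticGenerator`, `DriftingGenerator`, `sortBackToFront` and `frame` are invented for this example, the "dynamic" generator simply moves positions instead of running ONNX inference, and the sort runs on the CPU only to show what a GPU primitive sort computes.

```typescript
// Hypothetical sketch, NOT Visionary's actual API: all names below are
// illustrative assumptions.

/** Minimal per-frame Gaussian payload (real 3DGS also carries scales,
 *  rotations and color coefficients; omitted here for brevity). */
interface GaussianSet {
  count: number;
  positions: Float32Array; // xyz per Gaussian, length 3 * count
  opacities: Float32Array; // one opacity per Gaussian
}

/** Assumed generator contract: the renderer asks for Gaussians once per frame. */
interface GaussianGenerator {
  generate(timeSec: number): GaussianSet;
}

/** Static 3DGS asset: returns the same Gaussians every frame. */
class StaticGenerator implements GaussianGenerator {
  private readonly set: GaussianSet;
  constructor(set: GaussianSet) {
    this.set = set;
  }
  generate(_timeSec: number): GaussianSet {
    return this.set;
  }
}

/** Toy dynamic generator standing in for a per-frame network (e.g. a 4DGS
 *  deformation model): it translates every Gaussian along x over time. */
class DriftingGenerator implements GaussianGenerator {
  private readonly base: GaussianSet;
  private readonly speed: number;
  constructor(base: GaussianSet, speed: number) {
    this.base = base;
    this.speed = speed;
  }
  generate(timeSec: number): GaussianSet {
    const positions = this.base.positions.slice(); // leave the base asset untouched
    for (let i = 0; i < this.base.count; i++) {
      positions[3 * i] += this.speed * timeSec;
    }
    return { count: this.base.count, positions, opacities: this.base.opacities };
  }
}

type Vec3 = [number, number, number];

/** Back-to-front draw order by depth along a unit view direction, the order
 *  that standard "over" alpha blending needs. Visionary sorts primitives on
 *  the GPU; this CPU version only shows what the sort computes. */
function sortBackToFront(set: GaussianSet, camPos: Vec3, viewDir: Vec3): Uint32Array {
  const depth = new Float32Array(set.count);
  const order = new Uint32Array(set.count);
  for (let i = 0; i < set.count; i++) {
    const dx = set.positions[3 * i] - camPos[0];
    const dy = set.positions[3 * i + 1] - camPos[1];
    const dz = set.positions[3 * i + 2] - camPos[2];
    depth[i] = dx * viewDir[0] + dy * viewDir[1] + dz * viewDir[2];
    order[i] = i;
  }
  order.sort((a, b) => depth[b] - depth[a]); // farthest Gaussian first
  return order;
}

/** One frame: query the generator, then compute the draw order. */
function frame(gen: GaussianGenerator, timeSec: number, camPos: Vec3, viewDir: Vec3): Uint32Array {
  return sortBackToFront(gen.generate(timeSec), camPos, viewDir);
}
```

Keeping the contract to a single per-frame call is what would let a static asset and a time-varying network share one render loop. In an actual WebGPU renderer the Gaussian attributes would live in GPU buffers and the ordering would be computed on the GPU rather than with `TypedArray.prototype.sort`.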