Visionary: A World Model Carrier Built on a WebGPU-Powered Gaussian Splatting Platform

Abstract

Neural rendering, particularly 3D Gaussian Splatting (3DGS), has evolved rapidly and become a key component for building world models. However, existing viewer solutions remain fragmented, heavyweight, or constrained by legacy pipelines. This results in high deployment friction and limited support for dynamic content and generative models. In this work, we present Visionary, an open, web-native platform for real-time rendering of diverse Gaussian Splatting variants and meshes. Built on an efficient WebGPU renderer with per-frame ONNX inference, Visionary enables dynamic neural processing while maintaining a lightweight, "click-to-run" browser experience. It introduces a standardized Gaussian Generator contract, which not only supports standard 3DGS rendering but also allows plug-and-play algorithms to generate or update Gaussians every frame. This per-frame inference also makes it possible to apply feed-forward generative post-processing. The platform further offers a three.js plugin library with a concise TypeScript API for seamless integration into existing web applications. Experiments show that, on identical 3DGS assets, Visionary achieves higher rendering efficiency than current web viewers, owing to GPU-based primitive sorting. It already supports multiple variants, including MLP-based 3DGS, 4DGS, neural avatars, and style transformation or enhancement networks. By unifying inference and rendering directly in the browser, Visionary significantly lowers the barrier to reproducing, comparing, and deploying 3DGS-family methods, serving as a unified World Model Carrier for both reconstructive and generative paradigms.
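The abstract describes the Gaussian Generator contract and GPU-based primitive sorting only at a high level. The TypeScript sketch below illustrates one way a per-frame generator contract and the subsequent depth ordering could fit together. It assumes a hypothetical `GaussianGenerator` interface with a single `generate(t)` method that returns packed positions and opacities. None of these names (`GaussianFrame`, `StaticGenerator`, `DriftGenerator`, `sortBackToFront`, `renderFrame`) come from Visionary's actual API. `StaticGenerator` stands in for a standard 3DGS asset. `DriftGenerator` is a toy time-dependent translation standing in for a learned per-frame update such as a 4DGS deformation. `sortBackToFront` computes the ordering that Visionary performs on the GPU; here it uses `Array.sort` on the CPU purely for clarity, whereas a GPU renderer would typically use a parallel sort such as radix sort.

```typescript
// Hypothetical sketch: all names here are illustrative assumptions,
// not Visionary's real API.

/** Packed per-frame Gaussian attributes (structure-of-arrays). */
interface GaussianFrame {
  count: number;
  positions: Float32Array; // xyz per Gaussian, length = 3 * count
  opacities: Float32Array; // one value per Gaussian
}

/** Assumed shape of a per-frame generator contract. */
interface GaussianGenerator {
  /** Produce or update the Gaussians for time t (seconds). */
  generate(t: number): GaussianFrame;
}

/** Static 3DGS asset: returns the same Gaussians every frame. */
class StaticGenerator implements GaussianGenerator {
  constructor(private readonly frame: GaussianFrame) {}
  generate(_t: number): GaussianFrame {
    return this.frame;
  }
}

/** Toy dynamic generator: shifts a base set along x at a constant speed. */
class DriftGenerator implements GaussianGenerator {
  private readonly out: GaussianFrame;
  constructor(
    private readonly base: GaussianFrame,
    private readonly speed: number,
  ) {
    // Reuse one output buffer across frames to avoid per-frame allocation.
    this.out = {
      count: base.count,
      positions: new Float32Array(base.positions.length),
      opacities: base.opacities,
    };
  }
  generate(t: number): GaussianFrame {
    for (let i = 0; i < this.base.count; i++) {
      this.out.positions[3 * i] = this.base.positions[3 * i] + this.speed * t;
      this.out.positions[3 * i + 1] = this.base.positions[3 * i + 1];
      this.out.positions[3 * i + 2] = this.base.positions[3 * i + 2];
    }
    return this.out;
  }
}

/**
 * Back-to-front order for alpha "over" blending. The camera sits at z = cameraZ
 * looking toward -z, so depth = cameraZ - z and the farthest Gaussian comes first.
 */
function sortBackToFront(frame: GaussianFrame, cameraZ: number): Uint32Array {
  const depth = new Float32Array(frame.count);
  for (let i = 0; i < frame.count; i++) {
    depth[i] = cameraZ - frame.positions[3 * i + 2];
  }
  const order = Array.from({ length: frame.count }, (_, i) => i);
  order.sort((a, b) => depth[b] - depth[a]); // larger depth (farther) first
  return Uint32Array.from(order);
}

/** One frame: ask the generator for Gaussians, then compute the draw order. */
function renderFrame(gen: GaussianGenerator, t: number, cameraZ: number): Uint32Array {
  const frame = gen.generate(t);
  return sortBackToFront(frame, cameraZ);
}
```

In this sketch a render loop would call `renderFrame` once per animation frame with the current time. Swapping one generator for another changes what is drawn without touching the sorting or drawing stages, which is the kind of decoupling a fixed generator contract is meant to provide.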