Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting

Abstract

Recent advances in computer vision have extended open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Despite this progress, efficiently rendering the high-dimensional features required for open-vocabulary queries remains a significant challenge. Existing methods rely on codebooks or feature compression, which cause information loss and thereby degrade segmentation quality. To address this limitation, we introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Unlike conventional volume rendering, which densely samples all 3D Gaussians intersecting each ray, Q-Render sparsely samples only those with dominant influence along the ray. By integrating Q-Render into a generalizable 3D neural network, we also propose Gaussian Splatting Network (GS-Net), which predicts Gaussian features in a generalizable manner. Extensive experiments on ScanNet and LeRF demonstrate that our framework outperforms state-of-the-art methods while enabling real-time rendering, with an approximately 43.7x speedup on 512-D feature maps. Code will be made publicly available.
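
The contrast between dense and sparse sampling along a ray can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the dominant Gaussians are chosen, so the selection rule below (picking the Gaussians at which the accumulated blending weight crosses K evenly spaced quantile levels) is an assumption made purely for illustration, and the names `blend_weights`, `dense_render`, `quantile_render`, and `k` are hypothetical.

```python
import numpy as np


def blend_weights(alphas):
    """Front-to-back alpha-compositing weights: w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    return alphas * transmittance


def dense_render(alphas, feats):
    """Conventional volume rendering: blend the features of all N Gaussians, O(N * D)."""
    return blend_weights(alphas) @ feats


def quantile_render(alphas, feats, k):
    """Illustrative sparse variant (assumed rule, not taken from the paper).

    Only scalar weights are accumulated over all N Gaussians; the D-dimensional
    features are gathered for just K of them, so the feature cost is O(K * D).
    """
    w = blend_weights(alphas)
    cdf = np.cumsum(w)
    total = cdf[-1]
    if total <= 0.0:
        return np.zeros(feats.shape[1])
    # K evenly spaced quantile levels of the accumulated weight.
    levels = (np.arange(k) + 0.5) / k * total
    # First Gaussian at which the accumulated weight reaches each level.
    idx = np.minimum(np.searchsorted(cdf, levels, side="left"), len(w) - 1)
    # Each selected Gaussian stands in for an equal share (total / k) of the weight.
    return (total / k) * feats[idx].sum(axis=0)


# Usage: one ray crossing 200 Gaussians that carry 512-D features.
rng = np.random.default_rng(0)
alphas = rng.uniform(0.0, 0.6, size=200)
feats = rng.normal(size=(200, 512))
dense = dense_render(alphas, feats)
sparse = quantile_render(alphas, feats, k=8)
print(dense.shape, sparse.shape)
```

In this sketch the per-ray work on high-dimensional features drops from N to K vector operations, which is the kind of saving that matters when D is as large as 512. How closely the sparse result tracks the dense one depends on the scene and on K, and the sketch makes no claim about the accuracy or speedup reported in the paper.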
