CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
July 11, 2025
Authors: Zhengqing Wang, Yuefan Wu, Jiacheng Chen, Fuyang Zhang, Yasutaka Furukawa
cs.AI
Abstract
This paper proposes a neural rendering approach that represents a scene as
"compressed light-field tokens" (CLiFTs), which retain the scene's rich
appearance and geometric information. CLiFT enables compute-efficient rendering
via compressed tokens, while allowing the number of tokens used to represent a
scene or render a novel view to vary with a single trained network. Concretely,
given a set of images, a multi-view encoder tokenizes the images together with
their camera poses. A latent-space K-means algorithm then selects a reduced set
of rays as cluster centroids based on these tokens. A multi-view "condenser"
compresses the information of all the tokens into the centroid tokens to
construct the CLiFTs. At test time, given a target view and a compute budget
(i.e., a number of CLiFTs), the system collects the specified number of nearby
tokens and synthesizes a novel view with a compute-adaptive renderer. Extensive
experiments on the RealEstate10K and DL3DV datasets validate our approach both
quantitatively and qualitatively: it achieves significant data reduction with
comparable rendering quality and the highest overall rendering score, while
offering trade-offs among data size, rendering quality, and rendering speed.
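To make the construction stage concrete, below is a minimal sketch of how tokenization, latent-space K-means centroid selection, and condensation could fit together. It assumes PyTorch; the `encoder` and `condenser` callables, tensor shapes, and function names are placeholders for illustration, not the authors' released architecture or API.

```python
# Sketch of CLiFT construction: tokenize -> latent K-means -> condense.
# `encoder` and `condenser` are hypothetical stand-ins for the paper's
# multi-view encoder and multi-view condenser networks.
import torch

def latent_kmeans(tokens: torch.Tensor, k: int, iters: int = 10) -> torch.Tensor:
    """Plain K-means over token embeddings; the k centroids stand in for
    the reduced set of rays the paper selects as cluster centers."""
    # Initialize centroids from a random subset of the tokens.
    centroids = tokens[torch.randperm(tokens.shape[0])[:k]].clone()
    for _ in range(iters):
        # Assign each token to its nearest centroid in latent space.
        assign = torch.cdist(tokens, centroids).argmin(dim=1)
        for j in range(k):
            members = tokens[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(dim=0)
    return centroids

def build_clifts(images, poses, encoder, condenser, k: int) -> torch.Tensor:
    tokens = encoder(images, poses)        # (N, D) ray-aligned tokens
    centroids = latent_kmeans(tokens, k)   # (k, D) centroid tokens
    return condenser(tokens, centroids)    # compress all N tokens into k CLiFTs
```

Because k is a free parameter here, the same pipeline can produce coarser or finer scene representations, which is what gives CLiFT its storage/quality trade-off.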
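The test-time stage can be sketched similarly: given a target pose and a compute budget (a number of CLiFTs), select the nearest tokens and decode. The Euclidean-distance proximity measure, the stored per-token origins, and the `renderer` signature below are assumptions for illustration, not the paper's exact selection rule.

```python
# Sketch of budget-adaptive novel-view synthesis with CLiFTs.
import torch

def render_with_budget(clifts: torch.Tensor, clift_origins: torch.Tensor,
                       target_origin: torch.Tensor, renderer, budget: int):
    """clifts: (K, D) condensed tokens; clift_origins: (K, 3) positions of
    the rays each token was condensed around (assumed to be stored)."""
    # Rank tokens by distance from the target camera position.
    dists = (clift_origins - target_origin).norm(dim=1)      # (K,)
    # Keep only as many tokens as the compute budget allows.
    idx = dists.topk(budget, largest=False).indices          # `budget` nearest
    # A compute-adaptive renderer decodes the selected tokens into a view.
    return renderer(clifts[idx], target_origin)
```

The key property this illustrates is that `budget` is chosen at inference time, so one trained renderer can trade rendering speed against quality without retraining.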