
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering

July 11, 2025
Authors: Zhengqing Wang, Yuefan Wu, Jiacheng Chen, Fuyang Zhang, Yasutaka Furukawa
cs.AI

Abstract

This paper proposes a neural rendering approach that represents a scene as "compressed light-field tokens" (CLiFTs), which retain the scene's rich appearance and geometric information. CLiFT enables compute-efficient rendering with compressed tokens, while allowing the number of tokens used to represent a scene or render a novel view to vary under a single trained network. Concretely, given a set of images, a multi-view encoder tokenizes the images together with their camera poses. A latent-space K-means then uses these tokens to select a reduced set of rays as cluster centroids. A multi-view "condenser" compresses the information of all the tokens into the centroid tokens to construct the CLiFTs. At test time, given a target view and a compute budget (i.e., the number of CLiFTs), the system collects the specified number of nearby tokens and synthesizes a novel view with a compute-adaptive renderer. Extensive experiments on the RealEstate10K and DL3DV datasets validate our approach quantitatively and qualitatively: it achieves significant data reduction with comparable rendering quality and the highest overall rendering score, while offering trade-offs among data size, rendering quality, and rendering speed.
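The two token-selection steps in this pipeline can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch version of (1) the latent-space K-means that picks centroid tokens at encoding time and (2) the test-time gathering of a budgeted number of nearby tokens. All function names, tensor shapes, the Euclidean distance metric, and the toy sizes are our own assumptions for illustration, not the paper's released implementation.

```python
# Hypothetical sketch of CLiFT's two token-selection steps, per the abstract.
# Names, shapes, and distance metrics are assumptions, not the paper's code.
import torch


def kmeans_centroid_tokens(tokens: torch.Tensor, k: int, iters: int = 10) -> torch.Tensor:
    """Plain K-means over (N, D) latent tokens; returns the indices of the
    K tokens nearest the converged centroids (the kept 'centroid tokens')."""
    n, _ = tokens.shape
    centroids = tokens[torch.randperm(n)[:k]].clone()  # random initialization
    for _ in range(iters):
        # Assign every token to its nearest centroid.
        assign = torch.cdist(tokens, centroids).argmin(dim=1)  # (N,)
        # Recompute each centroid as the mean of its assigned tokens.
        for j in range(k):
            members = tokens[assign == j]
            if members.numel() > 0:
                centroids[j] = members.mean(dim=0)
    # Snap each centroid back to the nearest real token, so the selection
    # corresponds to actual rays from the input views.
    return torch.cdist(centroids, tokens).argmin(dim=1)  # (K,) token indices


def gather_budget_tokens(clift_origins: torch.Tensor,
                         target_position: torch.Tensor,
                         budget: int) -> torch.Tensor:
    """Pick the `budget` CLiFTs whose ray origins lie closest to the target
    camera position; a stand-in for the paper's neighborhood selection."""
    dists = (clift_origins - target_position).norm(dim=1)  # (K,)
    return dists.argsort()[:budget]


# Toy usage: 4,096 encoder tokens reduced to 512 centroid tokens, then a
# 256-token compute budget gathered around the origin as the target camera.
tokens = torch.randn(4096, 256)                 # latent tokens from the encoder
centroid_idx = kmeans_centroid_tokens(tokens, k=512)
origins = torch.randn(512, 3)                   # ray origins of kept tokens
selected = gather_budget_tokens(origins, torch.zeros(3), budget=256)
```

Note the snap-back step in the K-means sketch: centroids are mapped to the nearest actual token so that the kept entries correspond to real rays, matching the abstract's description of selecting a reduced set of rays as cluster centroids. The condenser and the compute-adaptive renderer themselves are learned networks and are not sketched here.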