Surflo: Consistent 3D Surface Flow Model with Global State

June 11, 2026
Authors: Antoine Guédon, Shu Nakamura, Nicolas Dufour, Jiahui Lei, Ko Nishino, Angjoo Kanazawa
cs.AI

Abstract

Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstruction models fail to exploit this: per-view methods emit overlapping, unaligned pointmaps that grow linearly with input count, while global-latent methods commit to a fixed, low-resolution output. We introduce Surflo, which compresses a variable number of unposed RGB views into K latent tokens (one global state) and decodes oriented 3D surface points by independently transporting them from noise onto the surface via flow matching. This frees the output from any fixed grid or token budget: the same latent yields from a few thousand to a million points in a single forward pass. To suppress the local inconsistencies inherent to independent per-point decoding, an inference-time guidance term correlates nearby points by injecting a photometric gradient during ODE integration. Surflo matches or surpasses feed-forward baselines on surface metrics, runs an order of magnitude faster than optimization-based methods that require hundreds of views, and is the only feed-forward approach to combine a global latent with arbitrary-resolution decoding.
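
To make the decoding idea concrete, here is a minimal toy sketch of transporting independent noise points onto a surface by integrating a velocity field with Euler steps. It is not the authors' implementation: `toy_velocity` is a hypothetical, hand-written stand-in for a learned velocity network conditioned on the global latent, the "surface" is assumed to be a unit sphere, and the names `decode_points` and `num_steps` are illustrative only. The photometric guidance term and the point orientations are omitted.

```python
import numpy as np


def toy_velocity(x, t):
    """Hypothetical stand-in for a learned, latent-conditioned velocity field.

    Assumption: the target surface is the unit sphere, and each point
    travels on a straight path from its noise sample to its radial projection.
    """
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    target = x / np.maximum(norm, 1e-12)  # closest point on the sphere
    return (target - x) / (1.0 - t)       # straight-line flow velocity


def decode_points(num_points, num_steps=16, seed=0):
    """Transport Gaussian noise points to the surface by Euler integration."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_points, 3))  # independent noise samples
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                            # t stays strictly below 1
        x = x + dt * toy_velocity(x, t)       # each point moves on its own
    return x


# The same field decodes at any resolution: only the sample count changes.
coarse = decode_points(2_000)
dense = decode_points(200_000)
```

Because every point is integrated independently, the output size is set by the number of noise samples rather than by a grid or token budget. The same independence is why neighbouring points can disagree locally, which is the issue the abstract's guidance term is meant to address.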