Surflo: A Consistent 3D Surface Flow Model with Global State

Abstract

Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstruction models fail to exploit this: per-view methods emit overlapping, unaligned pointmaps whose size grows linearly with the number of input views, while global-latent methods commit to a fixed, low-resolution output. We introduce Surflo, which compresses a variable number of unposed RGB views into K latent tokens (a single global state) and decodes oriented 3D surface points by independently transporting noise samples onto the surface via flow matching. This frees the output from any fixed grid or token budget: the same latent yields anywhere from a few thousand to a million points in a single forward pass. To suppress the local inconsistencies inherent to independent per-point decoding, we introduce an inference-time guidance term that correlates nearby points by injecting a photometric gradient during ODE integration. Surflo matches or surpasses feed-forward baselines on surface metrics, runs an order of magnitude faster than optimization-based methods that require hundreds of views, and is the only feed-forward approach to combine a global latent with arbitrary-resolution decoding.
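
The decoding idea described above can be illustrated with a minimal NumPy sketch. It is not Surflo's implementation: `toy_velocity` is a hypothetical closed-form stand-in for the learned, latent-conditioned velocity field (it pulls points onto a unit sphere), `guidance_grad` is a placeholder for the photometric gradient, and the Euler integrator, step count, and all function names are assumptions made for illustration. Point orientation (normals) is omitted.

```python
import numpy as np


def toy_velocity(x, t, radius=1.0):
    """Hypothetical stand-in for a learned velocity field v(x, t | latent).

    Each point moves along a straight path toward its radial projection
    onto a sphere. A real model would predict this from the latent tokens.
    """
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    target = radius * x / np.maximum(norm, 1e-12)
    # Straight-line (rectified-flow style) velocity toward the target.
    return (target - x) / (1.0 - t)


def decode_points(n_points, n_steps=16, guidance_grad=None,
                  guidance_scale=0.0, seed=0):
    """Transport n_points noise samples to the surface with Euler steps.

    Every point is integrated independently, so n_points can be chosen
    freely at inference time without changing the model or the latent.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_points, 3))  # noise samples at t = 0
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t stays below 1, so (1 - t) is never zero
        v = toy_velocity(x, t)
        if guidance_grad is not None:
            # Placeholder for an inference-time guidance term: subtract a
            # scaled gradient of some consistency loss (assumed form).
            v = v - guidance_scale * guidance_grad(x)
        x = x + dt * v
    return x


coarse = decode_points(1_000)    # a few thousand points ...
dense = decode_points(100_000)   # ... or many more, from the same "state"
```

In this toy setting both calls land on the same surface; only the sampling density changes, which is the property the abstract attributes to per-point flow decoding.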