RoMa v2: Harder Better Faster Denser Feature Matching

Abstract

Dense feature matching aims to estimate all correspondences between two images of a 3D scene and has recently been established as the gold standard due to its high accuracy and robustness. However, existing dense matchers still fail or perform poorly in many hard real-world scenarios, and high-precision models are often slow, which limits their applicability. In this paper, we attack these weaknesses on a wide front through a series of systematic improvements that together yield a significantly better model. In particular, we construct a novel matching architecture and loss which, combined with a curated, diverse training distribution, enable our model to solve many complex matching tasks. We further make training faster through a decoupled two-stage matching-then-refinement pipeline and, at the same time, significantly reduce refinement memory usage through a custom CUDA kernel. Finally, we leverage the recent DINOv3 foundation model, along with multiple other insights, to make the model more robust and unbiased. In an extensive set of experiments, we show that the resulting matcher sets a new state of the art, being significantly more accurate than its predecessors. Code is available at https://github.com/Parskatt/romav2