RoMa v2: Harder Better Faster Denser Feature Matching

Abstract


Dense feature matching aims to estimate all correspondences between two images of a 3D scene and has recently been established as the gold standard due to its high accuracy and robustness. However, existing dense matchers still fail or perform poorly in many hard real-world scenarios, and high-precision models are often slow, which limits their applicability. In this paper, we attack these weaknesses on a wide front through a series of systematic improvements that together yield a significantly better model. In particular, we construct a novel matching architecture and loss, which, combined with a curated, diverse training distribution, enable our model to solve many complex matching tasks. We further make training faster through a decoupled two-stage matching-then-refinement pipeline and, at the same time, significantly reduce refinement memory usage through a custom CUDA kernel. Finally, we leverage the recent DINOv3 foundation model along with several other insights to make the model more robust and unbiased. Through an extensive set of experiments, we show that the resulting matcher sets a new state of the art and is significantly more accurate than its predecessors. Code is available at https://github.com/Parskatt/romav2.
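To make the idea of a decoupled matching-then-refinement pipeline concrete, here is a minimal NumPy sketch of the generic two-stage pattern: a coarse stage that compares every pair of cells on a low-resolution grid, followed by a refinement stage that searches only a small window around each upsampled coarse match. This is not RoMa v2's architecture or code. The features are plain arrays rather than DINOv3 or learned features, matching is a cosine-similarity argmax rather than the paper's learned matcher and loss, and the names `coarse_match`, `refine`, `avg_pool2`, the `scale` and `radius` parameters, and the 2x2 average pooling are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical illustration of a generic coarse-to-fine dense matcher.
# Not RoMa v2's implementation; all names here are made up for the sketch.


def l2_normalize(feats: np.ndarray) -> np.ndarray:
    """L2-normalize an (H, W, C) feature map along the channel axis."""
    norm = np.linalg.norm(feats, axis=-1, keepdims=True)
    return feats / np.maximum(norm, 1e-8)


def avg_pool2(feats: np.ndarray) -> np.ndarray:
    """2x2 average pooling of an (H, W, C) map; H and W must be even."""
    h, w, c = feats.shape
    return feats.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def coarse_match(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Stage 1: global matching on a coarse grid.

    Compares every cell of A with every cell of B (cosine similarity) and
    keeps the best match. Returns an (Ha, Wa, 2) array of (row, col) in B.
    """
    ha, wa, c = feat_a.shape
    wb = feat_b.shape[1]
    a = l2_normalize(feat_a).reshape(-1, c)  # (Ha*Wa, C)
    b = l2_normalize(feat_b).reshape(-1, c)  # (Hb*Wb, C)
    sim = a @ b.T                            # all-pairs similarity matrix
    best = sim.argmax(axis=1)                # flat index of best match in B
    rows, cols = np.divmod(best, wb)
    return np.stack([rows, cols], axis=-1).reshape(ha, wa, 2)


def refine(fine_a, fine_b, coarse_warp, scale, radius=1):
    """Stage 2: local refinement at a finer resolution.

    Turns the coarse match into an initial guess for each fine pixel, then
    searches only a (2*radius+1)^2 window around that guess in B.
    """
    ha, wa, _ = fine_a.shape
    hb, wb, _ = fine_b.shape
    fa, fb = l2_normalize(fine_a), l2_normalize(fine_b)
    warp = np.zeros((ha, wa, 2), dtype=int)
    for y in range(ha):
        for x in range(wa):
            cy, cx = coarse_warp[y // scale, x // scale]
            # Initial guess: keep the same offset inside the matched coarse cell.
            gy = min(cy * scale + y % scale, hb - 1)
            gx = min(cx * scale + x % scale, wb - 1)
            # Clip the search window to the image bounds.
            y0, y1 = max(gy - radius, 0), min(gy + radius + 1, hb)
            x0, x1 = max(gx - radius, 0), min(gx + radius + 1, wb)
            scores = fb[y0:y1, x0:x1] @ fa[y, x]  # (h, w) local similarities
            dy, dx = np.unravel_index(scores.argmax(), scores.shape)
            warp[y, x] = (y0 + dy, x0 + dx)
    return warp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fine_a = rng.standard_normal((8, 8, 16))
    # Synthetic "second image": A translated by (2, 4) pixels with wrap-around.
    fine_b = np.roll(fine_a, shift=(2, 4), axis=(0, 1))
    coarse = coarse_match(avg_pool2(fine_a), avg_pool2(fine_b))
    warp = refine(fine_a, fine_b, coarse, scale=2)
    ys, xs = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    ok = (warp[..., 0] == (ys + 2) % 8) & (warp[..., 1] == (xs + 4) % 8)
    print("all matches recovered:", bool(ok.all()))
```

The split mirrors the cost argument for decoupling the two stages: the all-pairs similarity matrix in `coarse_match` grows quadratically with the number of cells, so it is only affordable on a coarse grid, while `refine` touches just (2*radius+1)^2 candidates per fine pixel. The Python loop in `refine` is written for readability, not speed.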
