OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
May 21, 2024
Authors: Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo
cs.AI
Abstract
The image matching field has been witnessing a continuous emergence of novel
learnable feature matching techniques, with ever-improving performance on
conventional benchmarks. However, our investigation shows that despite these
gains, their potential for real-world applications is restricted by their
limited generalization capabilities to novel image domains. In this paper, we
introduce OmniGlue, the first learnable image matcher that is designed with
generalization as a core principle. OmniGlue leverages broad knowledge from a
vision foundation model to guide the feature matching process, boosting
generalization to domains not seen at training time. Additionally, we propose a
novel keypoint position-guided attention mechanism which disentangles spatial
and appearance information, leading to enhanced matching descriptors. We
perform comprehensive experiments on a suite of 7 datasets with varied image
domains, including scene-level, object-centric and aerial images. OmniGlue's
novel components lead to relative gains on unseen domains of 20.9% with
respect to a directly comparable reference model, while also outperforming the
recent LightGlue method by 9.5% relatively. Code and model can be found at
https://hwjiang1510.github.io/OmniGlue
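
As a rough illustration of the foundation-model guidance described above, the sketch below (not the authors' released code) shows one way coarse features from a vision foundation model, sampled at the detected keypoints, could be used to restrict cross-image attention to plausible keypoint pairs. The function name, tensor shapes, and the top-k heuristic are assumptions made for this example.

```python
# Illustrative sketch (not the OmniGlue implementation): foundation features
# at keypoints vote for which cross-image pairs are plausible, and the
# resulting boolean mask can then gate the matcher's cross-attention.
import torch
import torch.nn.functional as F

def foundation_guidance_mask(feat_a: torch.Tensor,
                             feat_b: torch.Tensor,
                             k: int = 8) -> torch.Tensor:
    """feat_a: (N, C) and feat_b: (M, C) foundation features at keypoints.
    Returns an (N, M) boolean mask of mutually plausible keypoint pairs."""
    a = F.normalize(feat_a, dim=-1)
    b = F.normalize(feat_b, dim=-1)
    sim = a @ b.t()                                            # (N, M) cosine similarity
    top_ab = sim.topk(min(k, sim.shape[1]), dim=1).indices    # A -> B neighbors
    top_ba = sim.topk(min(k, sim.shape[0]), dim=0).indices    # B -> A neighbors
    mask_ab = torch.zeros_like(sim, dtype=torch.bool)
    mask_ab.scatter_(1, top_ab, torch.ones_like(top_ab, dtype=torch.bool))
    mask_ba = torch.zeros_like(sim, dtype=torch.bool)
    mask_ba.scatter_(0, top_ba, torch.ones_like(top_ba, dtype=torch.bool))
    return mask_ab | mask_ba          # union keeps the guidance permissive rather than hard
```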
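
The keypoint position-guided attention can be sketched in a similar hedged way: keypoint positions influence only the attention weights (through the queries and keys), while the propagated values carry appearance descriptors alone, which is what the abstract refers to as disentangling spatial and appearance information. A minimal PyTorch sketch under those assumptions follows; module and parameter names are illustrative, not the paper's API.

```python
# Minimal sketch of position-guided attention: positions enter the attention
# scores but never the messages that update the descriptors.
import torch
import torch.nn as nn

class PositionGuidedAttention(nn.Module):
    def __init__(self, dim: int, pos_dim: int = 2):
        super().__init__()
        self.pos_encoder = nn.Linear(pos_dim, dim)  # embed 2-D keypoint positions
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, desc: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # desc: (B, N, dim) appearance descriptors; pos: (B, N, 2) keypoint coords.
        pos_emb = self.pos_encoder(pos)
        q = self.to_q(desc + pos_emb)   # positions guide where to attend...
        k = self.to_k(desc + pos_emb)
        v = self.to_v(desc)             # ...but the aggregated values stay appearance-only
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return desc + attn @ v
```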