OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
May 21, 2024
Authors: Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo
cs.AI
Abstract
The image matching field has been witnessing a continuous emergence of novel
learnable feature matching techniques, with ever-improving performance on
conventional benchmarks. However, our investigation shows that despite these
gains, their potential for real-world applications is restricted by their
limited generalization capabilities to novel image domains. In this paper, we
introduce OmniGlue, the first learnable image matcher that is designed with
generalization as a core principle. OmniGlue leverages broad knowledge from a
vision foundation model to guide the feature matching process, boosting
generalization to domains not seen at training time. Additionally, we propose a
novel keypoint position-guided attention mechanism which disentangles spatial
and appearance information, leading to enhanced matching descriptors. We
perform comprehensive experiments on a suite of 7 datasets with varied image
domains, including scene-level, object-centric and aerial images. OmniGlue's
novel components lead to relative gains on unseen domains of 20.9% with
respect to a directly comparable reference model, while also outperforming the
recent LightGlue method by 9.5% relatively. Code and model can be found at
https://hwjiang1510.github.io/OmniGlue.
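
The abstract highlights two architectural ideas: guidance of the matching process by a vision foundation model, and a keypoint position-guided attention that keeps spatial and appearance information separate. The sketch below is a minimal, hypothetical illustration of how such a layer could look, not the authors' implementation; the function name, the way the foundation-model similarity `guide_sim` enters the attention logits, and the `temperature` parameter are all assumptions made for the example.

```python
# Illustrative sketch only (not the OmniGlue implementation). Assumptions:
#   1) a frozen vision foundation model (e.g., DINOv2 patch features) provides
#      a coarse cross-image similarity that re-weights attention, and
#   2) keypoint positions bias the attention logits while only appearance
#      descriptors are aggregated, keeping spatial and appearance information
#      disentangled.
import torch
import torch.nn.functional as F


def position_guided_cross_attention(
    desc_a: torch.Tensor,     # (Na, D) appearance descriptors, image A
    desc_b: torch.Tensor,     # (Nb, D) appearance descriptors, image B
    pos_a: torch.Tensor,      # (Na, D) positional encodings of A's keypoints
    pos_b: torch.Tensor,      # (Nb, D) positional encodings of B's keypoints
    guide_sim: torch.Tensor,  # (Na, Nb) coarse foundation-model similarity
    temperature: float = 0.1,
) -> torch.Tensor:
    """Refine A's descriptors by attending to B's descriptors."""
    d = desc_a.shape[-1]

    # Attention logits use both appearance and position, so geometry helps
    # decide *where* to attend ...
    logits = (desc_a @ desc_b.T + pos_a @ pos_b.T) / d ** 0.5

    # ... and are modulated by the foundation-model similarity, down-weighting
    # keypoint pairs that fall in semantically unrelated regions.
    logits = logits + temperature * torch.log(guide_sim.clamp_min(1e-6))

    attn = F.softmax(logits, dim=-1)

    # Only appearance descriptors are aggregated: position information steers
    # the attention weights but is not mixed into the output.
    return desc_a + attn @ desc_b


if __name__ == "__main__":
    Na, Nb, D = 128, 96, 256
    desc_a, desc_b = torch.randn(Na, D), torch.randn(Nb, D)
    pos_a, pos_b = torch.randn(Na, D), torch.randn(Nb, D)
    guide_sim = torch.rand(Na, Nb)  # placeholder for a foundation-model prior
    out = position_guided_cross_attention(desc_a, desc_b, pos_a, pos_b, guide_sim)
    print(out.shape)  # torch.Size([128, 256])
```

The design point the sketch tries to capture is that both the positional encodings and the foundation-model prior only shape the attention weights, while the aggregated values remain pure appearance descriptors.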