OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
May 21, 2024
Authors: Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo
cs.AI
Abstract
The image matching field has been witnessing a continuous emergence of novel
learnable feature matching techniques, with ever-improving performance on
conventional benchmarks. However, our investigation shows that despite these
gains, their potential for real-world applications is restricted by their
limited generalization capabilities to novel image domains. In this paper, we
introduce OmniGlue, the first learnable image matcher that is designed with
generalization as a core principle. OmniGlue leverages broad knowledge from a
vision foundation model to guide the feature matching process, boosting
generalization to domains not seen at training time. Additionally, we propose a
novel keypoint position-guided attention mechanism which disentangles spatial
and appearance information, leading to enhanced matching descriptors. We
perform comprehensive experiments on a suite of 7 datasets with varied image
domains, including scene-level, object-centric and aerial images. OmniGlue's
novel components lead to relative gains on unseen domains of 20.9% with
respect to a directly comparable reference model, while also outperforming the
recent LightGlue method by 9.5% relatively. Code and model can be found at
https://hwjiang1510.github.io/OmniGlue.
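
The abstract highlights two architectural ideas: guidance of the matching process by a vision foundation model, and a keypoint position-guided attention that keeps spatial and appearance information separate. The sketch below is a minimal, hypothetical illustration of how such a layer could look, not the authors' implementation; the function name, the way the foundation-model similarity `guide_sim` enters the attention logits, and the `temperature` parameter are all assumptions made for the example.

```python
# Illustrative sketch only (not the OmniGlue implementation). Assumptions:
#   1) a frozen vision foundation model (e.g., DINOv2 patch features) provides
#      a coarse cross-image similarity that re-weights attention, and
#   2) keypoint positions bias the attention logits while only appearance
#      descriptors are aggregated, keeping spatial and appearance information
#      disentangled.
import torch
import torch.nn.functional as F


def position_guided_cross_attention(
    desc_a: torch.Tensor,     # (Na, D) appearance descriptors, image A
    desc_b: torch.Tensor,     # (Nb, D) appearance descriptors, image B
    pos_a: torch.Tensor,      # (Na, D) positional encodings of A's keypoints
    pos_b: torch.Tensor,      # (Nb, D) positional encodings of B's keypoints
    guide_sim: torch.Tensor,  # (Na, Nb) coarse foundation-model similarity
    temperature: float = 0.1,
) -> torch.Tensor:
    """Refine A's descriptors by attending to B's descriptors."""
    d = desc_a.shape[-1]

    # Attention logits use both appearance and position, so geometry helps
    # decide *where* to attend ...
    logits = (desc_a @ desc_b.T + pos_a @ pos_b.T) / d ** 0.5

    # ... and are modulated by the foundation-model similarity, down-weighting
    # keypoint pairs that fall in semantically unrelated regions.
    logits = logits + temperature * torch.log(guide_sim.clamp_min(1e-6))

    attn = F.softmax(logits, dim=-1)

    # Only appearance descriptors are aggregated: position information steers
    # the attention weights but is not mixed into the output.
    return desc_a + attn @ desc_b


if __name__ == "__main__":
    Na, Nb, D = 128, 96, 256
    desc_a, desc_b = torch.randn(Na, D), torch.randn(Nb, D)
    pos_a, pos_b = torch.randn(Na, D), torch.randn(Nb, D)
    guide_sim = torch.rand(Na, Nb)  # placeholder for a foundation-model prior
    out = position_guided_cross_attention(desc_a, desc_b, pos_a, pos_b, guide_sim)
    print(out.shape)  # torch.Size([128, 256])
```

The design point the sketch tries to capture is that both the positional encodings and the foundation-model prior only shape the attention weights, while the aggregated values remain pure appearance descriptors.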