UFM: A Simple Path towards Unified Dense Correspondence with Flow
June 10, 2025
Authors: Yuchen Zhang, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade, Shreyas Jha, Yaoyu Hu, Deva Ramanan, Sebastian Scherer, Wenshan Wang
cs.AI
Abstract
Dense image correspondence is central to many applications, such as visual
odometry, 3D reconstruction, object association, and re-identification.
Historically, dense correspondence has been tackled separately for
wide-baseline scenarios and optical flow estimation, despite the common goal of
matching content between two images. In this paper, we develop a Unified Flow &
Matching model (UFM), which is trained on unified data for pixels that are
co-visible in both source and target images. UFM uses a simple, generic
transformer architecture that directly regresses the (u,v) flow. It is easier
to train and more accurate for large flows than the coarse-to-fine cost-volume
designs typical of prior work. UFM is 28% more accurate than the
state-of-the-art flow method Unimatch, while also achieving 62% less error
and running 6.7x faster than the dense wide-baseline matcher RoMa. UFM is the
first to
demonstrate that unified training can outperform specialized approaches across
both domains. This result enables fast, general-purpose correspondence and
opens new directions for multi-modal, long-range, and real-time correspondence
tasks.
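
To make the "directly regresses the (u,v) flow" point concrete, below is a
minimal, illustrative sketch of a two-view transformer that predicts a dense
flow field without any cost volume. This is not the authors' implementation:
every module name, dimension, and the joint-attention design here are
assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class DirectFlowRegressor(nn.Module):
    """Toy two-view transformer that regresses per-pixel (u, v) flow directly.

    Hypothetical sketch: all names, sizes, and design choices are illustrative,
    not UFM's actual architecture. Positional embeddings are omitted for brevity.
    """

    def __init__(self, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.patch = patch
        # Shared patch embedding applied to both source and target images.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        # Joint self-attention over concatenated source/target tokens stands in
        # for whatever cross-view attention the real model uses.
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Head maps each source token to a (u, v) flow for every pixel in its
        # patch; no cost volume or coarse-to-fine refinement is involved.
        self.head = nn.Linear(dim, 2 * patch * patch)

    def forward(self, src, tgt):
        B, _, H, W = src.shape
        s = self.embed(src).flatten(2).transpose(1, 2)   # (B, N, dim)
        t = self.embed(tgt).flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.encoder(torch.cat([s, t], dim=1))  # joint attention
        flow = self.head(tokens[:, : s.shape[1]])        # source tokens only
        # Unfold patch-wise predictions into a dense (B, 2, H, W) flow map.
        h, w = H // self.patch, W // self.patch
        flow = flow.view(B, h, w, 2, self.patch, self.patch)
        return flow.permute(0, 3, 1, 4, 2, 5).reshape(B, 2, H, W)

src = tgt = torch.randn(1, 3, 224, 224)
print(DirectFlowRegressor()(src, tgt).shape)  # torch.Size([1, 2, 224, 224])
```

The sketch's only output is the regressed flow map itself, which is what makes
the approach simpler to train than cost-volume pipelines: there is no matching
volume to build or iteratively refine, so large displacements are handled the
same way as small ones.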