UFM: A Simple Path towards Unified Dense Correspondence with Flow
June 10, 2025
Authors: Yuchen Zhang, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade, Shreyas Jha, Yaoyu Hu, Deva Ramanan, Sebastian Scherer, Wenshan Wang
cs.AI
Abstract
Dense image correspondence is central to many applications, such as visual
odometry, 3D reconstruction, object association, and re-identification.
Historically, dense correspondence has been tackled separately for
wide-baseline scenarios and optical flow estimation, despite the common goal of
matching content between two images. In this paper, we develop a Unified Flow &
Matching model (UFM), which is trained on unified data for pixels that are
co-visible in both source and target images. UFM uses a simple, generic
transformer architecture that directly regresses the (u,v) flow. It is easier
to train and more accurate for large flows than the coarse-to-fine cost-volume
designs typical of prior work. UFM is 28% more accurate than the
state-of-the-art flow method Unimatch, while also achieving 62% less error
and running 6.7x faster than the dense wide-baseline matcher RoMa. UFM is the
first to
demonstrate that unified training can outperform specialized approaches across
both domains. This result enables fast, general-purpose correspondence and
opens new directions for multi-modal, long-range, and real-time correspondence
tasks.
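
To make the "directly regresses the (u,v) flow" point concrete, below is a
minimal, illustrative sketch of a two-view transformer that predicts a dense
flow field without any cost volume. This is not the authors' implementation:
every module name, dimension, and the joint-attention design here are
assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class DirectFlowRegressor(nn.Module):
    """Toy two-view transformer that regresses per-pixel (u, v) flow directly.

    Hypothetical sketch: all names, sizes, and design choices are illustrative,
    not UFM's actual architecture. Positional embeddings are omitted for brevity.
    """

    def __init__(self, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.patch = patch
        # Shared patch embedding applied to both source and target images.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        # Joint self-attention over concatenated source/target tokens stands in
        # for whatever cross-view attention the real model uses.
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Head maps each source token to a (u, v) flow for every pixel in its
        # patch; no cost volume or coarse-to-fine refinement is involved.
        self.head = nn.Linear(dim, 2 * patch * patch)

    def forward(self, src, tgt):
        B, _, H, W = src.shape
        s = self.embed(src).flatten(2).transpose(1, 2)   # (B, N, dim)
        t = self.embed(tgt).flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.encoder(torch.cat([s, t], dim=1))  # joint attention
        flow = self.head(tokens[:, : s.shape[1]])        # source tokens only
        # Unfold patch-wise predictions into a dense (B, 2, H, W) flow map.
        h, w = H // self.patch, W // self.patch
        flow = flow.view(B, h, w, 2, self.patch, self.patch)
        return flow.permute(0, 3, 1, 4, 2, 5).reshape(B, 2, H, W)

src = tgt = torch.randn(1, 3, 224, 224)
print(DirectFlowRegressor()(src, tgt).shape)  # torch.Size([1, 2, 224, 224])
```

The sketch's only output is the regressed flow map itself, which is what makes
the approach simpler to train than cost-volume pipelines: there is no matching
volume to build or iteratively refine, so large displacements are handled the
same way as small ones.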