
UFM: A Simple Path towards Unified Dense Correspondence with Flow

June 10, 2025
作者: Yuchen Zhang, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade, Shreyas Jha, Yaoyu Hu, Deva Ramanan, Sebastian Scherer, Wenshan Wang
cs.AI

Abstract

Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), which is trained on unified data for pixels that are co-visible in both source and target images. UFM uses a simple, generic transformer architecture that directly regresses the (u, v) flow. Compared to the coarse-to-fine cost volumes typical of prior work, it is easier to train and more accurate for large flows. UFM is 28% more accurate than the state-of-the-art flow method Unimatch, while also achieving 62% lower error and running 6.7x faster than the dense wide-baseline matcher RoMa. UFM is the first to demonstrate that unified training can outperform specialized approaches across both domains. This result enables fast, general-purpose correspondence and opens new directions for multi-modal, long-range, and real-time correspondence tasks.
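The abstract's central architectural claim is that a generic transformer can regress the (u, v) flow directly, rather than searching a coarse-to-fine cost volume. The sketch below is a minimal illustration of that idea only; it is not the authors' implementation, and every module name, dimension, and the per-pixel co-visibility head are assumptions made for the example.

```python
# Minimal PyTorch sketch (assumed, not the UFM authors' code): joint
# attention over source/target patch tokens, then a head that directly
# regresses per-pixel (u, v) flow plus a co-visibility logit.
import torch
import torch.nn as nn

class DirectFlowRegressor(nn.Module):
    def __init__(self, dim=256, patch=16, depth=6, heads=8):
        super().__init__()
        self.patch = patch
        # Shared patch embedding for both RGB images.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        # A plain transformer over the concatenated token sequences,
        # standing in for the "simple, generic" architecture in the abstract.
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Per-patch head: (u, v) flow and a co-visibility logit for every
        # pixel inside the patch (3 channels per pixel).
        self.head = nn.Linear(dim, 3 * patch * patch)

    def forward(self, src, tgt):
        B, _, H, W = src.shape
        hp, wp = H // self.patch, W // self.patch
        tokens = torch.cat([
            self.embed(src).flatten(2).transpose(1, 2),  # (B, N, dim)
            self.embed(tgt).flatten(2).transpose(1, 2),
        ], dim=1)
        feats = self.encoder(tokens)[:, : hp * wp]  # source-side tokens
        out = self.head(feats)                      # (B, N, 3*p*p)
        out = out.view(B, hp, wp, 3, self.patch, self.patch)
        out = out.permute(0, 3, 1, 4, 2, 5).reshape(B, 3, H, W)
        flow, covis = out[:, :2], out[:, 2:]  # direct (u, v) + mask logit
        return flow, covis

# Usage: two images in, a dense flow field and co-visibility map out.
flow, covis = DirectFlowRegressor()(
    torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
print(flow.shape, covis.shape)  # (1, 2, 224, 224), (1, 1, 224, 224)
```

The design point the sketch captures is that the network emits the flow field in one regression pass; there is no correlation volume or iterative coarse-to-fine refinement, which is what the abstract credits for easier training and better accuracy on large flows.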