MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation
June 29, 2025
Authors: Vladislav Bargatin, Egor Chistov, Alexander Yakovenko, Dmitriy Vatolin
cs.AI
Abstract
Recent advances in optical flow estimation have prioritized accuracy at the
cost of growing GPU memory consumption, particularly for high-resolution
(FullHD) inputs. We introduce MEMFOF, a memory-efficient multi-frame optical
flow method that identifies a favorable trade-off between multi-frame
estimation and GPU memory usage. Notably, MEMFOF requires only 2.09 GB of GPU
memory at runtime for 1080p inputs and 28.5 GB during training, which uniquely
allows our method to be trained at native 1080p without cropping or
downsampling. We systematically revisit design choices from
RAFT-like architectures, integrating reduced correlation volumes and
high-resolution training protocols alongside multi-frame estimation, to achieve
state-of-the-art performance across multiple benchmarks while substantially
reducing memory overhead. Our method outperforms more resource-intensive
alternatives in both accuracy and runtime efficiency, validating its robustness
for flow estimation at high resolutions. At the time of submission, our method
ranks first on the Spring benchmark with a 1-pixel (1px) outlier rate of 3.289,
leads Sintel (clean) with an endpoint error (EPE) of 0.963, and achieves the
best Fl-all error on KITTI-2015 at 2.94%. The code is available at
https://github.com/msu-video-group/memfof.
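The abstract credits part of the memory savings to reduced correlation volumes. The following rough, illustrative calculation does not describe MEMFOF's actual configuration. It estimates the size of the finest level of a single RAFT-style all-pairs correlation volume at 1080p. It assumes float32 storage and inputs padded to a multiple of the feature stride, and it ignores RAFT's coarser pyramid levels, which add roughly a third more. `corr_volume_bytes` is a hypothetical helper. Stride 8 matches RAFT's 1/8-resolution features, and stride 16 is a hypothetical reduced setting used only for comparison.

```python
import math


def corr_volume_bytes(height: int, width: int, stride: int,
                      bytes_per_elem: int = 4) -> int:
    """Bytes for the finest level of an all-pairs correlation volume.

    Hypothetical helper for illustration. Each of the h*w feature vectors
    of frame 1 is dot-producted with each of the h*w feature vectors of
    frame 2, so the volume has (h*w)**2 entries regardless of the number
    of feature channels.
    """
    # Feature-map size, assuming the image is padded up to a multiple of stride.
    h = math.ceil(height / stride)
    w = math.ceil(width / stride)
    entries = (h * w) ** 2
    return entries * bytes_per_elem


# Compare RAFT's 1/8-resolution features with a hypothetical 1/16 setting.
for stride in (8, 16):
    gb = corr_volume_bytes(1080, 1920, stride) / 1e9
    print(f"stride {stride:2d}: {gb:.2f} GB")
# stride  8: 4.20 GB
# stride 16: 0.27 GB
```

The volume grows with the fourth power of the feature resolution. Halving the feature resolution therefore shrinks it by roughly 16x. At stride 8, this single float32 volume (about 4.2 GB) is already about twice the 2.09 GB runtime figure quoted in the abstract.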
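The abstract reports three standard optical-flow metrics. Below is a simplified sketch of their usual definitions:

- **Mean end-point error (EPE):** the average Euclidean distance between predicted and ground-truth flow vectors.
- **1px outlier rate:** the percentage of pixels whose EPE exceeds 1 px. This is the metric quoted for Spring.
- **Fl-all:** KITTI's percentage of valid pixels whose EPE exceeds both 3 px and 5% of the ground-truth flow magnitude.

The function name `flow_metrics` and its flattened list-of-vectors input format are assumptions made to keep the example dependency-free. The official benchmark evaluation code may differ in details such as masking, resolution handling, and how scores are aggregated across frames.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# One (u, v) flow vector per pixel, flattened in any fixed order (assumption).
Flow = Sequence[Tuple[float, float]]


def flow_metrics(pred: Flow, gt: Flow,
                 valid: Optional[Sequence[bool]] = None) -> Dict[str, float]:
    """Mean EPE (px), 1px outlier rate (%) and KITTI-style Fl-all (%)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must cover the same pixels")
    if valid is None:
        valid = [True] * len(gt)

    epe_sum, n, out_1px, out_fl = 0.0, 0, 0, 0
    for (pu, pv), (gu, gv), ok in zip(pred, gt, valid):
        if not ok:
            continue  # pixels without ground truth are ignored
        epe = math.hypot(pu - gu, pv - gv)  # end-point error
        mag = math.hypot(gu, gv)            # ground-truth flow magnitude
        epe_sum += epe
        n += 1
        if epe > 1.0:
            out_1px += 1
        # KITTI: an outlier must exceed BOTH 3 px and 5% of the GT magnitude.
        if epe > 3.0 and epe > 0.05 * mag:
            out_fl += 1

    if n == 0:
        raise ValueError("no valid pixels")
    return {"epe": epe_sum / n,
            "1px": 100.0 * out_1px / n,
            "fl_all": 100.0 * out_fl / n}


gt = [(0.0, 0.0), (10.0, 0.0), (100.0, 0.0), (3.0, 4.0)]
pred = [(0.5, 0.0), (14.0, 0.0), (104.0, 0.0), (3.0, 4.0)]
print(flow_metrics(pred, gt))
# {'epe': 2.125, '1px': 50.0, 'fl_all': 25.0}
```

In this toy example, the second and third pixels both have an EPE of 4 px. Only the second counts toward Fl-all, because 4 px is below 5% of the third pixel's 100 px ground-truth motion.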