MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation

Abstract

Recent advances in optical flow estimation have prioritized accuracy at the cost of growing GPU memory consumption, particularly for high-resolution (FullHD) inputs. We introduce MEMFOF, a memory-efficient multi-frame optical flow method that identifies a favorable trade-off between multi-frame estimation and GPU memory usage. Notably, MEMFOF requires only 2.09 GB of GPU memory at runtime for 1080p inputs and 28.5 GB during training, which uniquely positions our method to be trained at native 1080p without cropping or downsampling. We systematically revisit design choices from RAFT-like architectures, integrating reduced correlation volumes and high-resolution training protocols alongside multi-frame estimation, to achieve state-of-the-art performance across multiple benchmarks while substantially reducing memory overhead. Our method outperforms more resource-intensive alternatives in both accuracy and runtime efficiency, validating its robustness for flow estimation at high resolutions. At the time of submission, our method ranks first on the Spring benchmark with a 1-pixel (1px) outlier rate of 3.289, leads Sintel (clean) with an endpoint error (EPE) of 0.963, and achieves the best Fl-all error on KITTI-2015 at 2.94%. The code is available at https://github.com/msu-video-group/memfof.
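To give a feel for why the correlation volume matters for memory at 1080p, the following back-of-the-envelope sketch estimates the size of a dense all-pairs correlation volume, the structure RAFT builds between two frames. The abstract does not say how the volume is reduced, so the feature strides compared here (8, as in RAFT, and 16) are illustrative assumptions, as are the helper name `corr_volume_bytes` and the use of 32-bit floats. The estimate covers a single pyramid level for one frame pair and ignores all other activations.

```python
def corr_volume_bytes(height, width, stride, bytes_per_value=4):
    """Bytes for a dense all-pairs correlation volume between two frames.

    Assumes feature maps of size ceil(height/stride) x ceil(width/stride),
    with every location in one frame correlated against every location in
    the other. Hypothetical helper for illustration, not taken from MEMFOF.
    """
    h = -(-height // stride)  # ceiling division
    w = -(-width // stride)
    locations = h * w
    return locations * locations * bytes_per_value


if __name__ == "__main__":
    # Compare two assumed feature strides at native 1080p.
    for stride in (8, 16):
        gib = corr_volume_bytes(1080, 1920, stride) / 2**30
        print(f"stride {stride}: {gib:.2f} GiB per frame pair")
```

Because the volume grows with the square of the number of feature locations, halving the feature resolution in each dimension shrinks it by roughly a factor of 16, which is why this component is a natural target when cutting memory for high-resolution inputs.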
