RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
November 12, 2025
Authors: Isaac Robinson, Peter Robicheaux, Matvei Popov, Deva Ramanan, Neehar Peri
cs.AI
Abstract
Open-vocabulary detectors achieve impressive performance on COCO, but often fail to generalize to real-world datasets with out-of-distribution classes not typically found in their pre-training. Rather than simply fine-tuning a heavy-weight vision-language model (VLM) for new domains, we introduce RF-DETR, a light-weight specialist detection transformer that discovers accuracy-latency Pareto curves for any target dataset with weight-sharing neural architecture search (NAS). Our approach fine-tunes a pre-trained base network on a target dataset and evaluates thousands of network configurations with different accuracy-latency tradeoffs without re-training. Further, we revisit the "tunable knobs" for NAS to improve the transferability of DETRs to diverse target domains. Notably, RF-DETR significantly improves on prior state-of-the-art real-time methods on COCO and Roboflow100-VL. RF-DETR (nano) achieves 48.0 AP on COCO, beating D-FINE (nano) by 5.3 AP at similar latency, and RF-DETR (2x-large) outperforms GroundingDINO (tiny) by 1.2 AP on Roboflow100-VL while running 20x as fast. To the best of our knowledge, RF-DETR (2x-large) is the first real-time detector to surpass 60 AP on COCO. Our code is at https://github.com/roboflow/rf-detr
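The core idea in the abstract is that a single fine-tuned supernet can be sliced into thousands of sub-network configurations, each measured for accuracy and latency without re-training, and only the non-dominated configurations are kept as the Pareto curve. A minimal sketch of that selection step follows; the search-space knobs (depth, width, resolution), the proxy metrics, and all field names here are illustrative stand-ins, not RF-DETR's actual tunable knobs:

```python
import itertools

def pareto_front(configs):
    """Return configs that no other config dominates, i.e. nothing
    else is at least as fast AND strictly more accurate (or vice versa)."""
    front = [
        c for c in configs
        if not any(
            (o["latency_ms"] < c["latency_ms"] and o["ap"] >= c["ap"]) or
            (o["latency_ms"] <= c["latency_ms"] and o["ap"] > c["ap"])
            for o in configs
        )
    ]
    return sorted(front, key=lambda c: c["latency_ms"])

# Toy search space: each knob selects a sub-network sliced from the
# shared, already-fine-tuned supernet, so no re-training is needed.
DEPTHS      = (4, 6, 8)          # decoder layers kept
WIDTHS      = (192, 256, 384)    # hidden dimension
RESOLUTIONS = (448, 560, 672)    # input image size

def evaluate(depth, width, res):
    """Stand-in for benchmarking one sub-network; a real NAS loop would
    time actual inference (latency) and run a validation pass (AP)."""
    latency_ms = 0.004 * depth * width * (res / 448) ** 2 / 10
    ap = (40 + 2.0 * depth + 0.02 * width
          + 0.01 * (res - 448) - 0.001 * depth * width)
    return {"depth": depth, "width": width, "res": res,
            "latency_ms": round(latency_ms, 2), "ap": round(ap, 1)}

configs = [evaluate(d, w, r)
           for d, w, r in itertools.product(DEPTHS, WIDTHS, RESOLUTIONS)]
front = pareto_front(configs)  # the accuracy-latency Pareto curve
```

Picking a deployment point then reduces to choosing the frontier entry whose latency fits the target budget, which is how a single search can serve model sizes from nano to 2x-large.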