RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
November 12, 2025
Authors: Isaac Robinson, Peter Robicheaux, Matvei Popov, Deva Ramanan, Neehar Peri
cs.AI
Abstract
Open-vocabulary detectors achieve impressive performance on COCO, but often fail to generalize to real-world datasets with out-of-distribution classes not typically found in their pre-training. Rather than simply fine-tuning a heavy-weight vision-language model (VLM) for new domains, we introduce RF-DETR, a light-weight specialist detection transformer that discovers accuracy-latency Pareto curves for any target dataset with weight-sharing neural architecture search (NAS). Our approach fine-tunes a pre-trained base network on a target dataset and evaluates thousands of network configurations with different accuracy-latency tradeoffs without re-training. Further, we revisit the "tunable knobs" for NAS to improve the transferability of DETRs to diverse target domains. Notably, RF-DETR significantly improves on prior state-of-the-art real-time methods on COCO and Roboflow100-VL. RF-DETR (nano) achieves 48.0 AP on COCO, beating D-FINE (nano) by 5.3 AP at similar latency, and RF-DETR (2x-large) outperforms GroundingDINO (tiny) by 1.2 AP on Roboflow100-VL while running 20x as fast. To the best of our knowledge, RF-DETR (2x-large) is the first real-time detector to surpass 60 AP on COCO. Our code is at https://github.com/roboflow/rf-detr
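The core search idea described above, evaluating many sub-network configurations of a shared, already-trained supernet without any re-training and then keeping only the configurations on the accuracy-latency Pareto front, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the search space (depth/width choices), the proxy metric functions, and all names here are assumptions for demonstration.

```python
import random

# Hypothetical search space: each configuration picks a depth and width.
# In weight-sharing NAS, every configuration reuses the supernet's trained
# weights, so evaluating a configuration requires no re-training.
def sample_configs(n, depths=(2, 4, 6), widths=(128, 256, 384), seed=0):
    rng = random.Random(seed)
    return [(rng.choice(depths), rng.choice(widths)) for _ in range(n)]

# Stand-in proxies: a real search would measure validation AP and
# on-device latency for each sub-network sliced out of the supernet.
def proxy_accuracy(depth, width):
    return 40.0 + 2.0 * depth + width / 100.0

def proxy_latency_ms(depth, width):
    return 0.5 * depth + width / 200.0

def pareto_front(points):
    """Keep (latency, accuracy) points not dominated by any other point."""
    front = []
    for lat, acc in points:
        dominated = any(
            l2 <= lat and a2 >= acc and (l2, a2) != (lat, acc)
            for l2, a2 in points
        )
        if not dominated:
            front.append((lat, acc))
    return sorted(front)

configs = sample_configs(1000)
# Deduplicate measured points before computing the front.
points = sorted({(proxy_latency_ms(d, w), proxy_accuracy(d, w))
                 for d, w in configs})
front = pareto_front(points)
```

Because every configuration shares the supernet's weights, the per-configuration cost is a forward evaluation rather than a training run, which is what makes sweeping thousands of candidates tractable; the resulting `front` is the set of configurations worth deploying at each latency budget.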