

YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection

December 29, 2025
Authors: Xu Lin, Jinlong Peng, Zhenye Gan, Jiawen Zhu, Jun Liu
cs.AI

Abstract

Existing Real-Time Object Detection (RTOD) methods commonly adopt YOLO-like architectures for their favorable trade-off between accuracy and speed. However, these models rely on static dense computation that applies uniform processing to all inputs, misallocating representational capacity and computational resources: they over-allocate to trivial scenes while under-serving complex ones. This mismatch results in both computational redundancy and suboptimal detection performance. To overcome this limitation, we propose YOLO-Master, a novel YOLO-like framework that introduces instance-conditional adaptive computation for RTOD. This is achieved through an Efficient Sparse Mixture-of-Experts (ES-MoE) block that dynamically allocates computational resources to each input according to its scene complexity. At its core, a lightweight dynamic routing network guides expert specialization during training through a diversity-enhancing objective, encouraging complementary expertise among experts. Additionally, the routing network adaptively learns to activate only the most relevant experts, improving detection performance while minimizing computational overhead during inference. Comprehensive experiments on five large-scale benchmarks demonstrate the superiority of YOLO-Master. On MS COCO, our model achieves 42.4% AP at 1.62 ms latency, outperforming YOLOv13-N by 0.8% mAP with 17.8% faster inference. Notably, the gains are most pronounced on challenging dense scenes, while the model preserves efficiency on typical inputs and maintains real-time inference speed. Code will be made available.
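To make the abstract's central mechanism concrete, the sketch below illustrates what an instance-conditional sparse MoE block with top-k routing could look like in PyTorch. Everything here is an illustrative assumption rather than the authors' ES-MoE implementation: the class name SparseMoEBlock, the convolutional experts, the pooled-descriptor router, and the auxiliary balancing term (a common MoE load-balancing loss standing in for the paper's diversity-enhancing objective) are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoEBlock(nn.Module):
    """Minimal sketch of an instance-conditional sparse MoE block.

    A lightweight router scores experts from a pooled feature
    descriptor and activates only the top-k experts per input,
    so easy scenes pay for fewer experts than complex ones.
    """

    def __init__(self, channels: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small conv branch; in practice any
        # shape-preserving sub-network could fill this role.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.SiLU(),
            )
            for _ in range(num_experts)
        )
        # Lightweight routing network: global pooling + linear scorer.
        self.router = nn.Linear(channels, num_experts)

    def forward(self, x: torch.Tensor):
        # x: (B, C, H, W); route per instance from a pooled descriptor.
        desc = x.mean(dim=(2, 3))                   # (B, C)
        probs = F.softmax(self.router(desc), dim=-1)  # (B, E)
        topk_p, topk_i = probs.topk(self.top_k, dim=-1)
        topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)  # renormalize

        out = torch.zeros_like(x)
        for b in range(x.size(0)):  # per-sample sparse dispatch
            for p, i in zip(topk_p[b], topk_i[b]):
                out[b] += p * self.experts[int(i)](x[b : b + 1]).squeeze(0)

        # Auxiliary balancing term: penalizes the router for routing
        # all inputs to the same experts (a simplified stand-in for
        # the diversity-enhancing objective described in the paper).
        aux_loss = probs.mean(dim=0).pow(2).sum() * probs.size(-1)
        return out, aux_loss
```

In a real-time detector, the per-sample Python loop would be replaced by batched gather/scatter dispatch so that only the selected experts execute, and the auxiliary term would be added to the detection loss with a small weight during training.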