Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

May 16, 2024
Authors: Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang
cs.AI

Abstract

This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model optimized for the faster speed demanded by many applications that require edge deployment. The Grounding DINO 1.5 Pro model advances its predecessor by scaling up the model architecture, integrating an enhanced vision backbone, and expanding the training dataset to over 20 million images with grounding annotations, thereby achieving a richer semantic understanding. The Grounding DINO 1.5 Edge model, while designed for efficiency with reduced feature scales, maintains robust detection capabilities by being trained on the same comprehensive dataset. Empirical results demonstrate the effectiveness of Grounding DINO 1.5: the Grounding DINO 1.5 Pro model attains 54.3 AP on the COCO detection benchmark and 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection. Furthermore, the Grounding DINO 1.5 Edge model, when optimized with TensorRT, achieves a speed of 75.2 FPS while attaining a zero-shot performance of 36.2 AP on the LVIS-minival benchmark, making it more suitable for edge computing scenarios. Model examples and demos with API will be released at https://github.com/IDEA-Research/Grounding-DINO-1.5-API.
