Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
May 16, 2024
Authors: Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang
cs.AI
Abstract
This paper introduces Grounding DINO 1.5, a suite of advanced open-set object
detection models developed by IDEA Research, which aims to advance the "Edge"
of open-set object detection. The suite encompasses two models: Grounding DINO
1.5 Pro, a high-performance model designed for stronger generalization
capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an
efficient model optimized for the faster speed demanded by many applications
requiring edge deployment. The Grounding DINO 1.5 Pro model advances its
predecessor by scaling up the model architecture, integrating an enhanced
vision backbone, and expanding the training dataset to over 20 million images
with grounding annotations, thereby achieving a richer semantic understanding.
The Grounding DINO 1.5 Edge model, while designed for efficiency with reduced
feature scales, maintains robust detection capabilities by being trained on the
same comprehensive dataset. Empirical results demonstrate the effectiveness of
Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining a 54.3 AP
on the COCO detection benchmark and a 55.7 AP on the LVIS-minival zero-shot
transfer benchmark, setting new records for open-set object detection.
Furthermore, the Grounding DINO 1.5 Edge model, when optimized with TensorRT,
achieves a speed of 75.2 FPS while attaining a zero-shot performance of 36.2 AP
on the LVIS-minival benchmark, making it more suitable for edge computing
scenarios. Model examples and demos with API will be released at
https://github.com/IDEA-Research/Grounding-DINO-1.5-API.
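As a rough reading of the Edge speed figure, a throughput in frames per second can be converted into an average per-image time budget. This is a sketch under an assumption the abstract does not state: that images are processed one at a time, with no batching or pipelining. The abstract also does not name the hardware behind the 75.2 FPS figure, so the resulting number is indicative only.

```latex
t_{\text{frame}} = \frac{1}{\text{throughput}}
                 = \frac{1000\ \text{ms}}{75.2\ \text{frames}}
                 \approx 13.3\ \text{ms per frame}
```

Under that assumption, the Edge model would need roughly 13.3 ms per image. This is the quantity to compare against a deployment's own frame budget.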