Follow Anything: Open-set detection, tracking, and following in real-time
August 10, 2023
Authors: Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallabhula, Makram Chahine, Daniel M. Vogt, Robert J. Wood, Antonio Torralba, Daniela Rus
cs.AI
Abstract
Tracking and following objects of interest is critical to several robotics
use cases, ranging from industrial automation to logistics and warehousing, to
healthcare and security. In this paper, we present a robotic system to detect,
track, and follow any object in real time. Our approach, dubbed "follow
anything" (FAn), is an open-vocabulary and multimodal model: it is not
restricted to concepts seen at training time and can be applied to novel
classes at inference time using text, images, or click queries. Leveraging rich
visual descriptors from large-scale pre-trained models (foundation models), FAn
can detect and segment objects by matching multimodal queries (text, images,
clicks) against an input image sequence. These detected and segmented objects
are tracked across image frames, all while accounting for occlusion and object
re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial
vehicle) and report its ability to seamlessly follow the objects of interest in
a real-time control loop. FAn can be deployed on a laptop with a lightweight
(6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To
enable rapid adoption, deployment, and extensibility, we open-source all our
code on our project webpage at https://github.com/alaamaalouf/FollowAnything .
We also encourage the reader to watch our 5-minute explainer video at
https://www.youtube.com/watch?v=6Mgt3EPytrw .
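
As a rough illustration of the matching idea described in the abstract, the
sketch below compares a query descriptor against per-pixel descriptors with
cosine similarity, thresholds the result into a mask, and turns the mask
centroid into an offset that a follow controller could try to drive to zero.
This is not FAn's implementation: the function names, the 0.8 threshold, and
the toy 2-D descriptors are all assumptions made for the example. A real system
would take its descriptors from a pre-trained foundation model.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def match_query(features, query, threshold=0.8):
    """Binary mask: 1 where a pixel's descriptor is similar enough to the query.

    `features` is a hypothetical H x W grid of descriptor vectors; `threshold`
    is an assumed value, not one taken from the paper.
    """
    return [[1 if cosine(d, query) >= threshold else 0 for d in row]
            for row in features]

def mask_centroid(mask):
    """Centroid (row, col) of the mask pixels, or None if the mask is empty."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pts:
        return None
    n = len(pts)
    return (sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n)

def follow_offset(mask):
    """Pixel offset of the object centroid from the image center.

    Returns None when nothing matches (e.g. the object is occluded), so the
    caller can hold position or fall back to re-detection.
    """
    centroid = mask_centroid(mask)
    if centroid is None:
        return None
    center_r = (len(mask) - 1) / 2
    center_c = (len(mask[0]) - 1) / 2
    return (centroid[0] - center_r, centroid[1] - center_c)

# Toy 2 x 3 "image" with 2-D descriptors; the query resembles the right column.
features = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.1], [0.1, 1.0]],
]
query = [0.0, 1.0]

mask = match_query(features, query)
print(mask)                 # [[0, 0, 1], [0, 0, 1]]
print(follow_offset(mask))  # (0.0, 1.0): object is one pixel right of center
```

In this toy setup the offset would be fed to a controller on every frame. A
`None` result stands in for the occlusion case, where a tracker would wait for
the object to re-emerge.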