SAM 2: Segment Anything in Images and Videos
August 1, 2024
Authors: Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer
cs.AI
Abstract
We present Segment Anything Model 2 (SAM 2), a foundation model towards
solving promptable visual segmentation in images and videos. We build a data
engine, which improves model and data via user interaction, to collect the
largest video segmentation dataset to date. Our model is a simple transformer
architecture with streaming memory for real-time video processing. SAM 2
trained on our data provides strong performance across a wide range of tasks.
In video segmentation, we observe better accuracy, using 3x fewer interactions
than prior approaches. In image segmentation, our model is more accurate and 6x
faster than the Segment Anything Model (SAM). We believe that our data, model,
and insights will serve as a significant milestone for video segmentation and
related perception tasks. We are releasing a version of our model, the dataset
and an interactive demo.
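The abstract's "streaming memory for real-time video processing" can be pictured as a fixed-size bank of past-frame features that each new frame attends to before being written back. The toy sketch below is a minimal illustration of that idea only, not the actual SAM 2 architecture; the class name, capacity, and the simple dot-product attention are all assumptions made for illustration.

```python
# Illustrative sketch (NOT the actual SAM 2 code): a rolling memory bank
# lets a model process video online, one frame at a time, in constant memory.
from collections import deque
import numpy as np

class StreamingMemoryBank:
    def __init__(self, capacity: int, dim: int):
        # deque with maxlen evicts the oldest memory automatically
        self.memories = deque(maxlen=capacity)
        self.dim = dim

    def condition(self, frame_feat: np.ndarray) -> np.ndarray:
        """Fuse the current frame's feature with stored past-frame features
        via a simple softmax-weighted (dot-product attention) read-out."""
        if not self.memories:
            return frame_feat
        mem = np.stack(self.memories)                  # (M, dim)
        scores = mem @ frame_feat / np.sqrt(self.dim)  # similarity to memory
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return frame_feat + weights @ mem              # residual read-out

    def write(self, frame_feat: np.ndarray) -> None:
        self.memories.append(frame_feat)

# Stream ten synthetic "frames" through the bank.
bank = StreamingMemoryBank(capacity=4, dim=8)
rng = np.random.default_rng(0)
for _ in range(10):
    feat = rng.standard_normal(8)
    fused = bank.condition(feat)  # attend to past frames
    bank.write(fused)             # store the fused feature
```

After ten frames the bank still holds only four entries, which is the property that makes streaming (rather than whole-video) processing possible.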