

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

May 14, 2024
作者: Yingjie Zhai, Wenshuo Li, Yehui Tang, Xinghao Chen, Yunhe Wang
cs.AI

Abstract

Current architectures for video understanding mainly build upon 3D convolutional blocks or 2D convolutions with additional operations for temporal modeling. However, these methods all regard the temporal axis as a separate dimension of the video sequence, which requires large computation and memory budgets and thus limits their usage on mobile devices. In this paper, we propose to squeeze the time axis of a video sequence into the channel dimension and present a lightweight video recognition network, termed SqueezeTime, for mobile video understanding. To enhance the temporal modeling capability of the proposed network, we design a Channel-Time Learning (CTL) Block to capture the temporal dynamics of the sequence. This module has two complementary branches: one branch learns temporal importance, while the other, with temporal position restoring capability, enhances inter-temporal object modeling. The proposed SqueezeTime is lightweight and fast, with high accuracy, for mobile video understanding. Extensive experiments on various video recognition and action detection benchmarks, i.e., Kinetics400, Kinetics600, HMDB51, AVA2.1, and THUMOS14, demonstrate the superiority of our model. For example, our SqueezeTime achieves +1.2% accuracy and +80% higher GPU throughput on Kinetics400 compared with prior methods. Code is publicly available at https://github.com/xinghaochen/SqueezeTime and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SqueezeTime.
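The core idea, folding the temporal axis into the channel dimension so that plain 2D convolutions can mix information across frames, can be sketched as follows. This is a minimal illustrative sketch assuming a standard (B, C, T, H, W) clip layout; the function name and layer sizes are hypothetical assumptions for illustration, not the actual SqueezeTime architecture or its CTL block.

```python
import torch

def squeeze_time_into_channel(x: torch.Tensor) -> torch.Tensor:
    """Fold the temporal axis into channels: (B, C, T, H, W) -> (B, C*T, H, W).

    Hypothetical helper illustrating the time-to-channel squeeze; after this
    reshape, an ordinary 2D convolution mixes information across frames
    because all time steps live in the channel dimension.
    """
    b, c, t, h, w = x.shape
    return x.reshape(b, c * t, h, w)

# Toy clip: batch of 2, 3 RGB channels, 16 frames, 56x56 spatial resolution.
clip = torch.randn(2, 3, 16, 56, 56)
x2d = squeeze_time_into_channel(clip)          # shape: (2, 48, 56, 56)

# A standard 2D conv now operates over the folded channel-time dimension,
# avoiding the cost of 3D convolution kernels.
conv = torch.nn.Conv2d(in_channels=48, out_channels=64, kernel_size=3, padding=1)
out = conv(x2d)                                # shape: (2, 64, 56, 56)
```

Since the reshape discards explicit temporal ordering, a design along these lines needs extra machinery (such as the paper's temporal position restoring branch) to recover cross-frame structure.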

