

FastVMT: Eliminating Redundancy in Video Motion Transfer

February 5, 2026
Authors: Yue Ma, Zhikai Wang, Tianhao Ren, Mingzhe Zheng, Hongyu Liu, Jiayi Guo, Mark Fong, Yuxuan Xue, Zixiang Zhao, Konrad Schindler, Qifeng Chen, Linfeng Zhang
cs.AI

Abstract

Video motion transfer aims to synthesize videos by generating visual content according to a text prompt while transferring the motion pattern observed in a reference video. Recent methods predominantly use the Diffusion Transformer (DiT) architecture. To achieve satisfactory runtime, several methods attempt to accelerate the computations in the DiT, but fail to address structural sources of inefficiency. In this work, we identify and remove two types of computational redundancy in earlier work: motion redundancy arises because the generic DiT architecture does not reflect the fact that frame-to-frame motion is small and smooth; gradient redundancy occurs if one ignores that gradients change slowly along the diffusion trajectory. To mitigate motion redundancy, we restrict the corresponding attention layers to a local neighborhood so that interaction weights are not computed unnecessarily for distant image regions. To exploit gradient redundancy, we design an optimization scheme that reuses gradients from previous diffusion steps and skips unwarranted gradient computations. On average, FastVMT achieves a 3.43x speedup without degrading the visual fidelity or the temporal consistency of the generated videos.
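The abstract outlines the two mechanisms without implementation detail. The following minimal PyTorch sketch illustrates the general ideas under stated assumptions; the names `local_attention_mask`, `window`, and `reuse_interval` are hypothetical and are not FastVMT's actual API. The first part builds a boolean mask that restricts attention to a local token neighborhood (the motion-redundancy idea); the second is a toy guidance loop that recomputes the gradient only every few diffusion steps and reuses the cached one in between (the gradient-redundancy idea).

```python
import torch
import torch.nn.functional as F

def local_attention_mask(num_tokens: int, window: int) -> torch.Tensor:
    """Boolean mask where True means the token pair may attend.

    Restricting attention to a +/- `window` neighborhood avoids
    computing interaction weights for distant regions.
    """
    idx = torch.arange(num_tokens)
    return (idx[:, None] - idx[None, :]).abs() <= window

# Local attention over 16 tokens with neighborhood radius 2.
q = torch.randn(1, 4, 16, 32)   # (batch, heads, tokens, head_dim)
k = torch.randn(1, 4, 16, 32)
v = torch.randn(1, 4, 16, 32)
out = F.scaled_dot_product_attention(
    q, k, v, attn_mask=local_attention_mask(16, window=2)
)

def guided_denoise(steps: int = 20, reuse_interval: int = 4) -> torch.Tensor:
    """Toy guidance loop: refresh the gradient only every
    `reuse_interval` steps and reuse the cached one in between."""
    x = torch.randn(8, requires_grad=True)
    cached_grad = None
    for t in range(steps):
        if cached_grad is None or t % reuse_interval == 0:
            loss = (x ** 2).sum()              # stand-in guidance objective
            (cached_grad,) = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x -= 0.01 * cached_grad            # skipped steps reuse the gradient
    return x
```

In a real DiT the mask would presumably span spatio-temporal token neighborhoods across frames, and the gradient-refresh schedule would likely be chosen adaptively rather than at a fixed interval; this sketch only shows the shape of the two optimizations.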