

FastVMT: Eliminating Redundancy in Video Motion Transfer

February 5, 2026
Authors: Yue Ma, Zhikai Wang, Tianhao Ren, Mingzhe Zheng, Hongyu Liu, Jiayi Guo, Mark Fong, Yuxuan Xue, Zixiang Zhao, Konrad Schindler, Qifeng Chen, Linfeng Zhang
cs.AI

Abstract

Video motion transfer aims to synthesize videos by generating visual content according to a text prompt while transferring the motion pattern observed in a reference video. Recent methods predominantly use the Diffusion Transformer (DiT) architecture. To achieve satisfactory runtime, several methods attempt to accelerate the computations in the DiT, but fail to address structural sources of inefficiency. In this work, we identify and remove two types of computational redundancy in earlier work: motion redundancy arises because the generic DiT architecture does not reflect the fact that frame-to-frame motion is small and smooth; gradient redundancy occurs if one ignores that gradients change slowly along the diffusion trajectory. To mitigate motion redundancy, we restrict the corresponding attention layers to a local neighborhood so that interaction weights are not computed for unnecessarily distant image regions. To exploit gradient redundancy, we design an optimization scheme that reuses gradients from previous diffusion steps and skips unwarranted gradient computations. On average, FastVMT achieves a 3.43x speedup without degrading the visual fidelity or the temporal consistency of the generated videos.
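
To make the first idea concrete, below is a minimal PyTorch sketch of local-neighborhood attention masking. It is an illustration under our own assumptions, not the paper's implementation: the function names (`spatial_neighborhood_mask`, `local_attention`), the `radius` parameter, and the dense-mask formulation are all hypothetical.

```python
import torch

def spatial_neighborhood_mask(frames: int, height: int, width: int,
                              radius: int) -> torch.Tensor:
    """Boolean mask over flattened video tokens (N = frames*height*width).
    A query may attend to a key only if their patch positions lie within
    `radius` of each other; weights for distant regions are never used."""
    f, h, w = torch.arange(frames), torch.arange(height), torch.arange(width)
    grid = torch.cartesian_prod(f, h, w)            # (N, 3) = (frame, row, col)
    dh = (grid[:, 1:2] - grid[:, 1:2].T).abs()      # row distance, (N, N)
    dw = (grid[:, 2:3] - grid[:, 2:3].T).abs()      # column distance, (N, N)
    return (dh <= radius) & (dw <= radius)          # True = keep this pair

def local_attention(q, k, v, mask):
    """Scaled dot-product attention with distant pairs masked out."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage: 4 frames of 8x8 patches, attention limited to a 5x5 window.
mask = spatial_neighborhood_mask(frames=4, height=8, width=8, radius=2)
q = k = v = torch.randn(1, 4 * 8 * 8, 64)           # (batch, tokens, dim)
out = local_attention(q, k, v, mask)
```

Note that a dense boolean mask only expresses the sparsity pattern; to actually realize the claimed speedup, the masked pairs would have to be skipped by a windowed or block-sparse attention kernel rather than computed and discarded.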
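The gradient-reuse idea can be sketched in the same spirit. The snippet below caches the guidance gradient and recomputes it only every few steps; `motion_loss`, `refresh_every`, and `guidance_scale` are illustrative placeholders, and the update rule is deliberately simplified relative to a real diffusion sampler.

```python
import torch

def motion_loss(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a motion-transfer objective
    (penalizes frame-to-frame change in the latent)."""
    return (x[:, 1:] - x[:, :-1]).pow(2).mean()

def guided_sampling(x: torch.Tensor, steps: int = 50,
                    refresh_every: int = 4,
                    guidance_scale: float = 0.1) -> torch.Tensor:
    """Recompute the guidance gradient only every `refresh_every` steps,
    reusing the cached one in between and skipping the backward pass."""
    cached_grad = None
    for i in range(steps):
        if cached_grad is None or i % refresh_every == 0:
            x_req = x.detach().requires_grad_(True)
            cached_grad = torch.autograd.grad(motion_loss(x_req), x_req)[0]
        # On all other steps the stale gradient is reused as-is.
        x = x - guidance_scale * cached_grad
    return x

# Toy usage on a (batch, frames, height, width) latent.
x = torch.randn(1, 8, 16, 16)
out = guided_sampling(x)
```

If the gradient really does drift slowly along the trajectory, the cost of the guidance term drops by roughly the refresh factor: at `refresh_every=4`, three out of every four backward passes are skipped.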