MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
September 30, 2025
Authors: Chenhui Zhu, Yilu Wu, Shuai Wang, Gangshan Wu, Limin Wang
cs.AI
Abstract
Image-to-video generation has made remarkable progress with the advancements
in diffusion models, yet generating videos with realistic motion remains highly
challenging. This difficulty arises from the complexity of accurately modeling
motion, which involves capturing physical constraints, object interactions, and
domain-specific dynamics that are not easily generalized across diverse
scenarios. To address this, we propose MotionRAG, a retrieval-augmented
framework that enhances motion realism by adapting motion priors from relevant
reference videos through Context-Aware Motion Adaptation (CAMA). The key
technical innovations include: (i) a retrieval-based pipeline extracting
high-level motion features using a video encoder and specialized resamplers to
distill semantic motion representations; (ii) an in-context learning approach
for motion adaptation implemented through a causal transformer architecture;
(iii) an attention-based motion injection adapter that seamlessly integrates
transferred motion features into pretrained video diffusion models. Extensive
experiments demonstrate that our method achieves significant improvements
across multiple domains and various base models, all with negligible
computational overhead during inference. Furthermore, our modular design
enables zero-shot generalization to new domains by simply updating the
retrieval database without retraining any components. This research enhances
the core capability of video generation systems by enabling the effective
retrieval and transfer of motion priors, facilitating the synthesis of
realistic motion dynamics.
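
To make the three-stage design above concrete, the following is a minimal PyTorch sketch of how such a pipeline could be wired together. It is an illustration inferred from the abstract only, not the authors' released implementation: every class name (MotionResampler, CausalMotionAdapter, MotionInjectionAdapter, retrieve_topk), all shapes and hyperparameters, and design details such as the learned-query resampler, the causal self-attention mask, the zero-initialized gate, and cosine-similarity retrieval are assumptions.

```python
# Hypothetical sketch of a MotionRAG-style pipeline. All names, shapes,
# and design choices are assumptions based on the abstract, not the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionResampler(nn.Module):
    """Distill per-frame video-encoder features into a fixed number of
    semantic motion tokens via learned queries and cross-attention."""

    def __init__(self, dim=768, num_tokens=16, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_feats):  # frame_feats: (B, T, dim)
        q = self.queries.unsqueeze(0).expand(frame_feats.size(0), -1, -1)
        tokens, _ = self.attn(q, frame_feats, frame_feats)
        return self.norm(tokens)  # (B, num_tokens, dim)


class CausalMotionAdapter(nn.Module):
    """In-context motion adaptation: retrieved reference motion tokens
    precede the target's tokens in one sequence, and a causal mask lets
    the target attend to the references but not vice versa."""

    def __init__(self, dim=768, depth=4, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            dim, num_heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, context):  # context: (B, L, dim)
        L = context.size(1)
        causal = torch.triu(
            torch.ones(L, L, dtype=torch.bool, device=context.device), 1)
        return self.blocks(context, mask=causal)


class MotionInjectionAdapter(nn.Module):
    """Attention-based injection: an extra cross-attention branch feeds
    adapted motion tokens into the frozen diffusion backbone's hidden
    states through a zero-initialized gate, so training starts from the
    unmodified base model."""

    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0: identity init

    def forward(self, hidden, motion_tokens):
        out, _ = self.attn(hidden, motion_tokens, motion_tokens)
        return hidden + torch.tanh(self.gate) * out


def retrieve_topk(query_emb, database_embs, k=2):
    """Cosine-similarity retrieval over a motion-feature database; swapping
    the database is what would enable zero-shot transfer to a new domain."""
    sims = F.normalize(query_emb, dim=-1) @ F.normalize(database_embs, dim=-1).T
    return sims.topk(k, dim=-1).indices  # (B, k)


if __name__ == "__main__":
    B, T, dim, n_tok = 2, 16, 768, 16
    resampler = MotionResampler(dim, n_tok)
    adapter = CausalMotionAdapter(dim)
    injector = MotionInjectionAdapter(dim)

    ref_motion = resampler(torch.randn(B, T, dim))  # retrieved reference video
    tgt_motion = resampler(torch.randn(B, T, dim))  # target (input-image) side
    context = torch.cat([ref_motion, tgt_motion], dim=1)
    adapted = adapter(context)[:, -n_tok:]          # adapted target tokens

    hidden = torch.randn(B, 64, dim)                # diffusion hidden states
    print(injector(hidden, adapted).shape)          # torch.Size([2, 64, 768])
```

In a design like this, only the resampler, adapter, and injection layers would be trained while the video encoder and diffusion backbone stay frozen, which is consistent with the abstract's claims of negligible inference overhead and of domain transfer by updating the retrieval database alone.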