
MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

December 11, 2025
作者: Yixin Wan, Lei Ke, Wenhao Yu, Kai-Wei Chang, Dong Yu
cs.AI

Abstract

We introduce MotionEdit, a novel dataset for motion-centric image editing: the task of modifying subject actions and interactions while preserving identity, structure, and physical plausibility. Unlike existing image editing datasets that focus on static appearance changes or contain only sparse, low-quality motion edits, MotionEdit provides high-fidelity image pairs depicting realistic motion transformations, extracted and verified from continuous videos. This new task is not only scientifically challenging but also practically significant, powering downstream applications such as frame-controlled video synthesis and animation. To evaluate model performance on this new task, we introduce MotionEdit-Bench, a benchmark that challenges models on motion-centric edits and measures performance with generative, discriminative, and preference-based metrics. Benchmark results reveal that motion editing remains highly challenging for existing state-of-the-art diffusion-based editing models. To address this gap, we propose MotionNFT (Motion-guided Negative-aware Fine-Tuning), a post-training framework that computes motion-alignment rewards based on how well the motion flow between the input and model-edited images matches the ground-truth motion, guiding models toward accurate motion transformations. Extensive experiments on FLUX.1 Kontext and Qwen-Image-Edit show that MotionNFT consistently improves the editing quality and motion fidelity of both base models on the motion editing task without sacrificing general editing ability, demonstrating its effectiveness.
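The motion-alignment reward described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the motion flows (input → edited image, and input → ground-truth target) have already been estimated as dense per-pixel displacement fields, and it scores their agreement with mean per-pixel cosine similarity. The function name and the choice of similarity measure are assumptions for illustration.

```python
import numpy as np

def motion_alignment_reward(flow_pred: np.ndarray,
                            flow_gt: np.ndarray,
                            eps: float = 1e-8) -> float:
    """Score how well a predicted motion flow matches the ground-truth flow.

    Both flows have shape (H, W, 2): per-pixel (dx, dy) displacements,
    e.g. from an off-the-shelf optical-flow estimator. Returns a scalar
    in [-1, 1]: +1 means the edited image moved every pixel in the same
    direction as the ground truth, -1 means opposite motion.
    """
    assert flow_pred.shape == flow_gt.shape and flow_pred.shape[-1] == 2
    # Per-pixel cosine similarity between the two displacement vectors.
    dot = (flow_pred * flow_gt).sum(axis=-1)
    norm = np.linalg.norm(flow_pred, axis=-1) * np.linalg.norm(flow_gt, axis=-1)
    cos = dot / (norm + eps)  # eps avoids division by zero in static regions
    return float(cos.mean())
```

In a post-training loop, a reward like this could be combined with a negative-aware fine-tuning objective, rewarding edits whose induced motion matches the reference and penalizing those that move the subject incorrectly.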