Aurora: Unified Video Editing with a Tool-Using Agent

May 18, 2026
Authors: Yongsheng Yu, Ziyun Zeng, Zhiyuan Xiao, Zhenghong Zhou, Hang Hua, Wei Xiong, Jiebo Luo
cs.AI

Abstract

Recent video editing models have converged on a unified conditioning design: a single diffusion transformer jointly consumes text, source video, and reference images, and one set of weights covers replacement, removal, style transfer, and reference-driven insertion. The design is flexible, but it assumes that the user already provides model-ready text, reference images, and spatial grounding for local edits, which real requests often omit. We present Aurora, an agentic video editing framework that pairs a tool-augmented vision-language model (VLM) agent with a unified video diffusion transformer. The VLM agent maps a raw user request to a structured edit plan aligned with the transformer's conditioning channels, thereby resolving textual and visual underspecification before generation. We train the VLM agent with supervised data for complete edit planning and reference-image selection, together with preference pairs for robust tool use and instruction refinement. We introduce AgentEdit-Bench to evaluate agent-enhanced video editing under textual and visual underspecification. Experiments on AgentEdit-Bench and two existing video editing benchmarks show that Aurora improves over instruction-only baselines and that the VLM agent transfers to compatible frozen video editing models. Project page: https://yeates.github.io/Aurora-Page