
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

February 5, 2024
Authors: Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue
cs.AI

Abstract

We introduce InteractiveVideo, a user-centric framework for video generation. Unlike traditional generative approaches that operate on static user-provided images or text, our framework is designed for dynamic interaction, allowing users to instruct the generative model through various intuitive mechanisms throughout the generation process, e.g., text and image prompts, painting, and drag-and-drop. We propose a Synergistic Multimodal Instruction mechanism, designed to seamlessly integrate users' multimodal instructions into generative models, thus facilitating a cooperative and responsive interaction between user inputs and the generative process. This approach enables iterative and fine-grained refinement of the generation result through precise and effective user instructions. With InteractiveVideo, users have the flexibility to meticulously tailor key aspects of a video: they can paint the reference image, edit semantics, and adjust video motions until their requirements are fully met. Code, models, and a demo are available at https://github.com/invictus717/InteractiveVideo
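To make the iterative, multimodal refinement loop described above concrete, here is a minimal Python sketch of how such a session might be structured. All names here (MultimodalInstruction, InteractiveSession, instruct) are illustrative assumptions for exposition, not the API of the linked repository; the generator is a stub standing in for the actual conditional video model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class MultimodalInstruction:
    # Hypothetical container for one round of user guidance; every field
    # is optional so text, painted edits, and drag hints can be mixed freely.
    text_prompt: Optional[str] = None
    # Painted edits on the reference image: (x, y, rgb) pixels.
    paint_strokes: List[Tuple[int, int, Tuple[int, int, int]]] = field(default_factory=list)
    # Drag-and-drop motion hints: (start_xy, end_xy) point pairs.
    drag_points: List[Tuple[Tuple[int, int], Tuple[int, int]]] = field(default_factory=list)

class InteractiveSession:
    """Accumulates instructions and re-renders after each round, mimicking
    the cooperative refinement loop the abstract describes."""
    def __init__(self, reference_image: str, generator: Callable):
        self.reference_image = reference_image
        self.generator = generator  # any callable: (image, instructions) -> video
        self.history: List[MultimodalInstruction] = []
        self.video = None

    def instruct(self, instruction: MultimodalInstruction):
        # All accumulated instructions condition the next generation pass,
        # so earlier edits persist while new ones refine the result.
        self.history.append(instruction)
        self.video = self.generator(self.reference_image, self.history)
        return self.video

# Usage: refine a video over two rounds of mixed-modality guidance.
def dummy_generator(image, instructions):
    return f"video from {image}, conditioned on {len(instructions)} instruction round(s)"

session = InteractiveSession(reference_image="cat.png", generator=dummy_generator)
session.instruct(MultimodalInstruction(text_prompt="a cat stretching on a sofa"))
session.instruct(MultimodalInstruction(
    drag_points=[((120, 80), (180, 80))]))  # nudge the cat's head rightward
print(session.video)
```

The key design point this sketch captures is statefulness: rather than a one-shot prompt-to-video call, each round of guidance is appended to a history that conditions the next pass, which is what allows the fine-grained, iterative control the paper claims.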