InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
February 5, 2024
Authors: Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue
cs.AI
Abstract
We introduce InteractiveVideo, a user-centric framework for video generation. Unlike traditional generative approaches that operate on user-provided images or text, our framework is designed for dynamic interaction, allowing users to instruct the generative model through various intuitive mechanisms throughout the generation process, e.g., text and image prompts, painting, and drag-and-drop. We propose a Synergistic Multimodal Instruction mechanism designed to seamlessly integrate users' multimodal instructions into generative models, facilitating cooperative and responsive interaction between user inputs and the generative process. This approach enables iterative, fine-grained refinement of the generated result through precise and effective user instructions. With InteractiveVideo, users have the flexibility to meticulously tailor key aspects of a video: they can paint the reference image, edit semantics, and adjust video motions until their requirements are fully met. Code, models, and a demo are available at https://github.com/invictus717/InteractiveVideo
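The abstract does not expose an API, but the interaction loop it describes (text and image prompts, painting, drag-and-drop edits, and iterative regeneration) can be sketched in Python. This is a minimal illustrative sketch, not the project's actual interface: every class name, field, and function below (`MultimodalInstructions`, `generate_video`, `interactive_session`) is hypothetical.

```python
# Hypothetical sketch of the user-in-the-loop generation cycle described in
# the abstract. None of these names come from the InteractiveVideo codebase;
# they only illustrate how synergistic multimodal instructions might be
# accumulated across refinement rounds and re-applied to the generator.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class MultimodalInstructions:
    text_prompt: Optional[str] = None        # text instruction
    reference_image: Optional[bytes] = None  # user-provided or painted image
    paint_strokes: List[Point] = field(default_factory=list)          # painted regions
    drag_paths: List[Tuple[Point, Point]] = field(default_factory=list)  # motion edits

    def merge(self, other: "MultimodalInstructions") -> "MultimodalInstructions":
        """Fold one round of user edits into the accumulated instructions."""
        return MultimodalInstructions(
            text_prompt=other.text_prompt or self.text_prompt,
            reference_image=other.reference_image or self.reference_image,
            paint_strokes=self.paint_strokes + other.paint_strokes,
            drag_paths=self.drag_paths + other.drag_paths,
        )


def generate_video(instructions: MultimodalInstructions) -> str:
    """Placeholder for the generative model; returns a fake artifact label."""
    return f"video(text={instructions.text_prompt!r}, drags={len(instructions.drag_paths)})"


def interactive_session(rounds: List[MultimodalInstructions]) -> str:
    """Iterative refinement: each round of user input conditions the next generation."""
    state = MultimodalInstructions()
    result = generate_video(state)
    for user_edit in rounds:
        state = state.merge(user_edit)
        result = generate_video(state)  # regenerate with the refined instructions
    return result


if __name__ == "__main__":
    edits = [
        MultimodalInstructions(text_prompt="a corgi running on the beach"),
        MultimodalInstructions(drag_paths=[((64, 64), (96, 64))]),  # drag subject to the right
    ]
    print(interactive_session(edits))
```

The key design point the sketch captures is that user edits are cumulative rather than one-shot: each interaction refines the conditioning state, and the model regenerates from the merged instructions, which is what makes the iterative, fine-grained refinement described in the abstract possible.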