
TokenFlow: Consistent Diffusion Features for Consistent Video Editing

July 19, 2023
作者: Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel
cs.AI

Abstract

The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we present a framework that harnesses the power of a text-to-image diffusion model for the task of text-driven video editing. Specifically, given a source video and a target text-prompt, our method generates a high-quality video that adheres to the target text, while preserving the spatial layout and motion of the input video. Our method is based on a key observation that consistency in the edited video can be obtained by enforcing consistency in the diffusion feature space. We achieve this by explicitly propagating diffusion features based on inter-frame correspondences, readily available in the model. Thus, our framework does not require any training or fine-tuning, and can work in conjunction with any off-the-shelf text-to-image editing method. We demonstrate state-of-the-art editing results on a variety of real-world videos. Webpage: https://diffusion-tokenflow.github.io/
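The key mechanism the abstract describes — replacing each frame's diffusion features with features propagated from edited keyframes via inter-frame correspondences already present in the model — can be illustrated with a toy sketch. This is not the authors' code: the function name, the use of cosine similarity over flattened token features, and the single-keyframe setup are illustrative assumptions standing in for the paper's actual correspondence and propagation machinery.

```python
import numpy as np

def propagate_features(frame_feats, key_feats, edited_key_feats):
    """Toy sketch of TokenFlow-style feature propagation (illustrative only).

    frame_feats:      (n_tokens, d) diffusion features of one source-video frame
    key_feats:        (n_key, d)    source features of a keyframe
    edited_key_feats: (n_key, d)    the same keyframe's features after editing
    Returns (n_tokens, d): each frame token receives the *edited* feature of its
    nearest-neighbor token in the source keyframe, so the edit is propagated
    consistently instead of being recomputed per frame.
    """
    # Normalize so the dot product below is cosine similarity.
    a = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    b = key_feats / np.linalg.norm(key_feats, axis=1, keepdims=True)
    # Inter-frame correspondence: nearest keyframe token per frame token.
    nn = (a @ b.T).argmax(axis=1)
    # Propagate the edited keyframe features along those correspondences.
    return edited_key_feats[nn]
```

Because the correspondences are computed on the unedited source features and only the edited features are copied across, every frame samples from the same edited keyframe tokens — the feature-space consistency the abstract argues yields a temporally consistent edited video, all without training or fine-tuning.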