FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing

Abstract

We propose FlowAnchor, a training-free framework for stable and efficient inversion-free, flow-based video editing. Inversion-free editing methods have recently shown impressive efficiency and structure preservation on images by directly steering the sampling trajectory with an editing signal. However, extending this paradigm to videos remains challenging: such methods often fail in multi-object scenes or as the frame count increases. We identify the root cause as instability of the editing signal in high-dimensional video latent spaces, which arises from imprecise spatial localization and length-induced magnitude attenuation. To overcome this challenge, FlowAnchor explicitly anchors both where to edit and how strongly to edit. It introduces Spatial-aware Attention Refinement, which enforces consistent alignment between textual guidance and spatial regions, and Adaptive Magnitude Modulation, which adaptively preserves sufficient editing strength. Together, these mechanisms stabilize the editing signal and guide the flow-based evolution toward the desired target distribution. Extensive experiments demonstrate that FlowAnchor achieves more faithful, temporally coherent, and computationally efficient video editing in challenging multi-object and fast-motion scenarios. The project page is available at https://cuc-mipg.github.io/FlowAnchor.github.io/.
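The idea of anchoring "where" and "how strongly" to edit can be pictured with a toy NumPy sketch. This is not FlowAnchor's actual formulation, which the abstract does not spell out. It assumes the editing signal is the difference between velocity predictions under the target and source prompts, that a spatial mask is already available, and that attenuation is countered by enforcing a floor on the signal's RMS inside the mask. The function name `anchor_edit_signal` and the parameter `min_rms` are hypothetical.

```python
import numpy as np


def anchor_edit_signal(v_src, v_tar, mask, min_rms=1.0, eps=1e-8):
    """Toy illustration: localize and rescale an editing signal.

    v_src, v_tar: velocity predictions under the source / target prompts,
                  shape (frames, height, width, channels).
    mask:         spatial mask in [0, 1], shape (frames, height, width).
    min_rms:      hypothetical floor on the signal's RMS inside the mask.
    """
    # Raw editing signal (an assumption about how it is formed).
    delta = v_tar - v_src

    # "Where": suppress the signal outside the edit region.
    masked = delta * mask[..., None]

    # RMS of the signal over the masked elements (exact for a binary mask).
    weight = mask.sum() * delta.shape[-1]
    rms = np.sqrt((masked ** 2).sum() / (weight + eps))

    # "How strongly": boost a weak signal up to the floor, never shrink it.
    scale = max(1.0, min_rms / (rms + eps))
    return masked * scale
```

With a binary mask, a weak signal inside the masked region is scaled up to roughly `min_rms`, a signal that is already strong passes through unchanged, and everything outside the mask is zeroed.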
