On-Policy Adversarial Flow Distillation for Autoregressive Video Generation

Abstract

Autoregressive video generators are attractive for streaming, long-horizon, and interactive applications, but distilling strong black-box teachers into causal students remains difficult. The student must learn under its own rollout distribution, whereas practical teachers may expose only prompt-conditioned completed videos and may differ in architecture, capacity, temporal design, and sampling schedule. This interface makes supervised fine-tuning off-policy, score-based distillation inapplicable, and direct adversarial imitation too sparse for denoising-time credit assignment. We propose Adversarial Flow Distillation (AFD), an on-policy framework for heterogeneous black-box video distillation. AFD queries the teacher and rolls out the current student on the same prompts, trains a prompt-paired Bradley-Terry discriminator to estimate clean-sample teacher-student discrepancy, and converts the resulting on-policy advantage into forward-process flow-matching updates on the student's own noised states. Thus, AFD provides dense velocity-field supervision while requiring no teacher scores, latents, denoising trajectories, step alignment, or reverse-chain reinforcement learning. Experiments across two causal AR student families show that AFD consistently improves motion- and physics-sensitive generation while preserving general video quality, and ablations validate the importance of adaptive on-policy feedback and forward-process credit assignment. The method requires only clean teacher videos and student rollouts, providing a practical route for distilling proprietary or heterogeneous video generators into efficient autoregressive students.
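The two training signals named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact losses, so the advantage definition (batch-centred discriminator score of the student rollout), the linear rectified-flow forward process, and all function names below are hypothetical choices made for the example.

```python
import numpy as np

def bradley_terry_loss(d_teacher, d_student):
    """Prompt-paired Bradley-Terry loss: -log sigmoid(d_teacher - d_student).

    Each index pairs a teacher video and a student rollout for the same prompt.
    """
    margin = np.asarray(d_teacher, dtype=float) - np.asarray(d_student, dtype=float)
    # -log sigmoid(m) = log(1 + exp(-m)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -margin)))

def on_policy_advantage(d_student):
    """ASSUMPTION: advantage = discriminator score of each student rollout,
    centred on the batch mean. The abstract does not specify this form."""
    scores = np.asarray(d_student, dtype=float)
    return scores - scores.mean()

def noised_state_and_target(x0, noise, t):
    """ASSUMPTION: linear (rectified-flow) forward process on the student's own
    clean rollout x0: x_t = (1 - t) * x0 + t * noise, target velocity noise - x0."""
    t = np.asarray(t, dtype=float).reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = (1.0 - t) * x0 + t * noise
    return x_t, noise - x0

def advantage_weighted_fm_loss(v_pred, v_target, advantage):
    """Flow-matching error per sample, weighted by the clean-sample advantage.

    Every noise level of a rollout shares that rollout's advantage, which is
    one way to obtain dense velocity-field supervision from a sparse score.
    """
    per_sample = np.mean((v_pred - v_target) ** 2, axis=tuple(range(1, v_pred.ndim)))
    return float(np.mean(np.asarray(advantage, dtype=float) * per_sample))

# Toy usage: random arrays stand in for student rollouts and model outputs.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))           # 4 flattened student rollouts (hypothetical)
noise = rng.normal(size=(4, 8))
t = rng.uniform(size=4)
x_t, v_target = noised_state_and_target(x0, noise, t)
v_pred = v_target + 0.1 * rng.normal(size=(4, 8))  # stand-in for the student's velocity
d_teacher = rng.normal(size=4) + 1.0   # stand-in discriminator scores
d_student = rng.normal(size=4)
disc_loss = bradley_terry_loss(d_teacher, d_student)
student_loss = advantage_weighted_fm_loss(v_pred, v_target, on_policy_advantage(d_student))
```

Note that neither function touches teacher scores, latents, or denoising trajectories: the teacher enters only through discriminator scores on its clean videos, which matches the black-box interface described in the abstract.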