PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

Abstract



Physical principles are fundamental to realistic visual simulation, yet they remain a significant blind spot in transformer-based video generation. The gap is especially evident in rendering rigid-body motion, a core tenet of classical mechanics. While computer graphics and physics-based simulators can readily model such collisions with Newton's laws, modern pretrain-finetune paradigms discard the concept of object rigidity during pixel-level global denoising. Even perfectly correct mathematical constraints are treated as suboptimal solutions (i.e., conditions) during post-training optimization, which fundamentally limits the physical realism of generated videos. Motivated by these considerations, we introduce, for the first time, a physics-aware reinforcement learning paradigm for video generation models that enforces physical collision rules directly in high-dimensional spaces, ensuring that physics knowledge is strictly applied rather than treated as a condition. We then extend this paradigm into a unified framework, termed the Mimicry-Discovery Cycle (MDcycle), which allows substantial fine-tuning while fully preserving the model's ability to leverage physics-grounded feedback. To validate our approach, we construct a new benchmark, PhysRVGBench, and perform extensive qualitative and quantitative experiments to assess its effectiveness.
