The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics

Abstract

Recent generative video models have achieved remarkable visual realism and are being explored as world models, but true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground these motions in a consistent, real-world time scale. This temporal ambiguity stems from the common practice of training indiscriminately on videos with vastly different real-world speeds and forcing them into standardized frame rates. The result is what we term chronometric hallucination: generated sequences exhibit ambiguous, unstable, and uncontrollable physical motion speeds. To address this, we propose Visual Chronometer, a predictor that recovers the Physical Frames Per Second (PhyFPS) directly from the visual dynamics of an input video. Trained via controlled temporal resampling, our method estimates the true temporal scale implied by the motion itself, bypassing unreliable metadata. To systematically quantify this issue, we establish two benchmarks, PhyFPS-Bench-Real and PhyFPS-Bench-Gen. Our evaluations reveal a harsh reality: state-of-the-art video generators suffer from severe PhyFPS misalignment and temporal instability. Finally, we demonstrate that applying PhyFPS corrections significantly improves the human-perceived naturalness of AI-generated videos. Our project page is https://xiangbogaobarry.github.io/Visual_Chronometer/.
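The abstract does not spell out how controlled temporal resampling is carried out. As a rough illustration only, and assuming a source clip whose true capture rate is known, keeping every k-th frame gives a clip whose physical frame rate is the capture rate divided by k, which could serve as a training label. The function name and parameters below are hypothetical and are not taken from the paper.

```python
def resample_with_label(frames, source_fps, stride):
    """Keep every `stride`-th frame and return (clip, physical_fps).

    Hypothetical helper: assumes `source_fps` is the true capture rate
    of `frames`, which the paper's actual pipeline may establish differently.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    clip = frames[::stride]
    # Consecutive kept frames are stride / source_fps seconds apart,
    # so the clip samples the real world at source_fps / stride frames per second.
    return clip, source_fps / stride


# Stand-in for one second of footage captured at 240 fps.
frames = list(range(240))
clip, phyfps = resample_with_label(frames, source_fps=240.0, stride=8)
```

Under these assumptions, the same source footage yields clips at several known physical frame rates simply by varying `stride`, without relying on container metadata.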
