The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics

Abstract

While recent generative video models have achieved remarkable visual realism and are being explored as world models, true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground these motions in a consistent, real-world time scale. This temporal ambiguity stems from the common practice of training indiscriminately on videos with vastly different real-world speeds while forcing them all into standardized frame rates. This leads to what we term chronometric hallucination: generated sequences exhibit ambiguous, unstable, and uncontrollable physical motion speeds. To address this, we propose Visual Chronometer, a predictor that recovers the Physical Frames Per Second (PhyFPS) directly from the visual dynamics of an input video. Trained via controlled temporal resampling, our method estimates the true temporal scale implied by the motion itself, bypassing unreliable metadata. To systematically quantify this issue, we establish two benchmarks, PhyFPS-Bench-Real and PhyFPS-Bench-Gen. Our evaluations reveal a harsh reality: state-of-the-art video generators suffer from severe PhyFPS misalignment and temporal instability. Finally, we demonstrate that applying PhyFPS corrections significantly improves the human-perceived naturalness of AI-generated videos. Our project page is https://xiangbogaobarry.github.io/Visual_Chronometer/.
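The abstract states that the predictor is trained via controlled temporal resampling but gives no implementation details. The sketch below illustrates one plausible reading of that idea and is not the authors' pipeline. It assumes a source clip whose true capture rate is known, and it assumes that keeping every `stride`-th frame yields a clip whose physical frame rate is the capture rate divided by the stride. The function name `resample_clip` and all of its parameters are hypothetical.

```python
import numpy as np


def resample_clip(frames, source_fps, stride, num_frames, start=0):
    """Subsample a clip and return it with an assumed PhyFPS label.

    frames:      array of shape (T, H, W, C) captured at a known rate
    source_fps:  true capture rate of `frames` (assumed reliable)
    stride:      keep every `stride`-th frame
    num_frames:  number of frames in the output clip
    start:       index of the first frame to keep
    """
    if stride < 1 or num_frames < 1:
        raise ValueError("stride and num_frames must be positive")
    # Indices of the frames that survive the subsampling.
    idx = start + stride * np.arange(num_frames)
    if idx[-1] >= len(frames):
        raise ValueError("clip too short for this stride and length")
    # Each kept frame now spans stride / source_fps seconds of real time,
    # so the label is source_fps / stride (an assumption of this sketch).
    return frames[idx], source_fps / stride


# Dummy one-second clip at 240 fps; every pixel of a frame stores its index.
frames = np.arange(240, dtype=np.float32).reshape(240, 1, 1, 1) * np.ones(
    (1, 4, 4, 3), dtype=np.float32
)
clip, phyfps = resample_clip(frames, source_fps=240.0, stride=8, num_frames=16)
print(clip.shape, phyfps)
```

Sampling several strides from the same high-frame-rate source would give a predictor pairs of (clip, label) in which the visual content is identical and only the temporal scale changes, which is the kind of supervision the abstract appears to describe.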
