Video Analysis and Generation via a Semantic Progress Function
April 24, 2026
Authors: Gal Metzer, Sagi Polaczek, Ali Mahdavi-Amiri, Raja Giryes, Daniel Cohen-Or
cs.AI
Abstract
Transformations produced by image and video generation models often evolve in a highly non-linear manner: long stretches where the content barely changes are followed by sudden, abrupt semantic jumps. To analyze and correct this behavior, we introduce a Semantic Progress Function, a one-dimensional representation that captures how the meaning of a given sequence evolves over time. For each frame, we compute distances between semantic embeddings and fit a smooth curve that reflects the cumulative semantic shift across the sequence. Departures of this curve from a straight line reveal uneven semantic pacing. Building on this insight, we propose a semantic linearization procedure that reparameterizes (or retimes) the sequence so that semantic change unfolds at a constant rate, yielding smoother and more coherent transitions. Beyond linearization, our framework provides a model-agnostic foundation for identifying temporal irregularities, comparing semantic pacing across different generators, and steering both generated and real-world video sequences toward arbitrary target pacing.
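The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-frame embeddings are already available from some encoder, uses cosine distance between consecutive frames as the semantic distance, and skips the smooth curve fitting step mentioned above. The function names `semantic_progress`, `pacing_deviation`, and `linearize` are hypothetical and chosen only for illustration.

```python
import numpy as np


def semantic_progress(embeddings):
    """Normalized cumulative semantic change per frame (0 at first frame, 1 at last).

    Assumption: cosine distance between consecutive frame embeddings stands in
    for the semantic distance; the paper's exact distance and its smooth curve
    fit are not reproduced here.
    """
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Cosine distance between each frame and the previous one.
    step = 1.0 - np.sum(emb[1:] * emb[:-1], axis=1)
    step = np.clip(step, 0.0, None)  # guard against tiny negative round-off
    progress = np.concatenate([[0.0], np.cumsum(step)])
    total = progress[-1]
    if total == 0.0:
        # No semantic change at all: treat pacing as uniform.
        return np.linspace(0.0, 1.0, len(emb))
    return progress / total


def pacing_deviation(progress):
    """Largest gap between the progress curve and a straight line."""
    progress = np.asarray(progress, dtype=float)
    line = np.linspace(0.0, 1.0, len(progress))
    return float(np.max(np.abs(progress - line)))


def linearize(progress, num_out):
    """Fractional source-frame positions at which progress is evenly spaced.

    Inverts the (non-decreasing) progress curve by linear interpolation.
    """
    progress = np.asarray(progress, dtype=float)
    targets = np.linspace(0.0, 1.0, num_out)
    source_index = np.arange(len(progress), dtype=float)
    return np.interp(targets, progress, source_index)


# Toy sequence: two identical frames followed by an abrupt semantic jump.
toy_progress = semantic_progress([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
toy_deviation = pacing_deviation(toy_progress)

# Retiming a curve that stalls early and jumps late.
retimed = linearize([0.0, 0.1, 0.2, 1.0], num_out=5)
```

In this sketch, `linearize` returns fractional frame positions, so most output frames are sampled from the interval where the jump occurs. Turning those positions into actual frames (for example by frame interpolation or regeneration) is outside the scope of the example.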