Video Analysis and Generation via a Semantic Progress Function
April 24, 2026
Authors: Gal Metzer, Sagi Polaczek, Ali Mahdavi-Amiri, Raja Giryes, Daniel Cohen-Or
cs.AI
Abstract
Transformations produced by image and video generation models often evolve in a highly non-linear manner: long stretches where the content barely changes are followed by sudden, abrupt semantic jumps. To analyze and correct this behavior, we introduce a Semantic Progress Function, a one-dimensional representation that captures how the meaning of a given sequence evolves over time. For each frame, we compute distances between semantic embeddings and fit a smooth curve that reflects the cumulative semantic shift across the sequence. Departures of this curve from a straight line reveal uneven semantic pacing. Building on this insight, we propose a semantic linearization procedure that reparameterizes (or retimes) the sequence so that semantic change unfolds at a constant rate, yielding smoother and more coherent transitions. Beyond linearization, our framework provides a model-agnostic foundation for identifying temporal irregularities, comparing semantic pacing across different generators, and steering both generated and real-world video sequences toward arbitrary target pacing.
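The pipeline described above — per-frame semantic embeddings, cumulative distances forming a progress curve, and a retiming that inverts that curve so semantic change unfolds at a constant rate — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding model, the choice of cosine distance, and the linear-interpolation inversion are all assumptions for the sake of the example.

```python
import numpy as np

def semantic_progress(embeddings):
    """Cumulative semantic change across a sequence, normalized to [0, 1].

    `embeddings` is an (num_frames, dim) array of per-frame semantic
    embeddings (e.g. from a vision-language model; the choice of model
    is left open here). Cosine distance between consecutive frames is
    used as the per-step semantic shift -- an illustrative choice.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    step = 1.0 - np.sum(e[:-1] * e[1:], axis=1)  # consecutive cosine distances
    progress = np.concatenate([[0.0], np.cumsum(step)])
    return progress / progress[-1]

def linearize(progress, num_frames=None):
    """Retime the sequence so the progress function becomes a straight line.

    Returns (possibly fractional) source-frame indices at which the
    progress curve takes equally spaced values, i.e. the inverse of the
    curve sampled uniformly. Fractional indices would then be realized
    by frame interpolation or by re-sampling the generator.
    """
    n = len(progress) if num_frames is None else num_frames
    targets = np.linspace(0.0, 1.0, n)
    return np.interp(targets, progress, np.arange(len(progress)))
```

A sequence whose content barely changes for a while and then jumps produces a progress curve that is flat, then steep; `linearize` spaces the resampled indices densely across the jump and sparsely across the plateau, which is exactly the reparameterization the abstract calls semantic linearization.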