Jump Cut Smoothing for Talking Heads
January 9, 2024
Authors: Xiaojuan Wang, Taesung Park, Yang Zhou, Eli Shechtman, Richard Zhang
cs.AI
Abstract
A jump cut creates an abrupt, sometimes unwanted change in the viewing
experience. We present a novel framework for smoothing these jump cuts in the
context of talking head videos. We leverage the subject's appearance from
other source frames in the video, fusing it with a mid-level representation
driven by DensePose keypoints and face landmarks. To achieve motion, we
interpolate the keypoints and landmarks between the end frames around the cut.
We then synthesize pixels with an image translation network conditioned on the
keypoints and source frames. Because keypoints can contain errors, we propose
a cross-modal attention scheme to pick the most appropriate source among
multiple options for each keypoint. By leveraging this mid-level
representation, our method outperforms a strong video interpolation baseline.
We demonstrate our method on various jump cuts in talking head videos, such as
cutting filler words, pauses, and even random cuts. Our experiments show that
we achieve seamless transitions, even in challenging cases where the talking
head rotates or moves drastically across the jump cut.
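The two core ideas in the abstract, interpolating keypoints between the end frames around the cut and attention-weighting over candidate source frames, can be sketched as below. This is a minimal illustration only, not the authors' code: `interpolate_keypoints` assumes simple linear blending (the paper may use a more sophisticated motion model), and `attention_source_weights` is a generic scaled dot-product softmax standing in for the paper's cross-modal attention scheme.

```python
import numpy as np

def interpolate_keypoints(kp_before, kp_after, num_frames):
    """Linearly blend (K, 2) keypoint coordinates across the cut.

    kp_before, kp_after: keypoints detected in the end frames on either
    side of the jump cut. Returns (num_frames, K, 2) trajectories for the
    in-between frames to be synthesized.
    """
    # Interior timesteps only; the end frames themselves are kept as-is.
    t = np.linspace(0.0, 1.0, num_frames + 2)[1:-1]
    return (1.0 - t)[:, None, None] * kp_before + t[:, None, None] * kp_after

def attention_source_weights(query, source_feats):
    """Softmax attention over candidate source frames for one keypoint.

    query: (D,) feature for the keypoint; source_feats: (S, D) features of
    the S candidate source frames. Returns (S,) selection weights that sum
    to 1, favoring the most compatible source.
    """
    scores = source_feats @ query / np.sqrt(query.shape[0])
    scores -= scores.max()  # subtract max for numerical stability
    w = np.exp(scores)
    return w / w.sum()
```

For example, with `num_frames=3` the blend weights are 0.25, 0.5, and 0.75, so the middle synthesized frame places each keypoint halfway between its two end positions.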