
Jump Cut Smoothing for Talking Heads

January 9, 2024
Authors: Xiaojuan Wang, Taesung Park, Yang Zhou, Eli Shechtman, Richard Zhang
cs.AI

Abstract

A jump cut is an abrupt, sometimes unwanted change in the viewing experience. We present a novel framework for smoothing these jump cuts, in the context of talking head videos. We leverage the appearance of the subject from other source frames in the video, fusing it with a mid-level representation driven by DensePose keypoints and face landmarks. To achieve motion, we interpolate the keypoints and landmarks between the end frames around the cut. We then use an image translation network to synthesize pixels from the keypoints and source frames. Because keypoints can contain errors, we propose a cross-modal attention scheme to pick the most appropriate source amongst multiple options for each keypoint. By leveraging this mid-level representation, our method achieves stronger results than a strong video interpolation baseline. We demonstrate our method on various jump cuts in talking head videos, such as cutting filler words, pauses, and even random cuts. Our experiments show that we can achieve seamless transitions, even in challenging cases where the talking head rotates or moves drastically across the jump cut.