Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos

Abstract

A "match cut" is a common video editing technique in which a pair of shots with similar composition transition fluidly from one to the other. Although match cuts are often visual, certain match cuts involve a fluid transition of audio, where sounds from different sources merge into one indistinguishable transition between two shots. In this paper, we explore the ability to automatically find and create "audio match cuts" within videos and movies. We create a self-supervised audio representation for audio match cutting and develop a coarse-to-fine audio match pipeline that recommends matching shots and creates the blended audio. We further annotate a dataset for the proposed audio match cut task and compare how well multiple audio representations find audio match cut candidates. Finally, we evaluate multiple methods for blending two matching audio candidates, with the goal of creating a smooth transition. Project page and examples are available at: https://denfed.github.io/audiomatchcut/
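To make the two stages named in the abstract more concrete (recommending matching shots, then blending their audio), here is a minimal sketch in Python. It is not the authors' method: the abstract does not say which similarity measure or blending technique is used, so cosine similarity over precomputed embeddings and an equal-power crossfade are assumptions chosen for illustration. The function names `rank_candidates` and `equal_power_crossfade` are hypothetical.

```python
import numpy as np


def rank_candidates(query, candidates):
    """Rank candidate clips by cosine similarity to a query embedding.

    Assumption: each clip is already summarized by one embedding vector.
    `query` has shape (d,), `candidates` has shape (n, d).
    Returns (indices sorted from best to worst match, similarity scores).
    """
    query = np.asarray(query, dtype=np.float64)
    candidates = np.asarray(candidates, dtype=np.float64)
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores), scores


def equal_power_crossfade(a, b, fade_len):
    """Join mono signal `a` into `b` with an equal-power crossfade.

    The last `fade_len` samples of `a` overlap the first `fade_len`
    samples of `b`. This is a generic blending baseline (an assumption),
    not necessarily one of the methods evaluated in the paper.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if not 0 < fade_len <= min(len(a), len(b)):
        raise ValueError("fade_len must be in (0, min(len(a), len(b))]")
    # Cosine/sine gain curves keep the summed power roughly constant
    # when the two signals are uncorrelated.
    t = np.linspace(0.0, np.pi / 2, fade_len)
    overlap = a[-fade_len:] * np.cos(t) + b[:fade_len] * np.sin(t)
    return np.concatenate([a[:-fade_len], overlap, b[fade_len:]])
```

In a pipeline of this shape, `rank_candidates` would narrow a large pool of shots to a few likely matches, and `equal_power_crossfade` would then be applied to the audio of the chosen pair. A finer alignment step between the two would be needed to pick the exact transition point; that step is not sketched here.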
