SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation

Abstract

Human-centric video frame interpolation has great potential for improving entertainment experiences and for commercial applications in the sports analysis industry, e.g., synthesizing slow-motion videos. Although multiple benchmark datasets are available in the community, none of them is dedicated to human-centric scenarios. To bridge this gap, we introduce SportsSloMo, a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution (≥720p) slow-motion sports videos crawled from YouTube. We re-train several state-of-the-art methods on our benchmark, and the results show a decrease in their accuracy compared to other datasets. This highlights the difficulty of our benchmark and suggests that it poses significant challenges even for the best-performing methods, as human bodies are highly deformable and occlusions are frequent in sports videos. To improve accuracy, we introduce two loss terms that incorporate human-aware priors, adding auxiliary supervision on panoptic segmentation and human keypoint detection, respectively. The loss terms are model-agnostic and can be easily plugged into any video frame interpolation approach. Experimental results validate the effectiveness of the proposed loss terms, which yield consistent performance improvements over 5 existing models and establish strong baseline models on our benchmark. The dataset and code can be found at: https://neu-vi.github.io/SportsSlomo/.
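To illustrate what a model-agnostic auxiliary loss could look like, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how the two loss terms are computed, so everything here is an assumption made for illustration: the names `human_aware_loss`, `seg_fn`, and `kpt_fn`, the weights `w_seg` and `w_kpt`, and the choice to compare the outputs of fixed segmentation and keypoint predictors on the interpolated frame and the ground-truth frame using an L1 distance.

```python
from typing import Callable, Sequence

Frame = Sequence[float]  # flattened frame; a stand-in for an image tensor


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length, non-empty sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def human_aware_loss(
    pred: Frame,
    target: Frame,
    seg_fn: Callable[[Frame], Sequence[float]],  # hypothetical segmentation predictor
    kpt_fn: Callable[[Frame], Sequence[float]],  # hypothetical keypoint predictor
    w_seg: float = 0.1,  # hypothetical weight, not taken from the paper
    w_kpt: float = 0.1,  # hypothetical weight, not taken from the paper
) -> float:
    """Reconstruction loss plus two auxiliary terms (assumed form)."""
    recon = l1(pred, target)                 # base interpolation loss
    seg = l1(seg_fn(pred), seg_fn(target))   # segmentation consistency term
    kpt = l1(kpt_fn(pred), kpt_fn(target))   # keypoint consistency term
    return recon + w_seg * seg + w_kpt * kpt


# Toy stand-ins for the auxiliary predictors, for demonstration only.
def toy_seg(frame: Frame) -> list:
    return [1.0 if v > 0.5 else 0.0 for v in frame]


def toy_kpt(frame: Frame) -> list:
    return [max(frame)]


loss_same = human_aware_loss([0.2, 0.8], [0.2, 0.8], toy_seg, toy_kpt)
loss_diff = human_aware_loss([0.0, 1.0], [0.0, 0.0], toy_seg, toy_kpt)
```

Because the auxiliary terms depend only on the predicted and ground-truth frames, a function of this shape can wrap the output of any interpolation model without changing the model itself, which is one way to read the "model-agnostic" claim.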
