Towards Universal Soccer Video Understanding

Abstract

As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world. This paper aims to develop a comprehensive multi-modal framework for soccer video understanding. Specifically, we make the following contributions in this paper: (i) we introduce SoccerReplay-1988, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches, with an automated annotation pipeline; (ii) we present the first visual-language foundation model in the soccer domain, MatchVision, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks; (iii) we conduct extensive experiments and ablation studies on event classification, commentary generation, and multi-view foul recognition. MatchVision demonstrates state-of-the-art performance on all of them, substantially outperforming existing models, which highlights the superiority of our proposed data and model. We believe that this work will offer a standard paradigm for sports understanding research.
