The VoxCeleb Speaker Recognition Challenge: A Retrospective

Abstract

The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and workshops that ran annually from 2019 to 2023. The challenges primarily evaluated the tasks of speaker recognition and diarisation under various settings: closed and open training data, as well as supervised, self-supervised, and semi-supervised training for domain adaptation. The challenges also provided publicly available training and evaluation datasets for each task and setting, with new test sets released each year. In this paper, we review these challenges, covering what they explored, the methods developed by the challenge participants and how these evolved, and the current state of the field for speaker verification and diarisation. We chart the progress in performance over the five instalments of the challenge on a common evaluation dataset and provide a detailed analysis of how each year's special focus affected participants' performance. This paper is aimed both at researchers who want an overview of the speaker recognition and diarisation field, and at challenge organisers who want to benefit from the successes and avoid the mistakes of the VoxSRC challenges. We end with a discussion of the current strengths of the field and open challenges. Project page: https://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/workshop.html