Vibravox: A Dataset of French Speech Captured with Body-Conduction Audio Sensors

Abstract

Vibravox is a dataset compliant with the General Data Protection Regulation (GDPR) that contains audio recordings made with five different body-conduction audio sensors: two in-ear microphones, two bone-conduction vibration pickups, and a laryngophone. The dataset also includes audio from an airborne microphone used as a reference. The Vibravox corpus contains 38 hours of speech samples and physiological sounds recorded from 188 participants under different acoustic conditions imposed by a high-order ambisonics 3D spatializer. Annotations of the recording conditions and linguistic transcriptions are also included in the corpus. We conducted a series of experiments on various speech-related tasks, including speech recognition, speech enhancement, and speaker verification. These experiments used state-of-the-art models to evaluate and compare their performance on signals captured by each of the audio sensors in the Vibravox dataset, with the aim of better understanding the individual characteristics of each sensor.