DeepSpeak Dataset v1.0

Abstract

We describe DeepSpeak, a large-scale dataset of real and deepfake footage of people talking and gesturing in front of their webcams. In this first version of the dataset, the real videos consist of 9 hours of footage from 220 diverse individuals. The fake videos, totaling more than 25 hours of footage, comprise a range of state-of-the-art face-swap and lip-sync deepfakes with natural and AI-generated voices. We expect to release future versions of this dataset that incorporate different and updated deepfake technologies. The dataset is freely available for research and non-commercial use; requests for commercial use will be considered.