DeepSpeak Dataset v1.0

Abstract

We describe DeepSpeak, a large-scale dataset of real and deepfake footage of people talking and gesturing in front of their webcams. The real videos in this first version of the dataset comprise 9 hours of footage from 220 diverse individuals. The fake videos, totaling more than 25 hours of footage, span a range of state-of-the-art face-swap and lip-sync deepfakes with natural and AI-generated voices. We expect to release future versions of this dataset with different and updated deepfake technologies. This dataset is made freely available for research and non-commercial uses; requests for commercial use will be considered.