Towards Understanding Unsafe Video Generation
July 17, 2024
Authors: Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang
cs.AI
Abstract
Video generation models (VGMs) have demonstrated the capability to synthesize
high-quality output. It is important to understand their potential to produce
unsafe content, such as violent or terrifying videos. In this work, we provide
a comprehensive understanding of unsafe video generation.
First, to confirm that these models could indeed generate unsafe videos, we
chose unsafe content generation prompts collected from 4chan and Lexica, and
three open-source SOTA VGMs to generate unsafe videos. After filtering out
duplicates and poorly generated content, we created an initial set of 2112
unsafe videos from an original pool of 5607 videos. Through clustering and
thematic coding analysis of these generated videos, we identified 5 unsafe
video categories: Distorted/Weird, Terrifying, Pornographic, Violent/Bloody,
and Political. With IRB approval, we then recruited online participants to
help label the generated videos. Based on the annotations submitted by 403
participants, we identified 937 unsafe videos from the initial video set. With
the labeled information and the corresponding prompts, we created the first
dataset of unsafe videos generated by VGMs.
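To make the clustering step concrete, here is a minimal sketch of how generated videos could be grouped before thematic coding, assuming averaged CLIP frame embeddings and k-means with k=5 to match the five categories above. The embedding model, frame sampling, and cluster count are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: cluster generated videos by averaged CLIP frame embeddings.
# The embedding model, frame-sampling strategy, and k=5 are assumptions for
# illustration, not the paper's exact pipeline.
import numpy as np
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from sklearn.cluster import KMeans

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed_video(frames: list) -> np.ndarray:
    """Average unit-normalized CLIP image embeddings over sampled PIL frames."""
    batch = torch.stack([preprocess(f) for f in frames]).to(device)
    with torch.no_grad():
        feats = model.encode_image(batch)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize per frame
    return feats.mean(dim=0).float().cpu().numpy()

# videos: a list of videos, each given as a list of sampled PIL frames
# embeddings = np.stack([embed_video(v) for v in videos])
# labels = KMeans(n_clusters=5, random_state=0).fit_predict(embeddings)
```

Human coders would then inspect each resulting cluster and assign it a thematic label (e.g., Terrifying, Political), which is the thematic-coding half of the analysis.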
We then study possible defense mechanisms to prevent the generation of unsafe
videos. Existing defense methods in image generation focus on filtering either
input prompts or output results. We propose a new approach called Latent
Variable Defense (LVD), which works within the model's internal sampling
process. LVD achieves a defense accuracy of 0.90 while reducing time and
computing resources by 10x when sampling a large number of unsafe prompts.
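The abstract does not detail how LVD monitors the sampling process. One plausible form, sketched below under stated assumptions, is a lightweight probe that scores intermediate latents at early denoising steps and aborts generation once a latent is flagged; the probe architecture, checkpoint steps, and threshold here are hypothetical, not the paper's exact design.

```python
# Illustrative sketch of a latent-space defense inside a diffusion sampling
# loop: a small classifier scores intermediate latents at early denoising
# steps and aborts generation when the unsafe score exceeds a threshold.
# The probe, checkpoint steps, and threshold are assumptions for illustration.
import torch
import torch.nn as nn

class LatentSafetyProbe(nn.Module):
    """Tiny MLP scoring a flattened latent as unsafe (1.0) vs. safe (0.0)."""
    def __init__(self, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(z.flatten(start_dim=1)))

def guarded_sampling(denoise_step, z, num_steps, probe,
                     check_steps=(5, 10), threshold=0.5):
    """Run the denoising loop, aborting early if the probe flags the latent.

    denoise_step(z, t) -> next latent; z is the initial noise latent for a
    single generation (batch size 1). Early aborts skip the remaining,
    expensive denoising steps.
    """
    for t in range(num_steps):
        z = denoise_step(z, t)
        if t in check_steps and probe(z).item() > threshold:
            return None  # refuse generation; the caller handles rejection
    return z
```

Under this reading, most unsafe prompts would be rejected within the first few denoising steps rather than running to completion, which is consistent with the reported 10x reduction in time and compute when sampling many unsafe prompts.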