Towards Understanding Unsafe Video Generation
July 17, 2024
Authors: Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang
cs.AI
Abstract
Video generation models (VGMs) have demonstrated the capability to synthesize
high-quality output. It is important to understand their potential to produce
unsafe content, such as violent or terrifying videos. In this work, we provide
a comprehensive understanding of unsafe video generation.
First, to confirm that these models can indeed generate unsafe videos, we
select unsafe content generation prompts collected from 4chan and Lexica,
together with three open-source SOTA VGMs, and use them to generate videos.
After filtering out duplicates and poorly generated content, we create an
initial set of 2112 unsafe videos from an original pool of 5607 videos. Through
clustering and thematic coding analysis of these generated videos, we identify
5 unsafe video categories: Distorted/Weird, Terrifying, Pornographic,
Violent/Bloody, and Political. With IRB approval, we then recruit online
participants to help label the generated videos. Based on the annotations
submitted by 403 participants, we identify 937 unsafe videos from the initial
video set. With the labeled information and the corresponding prompts, we
create the first dataset of unsafe videos generated by VGMs.
We then study possible defense mechanisms to prevent the generation of unsafe
videos. Existing defense methods in image generation focus on filtering either
the input prompt or the output results. We propose a new approach called Latent
Variable Defense (LVD), which works within the model's internal sampling
process. LVD can achieve 0.90 defense accuracy while reducing time and
computing resources by 10x when sampling a large number of unsafe prompts.
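
The abstract says only that LVD works within the model's internal sampling process. As a rough illustration of that general idea, and not the authors' implementation, the sketch below checks intermediate latents during an iterative sampling loop and aborts as soon as a detector flags one, which skips the remaining steps. Every name here (`sample_with_latent_check`, `make_toy_denoiser`, `toy_detector`, `check_steps`) is hypothetical, and the denoiser and detector are toy stand-ins rather than a real VGM or a trained classifier.

```python
from typing import Callable, List, Optional, Set, Tuple

Latent = List[float]


def sample_with_latent_check(
    denoise_step: Callable[[Latent, int], Latent],
    is_unsafe: Callable[[Latent, int], bool],
    init_latent: Latent,
    num_steps: int,
    check_steps: Set[int],
) -> Tuple[Optional[Latent], int]:
    """Run an iterative sampler, inspecting the latent at selected steps.

    Returns (final_latent, steps_run). If the detector flags an intermediate
    latent, sampling stops early and the latent is returned as None.
    """
    latent = init_latent
    for t in range(num_steps):
        latent = denoise_step(latent, t)
        # Assumption: a detector can judge partially denoised latents.
        if t in check_steps and is_unsafe(latent, t):
            return None, t + 1  # aborted; remaining steps are never run
    return latent, num_steps


def make_toy_denoiser(target: float) -> Callable[[Latent, int], Latent]:
    """Toy stand-in for a denoising step: move halfway toward `target`."""
    def step(latent: Latent, t: int) -> Latent:
        return [x + 0.5 * (target - x) for x in latent]
    return step


def toy_detector(latent: Latent, t: int) -> bool:
    """Toy stand-in for a latent classifier: flag when the mean exceeds 0.5."""
    return sum(latent) / len(latent) > 0.5


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0, 0.0]
    checks = {1, 3, 5}
    blocked = sample_with_latent_check(
        make_toy_denoiser(1.0), toy_detector, start, 10, checks
    )
    allowed = sample_with_latent_check(
        make_toy_denoiser(-1.0), toy_detector, start, 10, checks
    )
    print("flagged run:", blocked)
    print("unflagged run steps:", allowed[1])
```

In this toy setup the flagged run stops after 2 of 10 steps, which shows where compute savings could come from when many prompts are rejected early; the 10x figure in the abstract is the authors' reported result and is not reproduced by this sketch.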