Towards Understanding Unsafe Video Generation

Abstract

Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation. First, to confirm that these models can indeed generate unsafe videos, we select unsafe content generation prompts collected from 4chan and Lexica and use three open-source SOTA VGMs to generate videos from them. After filtering out duplicates and poorly generated content, we obtain an initial set of 2112 unsafe videos from an original pool of 5607 videos. Through clustering and thematic coding analysis of these generated videos, we identify 5 unsafe video categories: Distorted/Weird, Terrifying, Pornographic, Violent/Bloody, and Political. With IRB approval, we then recruit online participants to label the generated videos. Based on the annotations submitted by 403 participants, we identify 937 unsafe videos from the initial video set. Using the labels and the corresponding prompts, we create the first dataset of unsafe videos generated by VGMs. We then study possible defense mechanisms to prevent the generation of unsafe videos. Existing defense methods in image generation focus on filtering either the input prompt or the output results. We propose a new approach called Latent Variable Defense (LVD), which works within the model's internal sampling process. LVD can achieve 0.90 defense accuracy while reducing time and computing resources by 10x when sampling a large number of unsafe prompts.

