Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA
May 9, 2025
Authors: Karthik Reddy Kanjula, Surya Guthikonda, Nahid Alam, Shayekh Bin Islam
cs.AI
Abstract
Pretraining datasets are foundational to the development of multimodal
models, yet they often have inherent biases and toxic content from the
web-scale corpora they are sourced from. In this paper, we investigate the
prevalence of toxicity in the LLaVA image-text pretraining dataset, examining how
harmful content manifests in different modalities. We present a comprehensive
analysis of common toxicity categories and propose targeted mitigation
strategies, resulting in the creation of a refined toxicity-mitigated dataset.
This dataset removes 7,531 toxic image-text pairs from the LLaVA pretraining
dataset. We offer guidelines for implementing robust toxicity detection
pipelines. Our findings underscore the need to actively identify and filter
toxic content - such as hate speech, explicit imagery, and targeted harassment
- to build more responsible and equitable multimodal systems. The
toxicity-mitigated dataset is open source and is available for further
research.
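To make the filtering idea described above concrete, the sketch below shows one way such a toxicity-filtering pass over image-text pairs could be wired up in Python. It is not the authors' released pipeline: the `score_toxicity` stub, the keyword blocklist, the threshold, and the annotation file names are illustrative placeholders, and the annotation layout (a JSON list of records with `image` and `conversations` fields) is assumed to follow LLaVA's pretraining data format.

```python
import json

# Illustrative stand-in for a real safety classifier (e.g., a trained
# toxic-text model combined with an NSFW image detector).
BLOCKLIST = {"hate", "slur", "explicit"}  # placeholder keywords only


def score_toxicity(caption: str) -> float:
    """Crude placeholder: fraction of blocklisted words in the caption."""
    words = caption.lower().split()
    if not words:
        return 0.0
    return sum(w in BLOCKLIST for w in words) / len(words)


def filter_pairs(annotation_path: str, threshold: float = 0.1) -> list:
    """Keep image-text pairs whose caption toxicity score is below threshold.

    Assumes a LLaVA-style annotation file: a JSON list of records, each with
    an "image" field and a "conversations" list of {"from", "value"} turns.
    """
    with open(annotation_path, "r", encoding="utf-8") as f:
        pairs = json.load(f)
    kept = []
    for pair in pairs:
        caption = " ".join(
            turn.get("value", "") for turn in pair.get("conversations", [])
        )
        if score_toxicity(caption) < threshold:
            kept.append(pair)
    return kept


if __name__ == "__main__":
    # Hypothetical file names; point these at the actual annotation files.
    clean = filter_pairs("llava_pretrain_annotations.json")
    with open("llava_pretrain_annotations_filtered.json", "w", encoding="utf-8") as f:
        json.dump(clean, f)
```

In a realistic setting, the keyword stub would be replaced by per-category classifiers (hate speech, explicit imagery, harassment) applied to both the caption and the image, so that a pair is dropped if either modality is flagged.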