

Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

May 9, 2025
作者: Karthik Reddy Kanjula, Surya Guthikonda, Nahid Alam, Shayekh Bin Islam
cs.AI

Abstract

Pretraining datasets are foundational to the development of multimodal models, yet they often carry inherent biases and toxic content from the web-scale corpora they are sourced from. In this paper, we investigate the prevalence of toxicity in the LLaVA image-text pretraining dataset, examining how harmful content manifests across modalities. We present a comprehensive analysis of common toxicity categories and propose targeted mitigation strategies, resulting in a refined, toxicity-mitigated dataset from which 7,531 toxic image-text pairs in the LLaVA pretraining dataset have been removed. We offer guidelines for implementing robust toxicity detection pipelines. Our findings underscore the need to actively identify and filter toxic content, such as hate speech, explicit imagery, and targeted harassment, in order to build more responsible and equitable multimodal systems. The toxicity-mitigated dataset is open source and available for further research.
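
To make the filtering step described in the abstract concrete, below is a minimal sketch of a caption-level toxicity filter. It is not the authors' pipeline: it assumes the pretraining annotations are a JSON list of entries that each contain an image path and a caption-like text field, and it uses the open-source Detoxify text classifier as a stand-in detector. The field names ("image", "caption"), the 0.5 score threshold, and the file paths are illustrative assumptions; image-side screening (e.g., for explicit imagery) would require an additional vision model.

```python
# Hypothetical caption-level toxicity filter (sketch, not the paper's pipeline).
# Assumes annotations are a JSON list of {"image": ..., "caption": ...} entries.
import json
from detoxify import Detoxify


def filter_toxic_pairs(annotations_path, output_path, threshold=0.5):
    with open(annotations_path) as f:
        entries = json.load(f)

    model = Detoxify("original")  # multi-label text toxicity classifier
    kept, removed = [], []
    for entry in entries:
        # Returns scores such as toxicity, obscene, insult, identity_attack, ...
        scores = model.predict(entry["caption"])
        if max(scores.values()) >= threshold:
            removed.append(entry)  # flagged image-text pair: drop from the set
        else:
            kept.append(entry)

    with open(output_path, "w") as f:
        json.dump(kept, f)
    return len(kept), len(removed)


# Example usage (placeholder file names):
# kept, removed = filter_toxic_pairs("llava_pretrain.json", "llava_pretrain_detox.json")
```

In practice, a single threshold over the maximum category score is a coarse design choice; per-category thresholds or human review of borderline pairs would give finer control over what is removed.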

