Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning
June 10, 2025
Authors: Fengjun Pan, Anh Tuan Luu, Xiaobao Wu
cs.AI
Abstract
Detecting harmful memes is essential for maintaining the integrity of online
environments. However, current approaches often struggle with resource
efficiency, flexibility, or explainability, limiting their practical deployment
in content moderation systems. To address these challenges, we introduce
U-CoT+, a novel framework for harmful meme detection. Instead of relying solely
on prompting or fine-tuning multimodal models, we first develop a high-fidelity
meme-to-text pipeline that converts visual memes into detail-preserving textual
descriptions. This design decouples meme interpretation from meme
classification, thus avoiding direct reasoning over complex raw visual
content and enabling resource-efficient harmful meme detection with general
large language models (LLMs). Building on these textual descriptions, we
further incorporate targeted, interpretable, human-crafted guidelines to steer
models' reasoning under zero-shot chain-of-thought (CoT) prompting. As such,
the framework adapts easily to harmfulness detection criteria that differ
across platforms and regions and shift over time, offering high flexibility and
explainability. Extensive experiments on seven benchmark datasets validate the
effectiveness of our framework, highlighting its potential for explainable and
low-resource harmful meme detection using small-scale LLMs. Code and data are
available at: https://anonymous.4open.science/r/HMC-AF2B/README.md.
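
To make the decoupled two-stage design concrete, here is a minimal Python sketch of the pipeline the abstract describes: a meme-to-text stage followed by guideline-steered zero-shot CoT classification with a general LLM. All function names, prompts, and model callables below are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# The captioner/llm callables are hypothetical stand-ins for any
# vision-language model and any general-purpose (small-scale) LLM.

from typing import Callable

def meme_to_text(image_path: str, captioner: Callable[[str, str], str]) -> str:
    """Stage 1: convert a visual meme into a detail-preserving textual
    description, decoupling meme interpretation from classification."""
    prompt = (
        "Describe this meme in full detail: transcribe all overlaid text "
        "verbatim, describe the imagery, and note how text and image interact."
    )
    return captioner(image_path, prompt)

def classify_harmfulness(description: str, guidelines: str,
                         llm: Callable[[str], str]) -> str:
    """Stage 2: zero-shot CoT prompting of a general LLM, steered by
    human-crafted guidelines that can be swapped per platform or region."""
    prompt = (
        f"Harmfulness guidelines:\n{guidelines}\n\n"
        f"Meme description:\n{description}\n\n"
        "Question: Is this meme harmful under the guidelines above?\n"
        "Let's think step by step, then answer 'harmful' or 'harmless'."
    )
    return llm(prompt)

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without model access.
    fake_captioner = lambda img, p: ("Photo of a crowd; overlaid text mocks "
                                     "a protected group.")
    fake_llm = lambda p: "The text targets a protected group ... Answer: harmful"
    desc = meme_to_text("meme.png", fake_captioner)
    print(classify_harmfulness(desc, "Flag attacks on protected groups.", fake_llm))
```

Because the guidelines are a plain-text input rather than baked into model weights, adapting the detector to a new platform's moderation policy amounts to editing that string, which is the flexibility the abstract emphasizes.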