Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning

Abstract

Detecting harmful memes is essential for maintaining the integrity of online environments. However, current approaches often struggle with resource efficiency, flexibility, or explainability, which limits their practical deployment in content moderation systems. To address these challenges, we introduce U-CoT+, a novel framework for harmful meme detection. Instead of relying solely on prompting or fine-tuning multimodal models, we first develop a high-fidelity meme-to-text pipeline that converts visual memes into detail-preserving textual descriptions. This design decouples meme interpretation from meme classification, thus avoiding immediate reasoning over complex raw visual content and enabling resource-efficient harmful meme detection with general-purpose large language models (LLMs). Building on these textual descriptions, we further incorporate targeted, interpretable, human-crafted guidelines to steer the models' reasoning under zero-shot chain-of-thought (CoT) prompting. As a result, the framework adapts easily to harmfulness detection criteria that differ across platforms, regions, and time, offering high flexibility and explainability. Extensive experiments on seven benchmark datasets validate the effectiveness of our framework and highlight its potential for explainable, low-resource harmful meme detection with small-scale LLMs. Code and data are available at: https://anonymous.4open.science/r/HMC-AF2B/README.md.
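
As a rough illustration of the decoupled design described above, the following Python sketch shows how a textual meme description and a list of human-written guidelines might be combined into a zero-shot CoT prompt for a text-only LLM. It is not the authors' implementation: the function names, the prompt wording, the "Answer:" output convention, and the `llm` callable are all hypothetical, and the meme-to-text step is assumed to have already produced `meme_description`.

```python
from typing import Callable, Sequence, Tuple


def build_guided_cot_prompt(meme_description: str, guidelines: Sequence[str]) -> str:
    """Assemble a zero-shot CoT prompt (hypothetical wording, not from the paper)."""
    guideline_block = "\n".join(
        f"{i}. {g}" for i, g in enumerate(guidelines, start=1)
    )
    return (
        "You are reviewing a meme for a content moderation team.\n\n"
        f"Meme description:\n{meme_description}\n\n"
        f"Guidelines:\n{guideline_block}\n\n"
        "Let's think step by step, then finish with a single line of the form "
        "'Answer: harmful' or 'Answer: harmless'."
    )


def parse_verdict(response: str) -> str:
    """Read the last 'Answer:' line; return 'unknown' if none is well-formed."""
    for line in reversed(response.strip().splitlines()):
        stripped = line.strip()
        if stripped.lower().startswith("answer:"):
            verdict = stripped.split(":", 1)[1].strip().lower().rstrip(".")
            if verdict in ("harmful", "harmless"):
                return verdict
    return "unknown"


def classify_meme(
    meme_description: str,
    guidelines: Sequence[str],
    llm: Callable[[str], str],  # assumed: any function mapping a prompt to text
) -> Tuple[str, str]:
    """Return (label, rationale); the rationale is the model's full reasoning."""
    prompt = build_guided_cot_prompt(meme_description, guidelines)
    response = llm(prompt)
    return parse_verdict(response), response
```

Because the guidelines are passed in as plain text, swapping in a different list is all that is needed to apply a different platform's or region's criteria in this sketch, and the returned rationale can be shown to a human reviewer alongside the label.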
