Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation

Abstract



AI-generated counterspeech offers a promising and scalable strategy to curb online toxicity through direct replies that promote civil discourse. However, current counterspeech is one-size-fits-all, lacking adaptation to the moderation context and the users involved. We propose and evaluate multiple strategies for generating tailored counterspeech that is adapted to the moderation context and personalized for the moderated user. We instruct a LLaMA2-13B model to generate counterspeech, experimenting with various configurations based on different contextual information and fine-tuning strategies. We identify the configurations that generate persuasive counterspeech through a combination of quantitative indicators and human evaluations collected via a pre-registered mixed-design crowdsourcing experiment. Results show that contextualized counterspeech can significantly outperform state-of-the-art generic counterspeech in adequacy and persuasiveness, without compromising other characteristics. Our findings also reveal a poor correlation between quantitative indicators and human evaluations, suggesting that these methods assess different aspects and highlighting the need for nuanced evaluation methodologies. The effectiveness of contextualized AI-generated counterspeech and the divergence between human and algorithmic evaluations underscore the importance of increased human-AI collaboration in content moderation.
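The abstract reports a poor correlation between quantitative indicators and human evaluations but does not state which coefficient was used. As a minimal sketch, assuming a rank-based (Spearman) correlation between one automatic metric score and one aggregated human rating per generated reply, the computation can look like the following. The function names `average_ranks`, `pearson`, and `spearman` are hypothetical illustrations, not code from the paper.

```python
from math import sqrt
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(metric_scores: list[float], human_ratings: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on (tie-averaged) ranks.

    Assumption: metric_scores[i] and human_ratings[i] describe the same reply.
    """
    if len(metric_scores) != len(human_ratings) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))
```

A value of rho near 0 would indicate that the automatic metric orders replies very differently from the human raters. In practice, `scipy.stats.spearmanr` computes the same statistic, also averaging ranks for ties.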


