Toxicity Ahead: Forecasting Conversational Derailment on GitHub
December 17, 2025
Authors: Mia Mohammad Imran, Robert Zita, Rahat Rizvi Rahman, Preetha Chatterjee, Kostadin Damevski
cs.AI
Abstract
Toxic interactions in Open Source Software (OSS) communities reduce contributor engagement and threaten project sustainability. Preventing such toxicity before it emerges requires a clear understanding of how harmful conversations unfold. However, most proactive moderation strategies are manual, requiring significant time and effort from community maintainers. To support more scalable approaches, we curate a dataset of 159 derailed toxic threads and 207 non-toxic threads from GitHub discussions. Our analysis reveals that toxicity can be forecast by tension triggers, sentiment shifts, and specific conversational patterns.
We present a novel Large Language Model (LLM)-based framework for predicting conversational derailment on GitHub using a two-step prompting pipeline. First, we generate Summaries of Conversation Dynamics (SCDs) via Least-to-Most (LtM) prompting; then we use these summaries to estimate the likelihood of derailment. Evaluated with Qwen and Llama models, our LtM strategy achieves F1-scores of 0.901 and 0.852, respectively, at a decision threshold of 0.3, outperforming established NLP baselines for conversational derailment prediction. External validation on a dataset of 308 GitHub issue threads (65 toxic, 243 non-toxic) yields an F1-score of up to 0.797. Our findings demonstrate the effectiveness of structured LLM prompting for early detection of conversational derailment in OSS, enabling proactive and explainable moderation.