Toxicity Ahead: Forecasting Conversational Derailment on GitHub
December 17, 2025
Authors: Mia Mohammad Imran, Robert Zita, Rahat Rizvi Rahman, Preetha Chatterjee, Kostadin Damevski
cs.AI
Abstract
Toxic interactions in Open Source Software (OSS) communities reduce contributor engagement and threaten project sustainability. Preventing such toxicity before it emerges requires a clear understanding of how harmful conversations unfold. However, most proactive moderation strategies are manual, requiring significant time and effort from community maintainers. To support more scalable approaches, we curate a dataset of 159 derailed toxic threads and 207 non-toxic threads from GitHub discussions. Our analysis reveals that toxicity can be forecast from tension triggers, sentiment shifts, and specific conversational patterns.
We present a novel Large Language Model (LLM)-based framework for predicting conversational derailment on GitHub using a two-step prompting pipeline. First, we generate Summaries of Conversation Dynamics (SCDs) via Least-to-Most (LtM) prompting; then we use these summaries to estimate the likelihood of derailment. Evaluated on Qwen and Llama models, our LtM strategy achieves F1-scores of 0.901 and 0.852, respectively, at a decision threshold of 0.3, outperforming established NLP baselines for conversational derailment prediction. External validation on a dataset of 308 GitHub issue threads (65 toxic, 243 non-toxic) yields an F1-score of up to 0.797. Our findings demonstrate the effectiveness of structured LLM prompting for early detection of conversational derailment in OSS, enabling proactive and explainable moderation.
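
To make the two-step pipeline concrete, here is a minimal sketch in Python. It assumes an OpenAI-compatible chat client; the model name, the LtM sub-questions, the prompt wording, and the probability parsing are illustrative assumptions for exposition, not the paper's exact prompts or setup.

```python
# Minimal sketch of a two-step LtM derailment pipeline (illustrative only).
# Assumptions not from the paper: an OpenAI-compatible chat API, a local
# Qwen deployment named "qwen-local", and a toy numeric-answer parser.
from openai import OpenAI

client = OpenAI()      # assumes OPENAI_API_KEY is set in the environment
MODEL = "qwen-local"   # hypothetical name; the paper evaluates Qwen and Llama

def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

def summarize_dynamics(thread: list[str]) -> str:
    """Step 1: Least-to-Most prompting. Ask easier sub-questions first
    (triggers, sentiment shifts, patterns), then compose the answers into
    a Summary of Conversation Dynamics (SCD)."""
    convo = "\n".join(thread)
    sub_answers = [
        chat(f"{convo}\n\nQ: {q}")
        for q in (
            "What tension triggers, if any, appear in this thread?",
            "How does participant sentiment shift over the thread?",
            "What notable conversational patterns stand out?",
        )
    ]
    return chat(
        convo
        + "\n\nUsing these observations:\n"
        + "\n".join(sub_answers)
        + "\n\nWrite a short summary of the conversation dynamics."
    )

def derailment_probability(scd: str) -> float:
    """Step 2: score derailment risk from the SCD alone."""
    answer = chat(
        "Given this summary of conversation dynamics:\n"
        + scd
        + "\n\nOn a scale from 0 to 1, how likely is this conversation "
        "to derail into toxicity? Reply with a single number."
    )
    return float(answer.strip().split()[0])  # toy parser; real code needs guards

THRESHOLD = 0.3  # decision threshold reported in the paper's evaluation

def predict_derailment(thread: list[str]) -> bool:
    return derailment_probability(summarize_dynamics(thread)) >= THRESHOLD
```

Under this sketch, any thread whose estimated probability reaches the 0.3 threshold would be flagged for maintainer review, and the intermediate SCD doubles as a human-readable explanation of the prediction.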