ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

February 17, 2025
Authors: Ryuto Koike, Masahiro Kaneko, Ayana Niwa, Preslav Nakov, Naoaki Okazaki
cs.AI

Abstract

Detecting text generated by Large Language Models (LLMs) can lead to grave mistakes when the detector's decision is wrong, such as undermining a student's academic dignity. LLM text detection therefore needs to make its decisions interpretable, so that users can judge how reliable a prediction is. When humans verify whether a text is human-written or LLM-generated, they intuitively examine which of the two it shares more similar spans with. However, existing interpretable detectors are not aligned with this human decision-making process and fail to offer evidence that users can easily understand. To bridge this gap, we introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process for verifying the origin of a text. ExaGPT classifies a text by checking whether it shares more similar spans with human-written texts or with LLM-generated texts in a datastore. For each span in the text, this approach can provide the similar span examples that contributed to the decision as evidence. Our human evaluation demonstrates that providing similar span examples helps users judge the correctness of the decision more effectively than existing interpretable methods. Moreover, extensive experiments across four domains and three generators show that ExaGPT substantially outperforms prior strong detectors, by up to 40.9 points of accuracy at a false positive rate of 1%.
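
The span-comparison idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how spans are segmented, how similarity is measured, or how per-span results are aggregated. The sketch assumes fixed-length word n-grams as spans, Jaccard word overlap as the similarity measure, and a simple majority vote. All function names here (`spans`, `similarity`, `build_datastore`, `detect`) are hypothetical.

```python
from collections import Counter


def spans(text, n=3):
    """Split text into overlapping word n-grams (an assumed stand-in for 'spans')."""
    words = text.lower().split()
    if len(words) <= n:
        return [tuple(words)]
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def similarity(a, b):
    """Jaccard overlap between the word sets of two spans (assumed measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_datastore(human_texts, llm_texts, n=3):
    """Collect labeled spans from reference texts of known origin."""
    store = []
    for label, texts in (("human", human_texts), ("llm", llm_texts)):
        for t in texts:
            store.extend((s, label) for s in spans(t, n))
    return store


def detect(text, store, n=3):
    """Label a text by majority vote over each span's most similar stored span.

    Returns (label, evidence); evidence lists, per span, the retrieved
    neighbor, the neighbor's label, and the similarity score.
    """
    if not store:
        raise ValueError("datastore is empty")
    votes = Counter()
    evidence = []
    for s in spans(text, n):
        # Nearest neighbor by similarity; on ties, the earliest entry wins.
        neighbor, label = max(store, key=lambda entry: similarity(s, entry[0]))
        score = similarity(s, neighbor)
        if score > 0:  # spans with no overlap at all cast no vote
            votes[label] += 1
        evidence.append((" ".join(s), " ".join(neighbor), label, score))
    # Assumption: ties default to "human" to avoid false accusations.
    decision = "llm" if votes["llm"] > votes["human"] else "human"
    return decision, evidence
```

The `evidence` list is the interpretable part of this sketch: each entry pairs a span of the input with the stored example that drove its vote, which a user could inspect to judge whether the decision is plausible. A practical system would likely use a learned similarity and an indexed datastore rather than the linear scan shown here.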
