

Detection Avoidance Techniques for Large Language Models

March 10, 2025
作者: Sinclair Schneider, Florian Steuber, Joao A. G. Schneider, Gabi Dreo Rodosek
cs.AI

Abstract

The increasing popularity of large language models has not only led to widespread use but has also brought various risks, including the potential for systematically spreading fake news. Consequently, the development of classification systems such as DetectGPT has become vital. These detectors are vulnerable to evasion techniques, as demonstrated in an experimental series: systematic changes to the generative model's temperature proved shallow-learning detectors to be the least reliable. Fine-tuning the generative model via reinforcement learning circumvented BERT-based detectors. Finally, rephrasing led to a >90% evasion rate for zero-shot detectors like DetectGPT, although the texts stayed highly similar to the originals. A comparison with existing work highlights the better performance of the presented methods. Possible implications for society and further research are discussed.
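The first evasion experiment hinges on the temperature parameter, which rescales a model's output logits before sampling: raising it flattens the next-token distribution, producing less statistically "typical" text that shallow-learning detectors key on. A minimal sketch of temperature scaling (an illustration of the general mechanism, not the authors' code; the logit values are hypothetical):

```python
import math

def token_distribution(logits, temperature):
    """Convert raw logits to a next-token sampling distribution.

    Higher temperature flattens the distribution (more diverse, less
    detector-typical output); lower temperature sharpens it toward
    the single most likely token.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # hypothetical vocabulary logits
cold = token_distribution(logits, 0.5)       # peaked: top token dominates
hot = token_distribution(logits, 2.0)        # flatter: mass spread across tokens
```

Sweeping this single parameter across generations, as the abstract describes, is enough to shift the statistical fingerprint that shallow-learning classifiers rely on.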

