Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
January 22, 2024
Authors: Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein
cs.AI
Abstract
Detecting text generated by modern large language models is thought to be
hard, as both LLMs and humans can exhibit a wide range of complex behaviors.
However, we find that a score based on contrasting two closely related language
models is highly accurate at separating human-generated and machine-generated
text. Based on this mechanism, we propose a novel LLM detector that only
requires simple calculations using a pair of pre-trained LLMs. The method,
called Binoculars, achieves state-of-the-art accuracy without any training
data. It is capable of spotting machine text from a range of modern LLMs
without any model-specific modifications. We comprehensively evaluate
Binoculars on a number of text sources and in varied situations. Over a wide
range of document types, Binoculars detects over 90% of generated samples from
ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being
trained on any ChatGPT data.
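
The abstract does not spell out how the two models are contrasted, so the following is only a minimal sketch of one plausible form of such a score: the log-perplexity of the text under one model, divided by the cross-perplexity between the two models' next-token distributions. The function names, the toy three-token vocabulary, and the probability tables are all hypothetical and stand in for the outputs of a real pair of pre-trained LLMs. Under this sketch, a lower score would suggest machine-generated text, and a decision threshold would be chosen on held-out human text to reach a target false positive rate.

```python
import math
from typing import Sequence

# Hypothetical type: one probability distribution over the vocabulary per position.
Distributions = Sequence[Sequence[float]]


def log_perplexity(token_ids: Sequence[int], probs_a: Distributions) -> float:
    """Mean negative log-probability that model A assigns to the observed tokens.

    Assumes every probability is strictly positive (as softmax outputs are).
    """
    total = sum(-math.log(dist[tok]) for tok, dist in zip(token_ids, probs_a))
    return total / len(token_ids)


def cross_perplexity(probs_a: Distributions, probs_b: Distributions) -> float:
    """Mean cross-entropy of model B's distribution measured against model A's.

    This depends only on the two models' predictions, not on the observed tokens.
    """
    total = 0.0
    for dist_a, dist_b in zip(probs_a, probs_b):
        total += -sum(pa * math.log(pb) for pa, pb in zip(dist_a, dist_b))
    return total / len(probs_a)


def contrast_score(
    token_ids: Sequence[int], probs_a: Distributions, probs_b: Distributions
) -> float:
    """Ratio of log-perplexity to cross-perplexity (assumed form of the score)."""
    return log_perplexity(token_ids, probs_a) / cross_perplexity(probs_a, probs_b)


# Toy data: two positions, vocabulary of three tokens, two closely related models.
toy_probs_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
toy_probs_b = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]

predictable_text = [0, 1]  # tokens both models consider likely
surprising_text = [2, 0]   # tokens both models consider unlikely

print(contrast_score(predictable_text, toy_probs_a, toy_probs_b))
print(contrast_score(surprising_text, toy_probs_a, toy_probs_b))
```

In this toy setting the predictable sequence receives the lower score, because the denominator is identical for both sequences while the numerator grows with how surprising the tokens are to model A. With real models, the probability tables would come from running both LLMs over the same tokenized text.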