Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Abstract

Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that requires only simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It can spot machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
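The abstract does not spell out how the two models are contrasted. As a minimal sketch only, the code below assumes one plausible form: the log-perplexity of the text under one model, divided by the average cross-entropy between the two models' next-token distributions. The names `model_a`, `model_b`, `contrast_score`, and the toy three-word vocabulary are hypothetical and chosen for illustration; a real detector would take these distributions from a pair of pre-trained LLMs.

```python
import math

def log_perplexity(token_ids, probs_a):
    """Average negative log-probability model_a assigns to the observed tokens."""
    nll = [-math.log(dist[tok]) for tok, dist in zip(token_ids, probs_a)]
    return sum(nll) / len(nll)

def cross_log_perplexity(probs_a, probs_b):
    """Average cross-entropy: how surprising model_b's predictions are to model_a."""
    per_position = [
        -sum(q * math.log(p) for p, q in zip(dist_a, dist_b))
        for dist_a, dist_b in zip(probs_a, probs_b)
    ]
    return sum(per_position) / len(per_position)

def contrast_score(token_ids, probs_a, probs_b):
    """Assumed score: ratio of log-perplexity to cross log-perplexity."""
    return log_perplexity(token_ids, probs_a) / cross_log_perplexity(probs_a, probs_b)

# Toy next-token distributions over a 3-word vocabulary at two positions.
# These numbers are made up for illustration, not taken from any model.
probs_a = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]  # hypothetical model_a
probs_b = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]  # hypothetical model_b

predictable = [0, 0]  # tokens both models consider likely
surprising = [2, 2]   # tokens both models consider unlikely

low = contrast_score(predictable, probs_a, probs_b)
high = contrast_score(surprising, probs_a, probs_b)
print(f"predictable: {low:.3f}  surprising: {high:.3f}")
```

Under this assumption, text that `model_a` finds unsurprising relative to what `model_b` predicts receives a lower score than text built from unlikely tokens. Treating low scores as machine-like, and picking the threshold that gives a target false positive rate, are further assumptions that the abstract does not specify.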
