Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Abstract

Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
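
The abstract describes the detector only as a score that contrasts two closely related pre-trained LLMs. The sketch below illustrates one assumed form of that score: the log-perplexity of the text under one model (model A), divided by the cross-entropy between model A's and model B's next-token distributions, averaged over token positions. Toy probability lists stand in for real LLM outputs. Several choices here are hypothetical illustrations, not the authors' code: the function names, the rule that a lower score is flagged as machine-generated, and the `tpr_at_fpr` helper. That helper measures the detection rate at a fixed false positive rate, such as the 0.01% quoted above.

```python
import math
from typing import Sequence

# A "distribution" is a list of next-token probabilities over a toy vocabulary.
# In practice these would be softmax outputs of two pre-trained LLMs (assumption).
Dist = Sequence[float]


def log_perplexity(dists: Sequence[Dist], tokens: Sequence[int]) -> float:
    """Mean negative log-likelihood of the observed tokens under one model."""
    return -sum(math.log(d[t]) for d, t in zip(dists, tokens)) / len(tokens)


def cross_perplexity(dists_a: Sequence[Dist], dists_b: Sequence[Dist]) -> float:
    """Mean cross-entropy -(1/L) * sum_i A_i . log B_i between the two models'
    next-token distributions at every position (assumed form)."""
    total = 0.0
    for a, b in zip(dists_a, dists_b):
        total -= sum(pa * math.log(pb) for pa, pb in zip(a, b) if pa > 0)
    return total / len(dists_a)


def binoculars_score(dists_a: Sequence[Dist], dists_b: Sequence[Dist],
                     tokens: Sequence[int]) -> float:
    """Hypothetical score: log-perplexity under model A over the A/B cross-perplexity."""
    return log_perplexity(dists_a, tokens) / cross_perplexity(dists_a, dists_b)


def tpr_at_fpr(human_scores: Sequence[float], machine_scores: Sequence[float],
               target_fpr: float) -> float:
    """Fraction of machine samples flagged (score strictly below the threshold)
    when the threshold lets at most `target_fpr` of human samples be flagged.
    Requires 0 <= target_fpr < 1."""
    human = sorted(human_scores)
    allowed = math.floor(target_fpr * len(human))  # human samples we may misflag
    threshold = human[allowed]  # at most `allowed` human scores lie strictly below
    return sum(s < threshold for s in machine_scores) / len(machine_scores)


if __name__ == "__main__":
    # Toy example: 3-token vocabulary, 2 positions, observed tokens [0, 0].
    a = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
    b = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]
    print(binoculars_score(a, b, [0, 0]))
    print(tpr_at_fpr(list(range(1, 11)), [0.5, 1.5, 5], 0.1))
```

Dividing by the cross-perplexity is meant to normalize the raw perplexity by how "surprising" model A's own predictions are to model B. Under this reading, a text that is simply on a predictable topic is not automatically flagged. Calibrating the threshold on human-written samples is what ties the detector to a target false positive rate.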
