Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Abstract

Detecting text generated by modern large language models (LLMs) is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that requires only simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
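The abstract does not spell out how the two models are contrasted. As a minimal sketch, assume the score is the ratio of one model's log-perplexity on the text to the cross-perplexity between the two models' next-token distributions. The function names and the toy logits below are hypothetical. In practice the logits would come from two pre-trained LLMs that share a tokenizer, and a decision threshold (not shown) would be chosen separately.

```python
import math


def log_softmax(logits):
    """Convert one row of raw logits into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def contrast_score(token_ids, logits_a, logits_b):
    """Assumed contrast score: log-perplexity of model A over A-vs-B cross-perplexity.

    token_ids: observed next-token ids, one per position.
    logits_a, logits_b: per-position next-token logits from two models
                        (hypothetical inputs; both must use the same vocabulary).
    """
    if not token_ids or not (len(token_ids) == len(logits_a) == len(logits_b)):
        raise ValueError("inputs must be non-empty and of equal length")

    log_ppl = 0.0    # accumulates -log p_A(observed token)
    cross_ppl = 0.0  # accumulates -sum_v p_A(v) * log p_B(v)
    for tok, row_a, row_b in zip(token_ids, logits_a, logits_b):
        lp_a = log_softmax(row_a)
        lp_b = log_softmax(row_b)
        log_ppl -= lp_a[tok]
        cross_ppl -= sum(math.exp(a) * b for a, b in zip(lp_a, lp_b))

    n = len(token_ids)
    return (log_ppl / n) / (cross_ppl / n)


# Toy example with a 3-token vocabulary and a single position.
likely = contrast_score([0], [[5.0, 0.0, 0.0]], [[4.0, 0.0, 0.0]])
unlikely = contrast_score([1], [[5.0, 0.0, 0.0]], [[4.0, 0.0, 0.0]])
```

Under this assumption, a text whose tokens model A finds unsurprising, relative to how much the two models' predictions differ, receives a lower score (`likely < unlikely` in the toy example).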
