DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Abstract

Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also makes it significantly harder to detect the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods are inflexible, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then feed only the preceding portion to the LLM to regenerate the remaining part. By analyzing the differences between the original and regenerated remaining parts, through N-gram analysis in the black-box setting or probability divergence in the white-box setting, we can clearly illustrate significant discrepancies between machine-generated and human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach achieves state-of-the-art performance in distinguishing between human-written and GPT-generated text on four English datasets and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of texts. Additionally, our method provides reasonable explanations and evidence to support its decisions, which is a unique feature of explainable detection. Our method is also robust under the revised-text attack and can additionally address model sourcing. Code is available at https://github.com/Xianjun-Yang/DNA-GPT.
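
As a rough illustration of the black-box variant described above, the following Python sketch truncates a text, asks a model to regenerate the remainder several times, and measures how many N-grams of the original remainder reappear in the regenerations. It is a minimal sketch under stated assumptions, not the authors' implementation: the `regenerate` callable, the whitespace tokenizer, the N-gram range, the number of samples `k`, and the unweighted overlap score are all hypothetical choices made here for illustration, and the scoring function used in the paper may differ.

```python
from collections import Counter
from typing import Callable, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_score(original: Sequence[str], regenerated: Sequence[str],
                  n_min: int = 2, n_max: int = 4) -> float:
    """Mean, over n, of the fraction of the original's n-grams found in the regeneration.

    Assumption: a plain unweighted average; this is an illustrative score only.
    """
    fractions = []
    for n in range(n_min, n_max + 1):
        ref = ngrams(original, n)
        if not ref:  # original remainder is shorter than n tokens
            continue
        hyp = ngrams(regenerated, n)
        shared = sum((ref & hyp).values())  # multiset intersection
        fractions.append(shared / sum(ref.values()))
    return sum(fractions) / len(fractions) if fractions else 0.0


def detection_score(text: str, regenerate: Callable[[str], str],
                    k: int = 5, truncate_ratio: float = 0.5) -> float:
    """Truncate `text`, regenerate the remainder k times, and average the overlap.

    `regenerate` is a hypothetical wrapper around whatever LLM is being tested:
    it takes the prefix and returns one sampled continuation.
    """
    tokens = text.split()  # assumption: whitespace tokenization
    cut = int(len(tokens) * truncate_ratio)
    prefix, original_rest = " ".join(tokens[:cut]), tokens[cut:]
    scores = [overlap_score(original_rest, regenerate(prefix).split())
              for _ in range(k)]
    return sum(scores) / k
```

Under this sketch, a higher score means the model's regenerations share more N-grams with the original remainder, which would point toward the text having been produced by that model. Any decision threshold would have to be calibrated on held-out data.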
