DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Abstract

Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also makes it significantly harder to detect the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limited flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLM to regenerate the remaining part. We then compare the original remaining part with the regenerated ones, using N-gram analysis in the black-box setting or probability divergence in the white-box setting; this comparison reveals significant discrepancies between machine-generated and human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach achieves state-of-the-art performance in distinguishing between human-written and GPT-generated text on four English datasets and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of texts. Additionally, our method provides reasonable explanations and evidence to support its decisions, which is a unique feature of explainable detection. Our method is also robust to the revised-text attack and can additionally address model sourcing. Code is available at https://github.com/Xianjun-Yang/DNA-GPT.
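The black-box procedure described in the abstract (truncate, regenerate, compare N-grams) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenization, the N-gram range of 2 to 4, the unweighted averaging, and the `generate` callable standing in for an LLM call are all assumptions made here for the example. The scoring function actually used in the paper may weight or normalize the overlaps differently.

```python
from collections import Counter
from typing import Callable, List, Tuple


def split_text(text: str, ratio: float = 0.5) -> Tuple[str, str]:
    """Split whitespace-tokenized text into a prefix and the remaining part."""
    tokens = text.split()
    cut = max(1, int(len(tokens) * ratio))
    return " ".join(tokens[:cut]), " ".join(tokens[cut:])


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(original_rest: str, regenerated: str,
                  n_min: int = 2, n_max: int = 4) -> float:
    """Average, over n, of the fraction of regenerated n-grams that
    also occur in the original remaining part (assumed scoring rule)."""
    orig_tokens = original_rest.split()
    new_tokens = regenerated.split()
    fractions = []
    for n in range(n_min, n_max + 1):
        new_counts = ngrams(new_tokens, n)
        total = sum(new_counts.values())
        if total == 0:
            continue  # regenerated text too short for this n
        # Counter intersection keeps the minimum count of each shared n-gram.
        shared = sum((new_counts & ngrams(orig_tokens, n)).values())
        fractions.append(shared / total)
    return sum(fractions) / len(fractions) if fractions else 0.0


def divergence_score(text: str, generate: Callable[[str], str],
                     k: int = 5, ratio: float = 0.5) -> float:
    """Truncate the text, regenerate the remainder k times with `generate`
    (a hypothetical stand-in for an LLM call), and average the overlaps."""
    prefix, rest = split_text(text, ratio)
    scores = [ngram_overlap(rest, generate(prefix)) for _ in range(k)]
    return sum(scores) / len(scores)
```

Under these assumptions, a higher score means the regenerated continuations share more N-grams with the original remainder, which is the pattern the abstract associates with machine-generated text. Any decision threshold would need to be calibrated on held-out data, and the overlapping N-grams themselves can be shown as evidence for a decision.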

