Robust and Fine-Grained Detection of AI-Generated Text

Abstract

An ideal detector of machine-generated content should work well on any generator, since more advanced LLMs appear every day. Existing systems often struggle to identify AI-generated content accurately in short texts. Moreover, not every text is written entirely by a human or entirely by an LLM, so we focus on the partial case: texts co-authored by humans and LLMs. This paper introduces a set of models built for token classification and trained on an extensive collection of human-machine co-authored texts. The models perform well on texts from unseen domains, texts from unseen generators, texts by non-native speakers, and texts with adversarial inputs. We also introduce a new dataset of over 2.4M such texts, mostly co-authored by several popular proprietary LLMs, spanning 23 languages. We report our models' performance on the texts of each domain and each generator. Additional findings include a comparison of performance against each adversarial method, an analysis by input text length, and the characteristics of generated texts compared with the original human-authored texts.
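To make the token classification framing concrete, the following is a minimal sketch of how a co-authored text could be turned into per-word labels and how a predicted label sequence could be scored against them. It assumes the simplest co-authoring pattern, a human-written opening followed by a machine-written continuation, and whitespace tokenization. The function names (`label_words`, `machine_spans`, `word_accuracy`) and the label values are hypothetical and chosen for illustration; the abstract does not specify the labeling scheme or tokenizer the models use.

```python
from typing import List, Optional, Tuple

# Hypothetical label values; the abstract does not name the label set.
HUMAN, MACHINE = 0, 1


def label_words(human_part: str, machine_part: str) -> Tuple[List[str], List[int]]:
    """Split both parts on whitespace and label each word by its author."""
    human_words = human_part.split()
    machine_words = machine_part.split()
    words = human_words + machine_words
    labels = [HUMAN] * len(human_words) + [MACHINE] * len(machine_words)
    return words, labels


def machine_spans(labels: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of consecutive MACHINE labels."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == MACHINE and start is None:
            start = i  # a machine-written run begins here
        elif label != MACHINE and start is not None:
            spans.append((start, i))  # the run ended just before index i
            start = None
    if start is not None:
        spans.append((start, len(labels)))  # the run extends to the end
    return spans


def word_accuracy(gold: List[int], pred: List[int]) -> float:
    """Fraction of positions where the predicted label matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    words, gold = label_words(
        "The committee met on Tuesday",
        "to discuss the proposed budget changes.",
    )
    # A made-up prediction that places the boundary one word too early.
    pred = [HUMAN] * 4 + [MACHINE] * 7
    print(list(zip(words, gold)))
    print(machine_spans(gold), machine_spans(pred))
    print(word_accuracy(gold, pred))
```

Labeling at the word level rather than the document level is what lets a detector localize the machine-written portion of a mixed text, which a single per-document label cannot express.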