Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts

Abstract

The rapid development of autoregressive Large Language Models (LLMs) has significantly improved the quality of generated texts, necessitating reliable machine-generated text detectors. A large number of detectors and collections containing AI-generated fragments have emerged, and several detection methods have even reported recognition quality of up to 99.9% on the target metrics of such collections. However, the quality of such detectors tends to drop dramatically in the wild, which raises a question: are detectors actually highly trustworthy, or do their high benchmark scores come from the poor quality of the evaluation datasets? In this paper, we emphasise the need for robust, high-quality methods for evaluating generated data in order to guard against bias and poor generalisation in future models. We present a systematic review of datasets from competitions dedicated to AI-generated content detection and propose methods for evaluating the quality of datasets containing AI-generated fragments. In addition, we discuss the possibility of using high-quality generated data to achieve two goals: improving the training of detection models and improving the training datasets themselves. Our contribution aims to facilitate a better understanding of the dynamics between human and machine text, which will ultimately support the integrity of information in an increasingly automated world.
