LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

Abstract

The widespread accessibility of large language models (LLMs) to the general public has significantly amplified the dissemination of machine-generated texts (MGTs). Advancements in prompt manipulation have made it harder to discern the origin of a text (human-authored vs. machine-generated). This raises concerns regarding the potential misuse of MGTs, particularly within educational and academic domains. In this paper, we present LLM-DetectAIve, a system designed for fine-grained MGT detection. It classifies texts into four categories: human-written, machine-generated, machine-written machine-humanized, and human-written machine-polished. Unlike previous MGT detectors, which perform binary classification, LLM-DetectAIve introduces two additional categories that offer insight into the varying degrees of LLM intervention during text creation. This might be useful in domains such as education, where any LLM intervention is usually prohibited. Experiments show that LLM-DetectAIve can effectively identify the authorship of textual content, proving its usefulness in enhancing integrity in education, academia, and other domains. LLM-DetectAIve is publicly accessible at https://huggingface.co/spaces/raj-tomar001/MGT-New. The video describing our system is available at https://youtu.be/E8eT_bE7k8c.
