Automatic detection of Gen-AI texts: A comparative framework of neural models

Abstract

The rapid proliferation of Large Language Models has made it significantly harder to distinguish human-written from AI-generated text, raising critical issues across academic, editorial, and social domains. This paper investigates AI-generated text detection through the design, implementation, and comparative evaluation of multiple machine-learning-based detectors. Four neural architectures are developed and analyzed: a Multilayer Perceptron, a one-dimensional Convolutional Neural Network, a MobileNet-based CNN, and a Transformer model. The proposed models are benchmarked against widely used online detectors, including ZeroGPT, GPTZero, QuillBot, Originality.AI, Sapling, IsGen, Rephrase, and Writer. Experiments are conducted on the COLING Multilingual Dataset, in both English and Italian configurations, and on an original thematic dataset focused on Art and Mental Health. Results show that the supervised detectors achieve more stable and robust performance than the commercial tools across languages and domains, highlighting key strengths and limitations of current detection strategies.
