Show, Don't TELL: Explainable AI-Generated Text Detection

May 27, 2026
Authors: Aldan Creo, Suraj Ranganath
cs.AI

Abstract

Research on AI-generated text detection has produced a number of approaches for distinguishing human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because detector outputs are misaligned with the needs of users, such as professors, who are presented with a numeric score that has no attached explanation. We tackle this issue with a novel architecture, TELL, that builds in explainability from the ground up. While our system still offers a numerical score like other detectors for comparability, TELL takes a fundamentally different approach: we aim to show the user the "tells" by which the model believes a text is AI- or human-written, empowering the user to decide who wrote a text using their own judgment and their understanding of the context of the writing and its alleged author. We train TELL on a custom SFT dataset of domain-specific authorship annotations, and further refine the system using GRPO with curriculum learning to improve performance. We achieve performance competitive with state-of-the-art detectors (AUROC 0.927) while natively providing annotations that explain the basis for the detector's decision. We further evaluate the quality of our explanations using a dataset of human annotations and report a high win rate (mean 72.3%) on annotation concreteness, falsifiability, coherence, plausibility, and grounding, allowing users to think critically and decide for themselves. Our work thus reframes the problem of AI-generated text detection from a human-centric perspective and paves the way for a new family of detectors that focus on native explainability.
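For readers unfamiliar with the headline metric, the following is a minimal sketch of how an AUROC value can be computed from a detector's numeric scores. It is not the authors' evaluation code: the function name `auroc`, the label convention (1 = AI-written, 0 = human-written), and the toy data are all assumptions made for illustration. AUROC is the probability that a randomly chosen AI-written text receives a higher score than a randomly chosen human-written text, with ties counted as half.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: fraction of (AI, human) pairs where the AI text
    gets the higher score, counting ties as half a win.

    Assumed convention (hypothetical, not from the paper):
    label 1 = AI-written, label 0 = human-written.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

if __name__ == "__main__":
    print(f"AUROC on toy data: {auroc(toy_labels, toy_scores):.3f}")
```

On the toy data, 8 of the 9 (AI, human) pairs are ordered correctly, so the sketch yields 8/9. The quadratic pairwise loop is chosen for clarity; a rank-based formulation would be preferable for large evaluation sets.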