Base Models Look Human To AI Detectors

May 19, 2026
Authors: Yixuan Even Xu, Ziqian Zhong, Aditi Raghunathan, Fei Fang, J. Zico Kolter
cs.AI

Abstract

As AI-generated text enters the real world at scale, institutions increasingly use commercial AI-text detectors, especially in education and academic-integrity workflows. We report a surprising empirical finding about such systems: when evaluated by GPTZero and Pangram, text generated by base models is often judged overwhelmingly human, whereas text generated by their instruction-tuned counterparts is not. Building on this observation, we propose Humanization by Iterative Paraphrasing (HIP), a detector-agnostic pipeline that minimally fine-tunes a base model into a paraphraser and applies it iteratively. Compared with the baselines we test, HIP yields a stronger trade-off between semantic preservation and detector evasion on commercial detectors. Across the Llama-3 and Qwen-3 families, spanning model sizes from 0.6B to 70B, HIP consistently improves detector-judged human-likeness. Our findings suggest that current detectors track artifacts of instruction tuning and local context more than any invariant notion of machine-generated text. This, in turn, calls for detector designs that model these factors more explicitly.
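
The abstract describes HIP only at a high level: a paraphraser applied repeatedly, with no detector in the loop. As a rough illustration of that control flow, and not the authors' implementation, the sketch below applies an arbitrary paraphrasing function for a fixed number of rounds. The names `iterative_paraphrase`, `rounds`, and `toy_paraphraser` are hypothetical, and the toy function is a trivial stand-in for a fine-tuned base model.

```python
from typing import Callable, List


def iterative_paraphrase(
    text: str,
    paraphrase: Callable[[str], str],
    rounds: int = 3,
) -> List[str]:
    """Apply `paraphrase` repeatedly and return every intermediate version.

    Assumption: "detector-agnostic" is read here as "the loop never queries
    a detector", so the number of rounds is fixed in advance. `rounds` is a
    hypothetical parameter, not a value taken from the paper.
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    versions = [text]  # versions[0] is the unmodified input
    for _ in range(rounds):
        # Each round paraphrases the output of the previous round.
        versions.append(paraphrase(versions[-1]))
    return versions


def toy_paraphraser(sentence: str) -> str:
    """Placeholder for a fine-tuned base model: rewrites one word per call."""
    return sentence.replace("utilize", "use", 1)


if __name__ == "__main__":
    for i, version in enumerate(
        iterative_paraphrase(
            "We utilize models and utilize data.", toy_paraphraser, rounds=2
        )
    ):
        print(i, version)
```

Returning every intermediate version, rather than only the last, is a design choice of this sketch: it would let semantic preservation and detector scores be compared round by round.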