Base Models Look Human To AI Detectors

May 19, 2026
Authors: Yixuan Even Xu, Ziqian Zhong, Aditi Raghunathan, Fei Fang, J. Zico Kolter
cs.AI

Abstract

As AI-generated text enters the real world at scale, institutions increasingly use commercial AI-text detectors, especially in education and academic-integrity workflows. We report a surprising empirical finding about such systems: when evaluated by GPTZero and Pangram, text generated by base models is often judged overwhelmingly human, whereas text generated by their instruction-tuned counterparts is not. Building on this observation, we propose Humanization by Iterative Paraphrasing (HIP), a detector-agnostic pipeline that minimally fine-tunes a base model into a paraphraser and applies it iteratively. Compared with the baselines we test, HIP yields a stronger trade-off between semantic preservation and detector evasion on commercial detectors. Across the Llama-3 and Qwen-3 families, spanning model sizes from 0.6B to 70B, HIP consistently increases how human-like the detectors judge the text to be. Our findings suggest that current detectors track artifacts of instruction tuning and local context more than any invariant notion of machine-generated text. This, in turn, calls for detector designs that model these factors more explicitly.
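
The iterative step described in the abstract can be pictured as a simple loop: a paraphraser is applied to its own output for a fixed number of rounds. The sketch below only illustrates that control flow and is not the authors' implementation. The names `iterative_paraphrase`, `toy_paraphraser`, and `rounds` are hypothetical, and the word-swapping function merely stands in for the fine-tuned base model, whose training and decoding details the abstract does not give.

```python
from typing import Callable, List


def iterative_paraphrase(
    text: str,
    paraphrase: Callable[[str], str],
    rounds: int,
) -> List[str]:
    """Apply `paraphrase` to `text` repeatedly and return every version.

    The returned list starts with the original text, so it has
    `rounds + 1` entries. No detector is consulted inside the loop,
    which mirrors the "detector-agnostic" description in the abstract.
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    versions = [text]
    for _ in range(rounds):
        # Each round paraphrases the output of the previous round.
        versions.append(paraphrase(versions[-1]))
    return versions


def toy_paraphraser(text: str) -> str:
    """Hypothetical stand-in for a fine-tuned base model: swaps a few words."""
    swaps = {"use": "employ", "employ": "rely on", "shows": "indicates"}
    return " ".join(swaps.get(word, word) for word in text.split())


if __name__ == "__main__":
    history = iterative_paraphrase("Schools use detectors", toy_paraphraser, 2)
    for round_index, version in enumerate(history):
        print(round_index, version)
```

Returning every intermediate version, rather than only the last one, makes it possible to compare each round against the original text, which is where the trade-off between semantic preservation and detector evasion would be measured.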