Machine Text Detectors are Membership Inference Attacks
October 22, 2025
Authors: Ryuto Koike, Liam Dugan, Masahiro Kaneko, Chris Callison-Burch, Naoaki Okazaki
cs.AI
Abstract
Although membership inference attacks (MIAs) and machine-generated text
detection target different goals (identifying training samples and synthetic
texts, respectively), their methods often exploit similar signals based on a
language model's
probability distribution. Despite this shared methodological foundation, the
two tasks have been independently studied, which may lead to conclusions that
overlook stronger methods and valuable insights developed in the other task. In
this work, we theoretically and empirically investigate the transferability,
i.e., how well a method originally developed for one task performs on the
other, between MIAs and machine text detection. For our theoretical
contribution, we prove that the metric that achieves the asymptotically highest
performance on both tasks is the same. We unify a large proportion of the
existing literature in the context of this optimal metric and hypothesize that
the accuracy with which a given method approximates this metric is directly
correlated with its transferability. Our large-scale empirical experiments,
including 7 state-of-the-art MIA methods and 5 state-of-the-art machine text
detectors across 13 domains and 10 generators, demonstrate very strong rank
correlation (rho > 0.6) in cross-task performance. We notably find that
Binoculars, originally designed for machine text detection, achieves
state-of-the-art performance on MIA benchmarks as well, demonstrating the
practical impact of this transferability. Our findings highlight the need for
greater cross-task awareness and collaboration between the two research
communities. To facilitate cross-task developments and fair evaluations, we
introduce MINT, a unified evaluation suite for MIAs and machine-generated text
detection, with implementations of 15 recent methods from both tasks.
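
To make the shared probability-based signal concrete, below is a minimal, hedged sketch (not the authors' MINT code) of a Binoculars-style score: the ratio of a text's log-perplexity under one causal LM to the cross-perplexity between two closely related LMs. The model names are illustrative stand-ins chosen only because they share a vocabulary, which the token-wise comparison requires; the original Binoculars work pairs larger checkpoints. For detection, lower scores suggest machine-generated text; per the paper's finding, thresholding the same scalar also serves as a membership-inference signal.

```python
# Minimal sketch of a Binoculars-style score (log-perplexity / cross-perplexity).
# Illustrative only; the model choices are assumptions, not the configuration
# used in the paper or in MINT.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

OBSERVER_NAME = "gpt2"         # assumption: stand-in observer model
PERFORMER_NAME = "distilgpt2"  # assumption: stand-in performer model (shared vocab)

tok = AutoTokenizer.from_pretrained(OBSERVER_NAME)
observer = AutoModelForCausalLM.from_pretrained(OBSERVER_NAME).eval()
performer = AutoModelForCausalLM.from_pretrained(PERFORMER_NAME).eval()

@torch.no_grad()
def binoculars_style_score(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    obs_logits = observer(ids).logits[:, :-1, :]   # predictions for tokens 2..L
    per_logits = performer(ids).logits[:, :-1, :]
    targets = ids[:, 1:]

    # Log-perplexity of the text under the performer model.
    log_ppl = F.cross_entropy(
        per_logits.reshape(-1, per_logits.size(-1)), targets.reshape(-1)
    )

    # Cross-perplexity: expected negative log-likelihood of the performer's
    # next-token distribution under the observer's distribution, averaged
    # over positions.
    obs_probs = F.softmax(obs_logits, dim=-1)
    per_logprobs = F.log_softmax(per_logits, dim=-1)
    log_x_ppl = -(obs_probs * per_logprobs).sum(dim=-1).mean()

    # Lower values -> more "machine-like" for detection; the same score can be
    # thresholded as an MIA signal (direction calibrated per task).
    return (log_ppl / log_x_ppl).item()

print(binoculars_style_score("The quick brown fox jumps over the lazy dog."))
```

The two models must share a tokenizer and vocabulary because the cross-perplexity compares their next-token distributions position by position; this is why paired checkpoints from the same family are used.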