Machine Text Detectors are Membership Inference Attacks
October 22, 2025
Authors: Ryuto Koike, Liam Dugan, Masahiro Kaneko, Chris Callison-Burch, Naoaki Okazaki
cs.AI
Abstract
Although membership inference attacks (MIAs) and machine-generated text detection target different goals, identifying training samples and synthetic texts respectively, their methods often exploit similar signals based on a language model's probability distribution. Despite this shared methodological foundation, the two tasks have been studied independently, which may lead to conclusions that overlook stronger methods and valuable insights developed for the other task.
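As a concrete example of this shared signal, the simplest score used in both lines of work is the average token log-likelihood a language model assigns to a text: thresholding it is the classic loss-based membership inference attack and also a basic zero-shot machine-text detector. The sketch below is illustrative only; the scoring model is a stand-in, and the methods studied in the paper are more sophisticated refinements of this idea.

```python
# Minimal sketch of the shared probability signal: the average per-token
# log-likelihood a causal LM assigns to a text. Thresholding this one number
# is both the classic loss-based MIA and a basic zero-shot machine-text
# detector; the model below is only a stand-in for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM with a compatible tokenizer works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def avg_log_likelihood(text: str) -> float:
    """Mean per-token log-probability of `text` under the scoring model."""
    enc = tokenizer(text, return_tensors="pt")
    out = model(**enc, labels=enc["input_ids"])
    return -out.loss.item()  # loss is the mean NLL, so negate it

# Higher scores point toward "training member" (MIA) or "machine-generated"
# (detection), depending on which task the threshold is calibrated for.
print(avg_log_likelihood("The quick brown fox jumps over the lazy dog."))
```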
In this work, we theoretically and empirically investigate the transferability between MIAs and machine text detection, i.e., how well a method originally developed for one task performs on the other. For our theoretical contribution, we prove that the metric that achieves the asymptotically highest performance on both tasks is the same. We unify a large proportion of the existing literature in the context of this optimal metric and hypothesize that the accuracy with which a given method approximates this metric is directly correlated with its transferability.
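The abstract does not spell out this optimal metric, so the following is only an orienting sketch under our own framing (an assumption, not the paper's stated proof): both tasks can be cast as binary hypothesis tests over a text, and the Neyman-Pearson lemma then makes a likelihood-ratio score the optimal statistic in either case.

```latex
% Both tasks threshold a scalar score s(x) computed from a text x.
%   MIA:        H_0: x \notin D_{\mathrm{train}}  vs.  H_1: x \in D_{\mathrm{train}}
%   Detection:  H_0: x is human-written           vs.  H_1: x is machine-generated
% By the Neyman-Pearson lemma, the optimal test statistic in either case is a
% likelihood ratio,
\[
  s(x) \;=\; \log \frac{\Pr[x \mid H_1]}{\Pr[x \mid H_0]},
\]
% where the H_1 term is naturally modeled by the target LM's distribution
% p_\theta(x); practical methods differ mainly in how they approximate the
% unknown H_0 (reference) term.
```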
Our large-scale empirical experiments, including 7 state-of-the-art MIA methods and 5 state-of-the-art machine text detectors across 13 domains and 10 generators, demonstrate very strong rank correlation (rho > 0.6) in cross-task performance.
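As a toy illustration of what this rank correlation measures (invented numbers, not the paper's results): each method receives one performance number per task, and the two resulting rankings are compared with a rank correlation such as Spearman's rho.

```python
# Toy sketch of the cross-task rank correlation: rank the same methods by MIA
# performance and by detection performance, then compare the rankings.
# The numbers below are invented for illustration, not the paper's results.
from scipy.stats import spearmanr

mia_auroc       = [0.62, 0.55, 0.70, 0.58, 0.66]  # one value per method
detection_auroc = [0.88, 0.79, 0.95, 0.90, 0.83]  # same methods, other task

rho, p_value = spearmanr(mia_auroc, detection_auroc)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```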
We notably find that Binoculars, originally designed for machine text detection, achieves state-of-the-art performance on MIA benchmarks as well, demonstrating the practical impact of this transferability.
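Binoculars (Hans et al., 2024) scores a text by the ratio of its log-perplexity under one model to the cross-perplexity between two closely related models, and flags low scores as machine-generated. The sketch below is a simplified rendering of that idea with stand-in models; the exact model pair and formulation are those of the original paper, not this snippet.

```python
# Simplified Binoculars-style score: log-perplexity under an "observer" model
# divided by the cross-perplexity between the observer and a related
# "performer" model. Lower scores suggest machine-generated text and, per the
# transferability result, also rank training members well. Models are stand-ins.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")             # shared tokenizer
observer = AutoModelForCausalLM.from_pretrained("gpt2").eval()
performer = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()

@torch.no_grad()
def binoculars_style_score(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    obs_logits = observer(ids).logits[:, :-1]    # position i predicts token i+1
    perf_logits = performer(ids).logits[:, :-1]
    targets = ids[:, 1:]

    # log-perplexity: mean negative log-likelihood under the observer
    log_ppl = F.cross_entropy(obs_logits.transpose(1, 2), targets).item()

    # cross-perplexity: cross-entropy between the performer's and the
    # observer's next-token distributions, averaged over positions
    obs_logprobs = F.log_softmax(obs_logits, dim=-1)
    perf_probs = F.softmax(perf_logits, dim=-1)
    x_ppl = -(perf_probs * obs_logprobs).sum(dim=-1).mean().item()

    return log_ppl / x_ppl  # lower -> more likely machine-generated / member
```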
Our findings highlight the need for greater cross-task awareness and collaboration between the two research communities. To facilitate cross-task developments and fair evaluations, we introduce MINT, a unified evaluation suite for MIAs and machine-generated text detection, with implementations of 15 recent methods from both tasks.
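MINT's interface is not described in the abstract, so the snippet below is not its API; it only illustrates why a single harness can evaluate both tasks: each method reduces to a scalar score per text, and both MIA and detection are measured by how well that score separates the two classes, for example with AUROC.

```python
# Generic evaluation shared by both tasks (illustrative, not MINT's API):
# given per-text scores from any method and binary labels (1 = training member
# or machine-generated, 0 = non-member or human-written), report AUROC.
from sklearn.metrics import roc_auc_score

def evaluate(scores, labels):
    """AUROC of a scoring method on either task."""
    return roc_auc_score(labels, scores)

# Toy scores, sign-adjusted so that higher means the positive class:
print(evaluate(scores=[-2.1, -1.3, -0.7, -0.4], labels=[0, 0, 1, 1]))  # -> 1.0
```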