Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

Abstract

Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods often struggle with content-style entanglement, where models learn spurious correlations between authors' writing styles and topics, leading to poor generalization across domains. To address this challenge, we propose the Explainable Authorship Variational Autoencoder (EAVAE), a novel framework that explicitly disentangles style from content through architectural separation-by-design. EAVAE first pretrains style encoders using supervised contrastive learning on diverse authorship data, then fine-tunes with a Variational Autoencoder (VAE) architecture that uses separate encoders for style and content representations. Disentanglement is enforced through a novel discriminator that not only distinguishes whether pairs of style/content representations belong to the same or different authors/content sources, but also generates natural language explanations for its decisions, simultaneously mitigating confounding information and enhancing interpretability. Extensive experiments demonstrate the effectiveness of EAVAE. On authorship attribution, we achieve state-of-the-art performance on various datasets, including Amazon Reviews, PAN21, and HRS. For AI-generated text detection, EAVAE excels in few-shot learning on the M4 dataset. Code and data repositories are available online at https://github.com/hieum98/avae and https://huggingface.co/collections/Hieuman/document-level-authorship-datasets.
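The pretraining stage mentioned above relies on supervised contrastive learning over author labels. The abstract does not give the exact loss, so the following is only a minimal sketch of the standard supervised contrastive objective, assuming L2-normalized style embeddings and a temperature hyperparameter. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the EAVAE codebase.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supervised_contrastive_loss(embeddings, author_ids, temperature=0.1):
    """Standard supervised contrastive loss (illustrative, not the paper's code).

    For each anchor document, documents by the same author are positives and
    all other documents in the batch are contrasted against them.
    Anchors without any positive in the batch are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and author_ids[j] == author_ids[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every other document.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values()))
        # Average negative log-probability over the anchor's positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy 2-D "style embeddings": documents 0 and 1 are close, as are 2 and 3.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, ["a", "a", "b", "b"])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, ["a", "b", "a", "b"])
```

In this toy setup the loss is lower when the author labels agree with the geometry of the embeddings (`loss_matching`) than when they do not (`loss_mismatched`), which is the signal that pulls same-author documents together during pretraining.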
