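Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

Summary

With the rapid development of generative AI, the paradigm of text creation is undergoing profound change. This paper proposes a novel method based on explainable disentangled representation learning, aimed at the generalization challenge of authorship identification in a generative-AI setting. By decomposing text features into a style-identifying component and a semantic-content component, the model not only traces the source of both human-written and AI-generated text accurately, but also generalizes to previously unseen writing styles. Experiments show that the method significantly outperforms traditional authorship attribution models on cross-domain text datasets, and that its disentangled style representation provides a transparent explanation path for the model's decisions, offering new technical support for digital forensics and intellectual property protection.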

Abstract

Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods often struggle with content-style entanglement, where models learn spurious correlations between authors' writing styles and topics, leading to poor generalization across domains. To address this challenge, we propose the Explainable Authorship Variational Autoencoder (EAVAE), a novel framework that explicitly disentangles style from content through architectural separation-by-design. EAVAE first pretrains style encoders using supervised contrastive learning on diverse authorship data, then fine-tunes with a Variational Autoencoder (VAE) architecture that uses separate encoders for style and content representations. Disentanglement is enforced through a novel discriminator that not only distinguishes whether pairs of style/content representations belong to the same or different authors/content sources, but also generates natural-language explanations for its decisions, simultaneously mitigating confounding information and enhancing interpretability. Extensive experiments demonstrate the effectiveness of EAVAE. On authorship attribution, we achieve state-of-the-art performance on various datasets, including Amazon Reviews, PAN21, and HRS. For AI-generated text detection, EAVAE excels in few-shot learning on the M4 dataset. Code and data repositories are available online at https://github.com/hieum98/avae and https://huggingface.co/collections/Hieuman/document-level-authorship-datasets.
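
To make the pretraining stage concrete, the following is a minimal sketch of a supervised contrastive objective over style embeddings, in which documents by the same author are pulled together and documents by different authors are pushed apart. The abstract does not specify the exact loss, so this uses the commonly cited supervised contrastive (SupCon) formulation as an assumption. The function names `l2_normalize` and `supcon_loss`, the temperature value, and the toy embeddings are illustrative and are not taken from the EAVAE code base.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, author_ids, temperature=0.1):
    """Supervised contrastive loss (assumed SupCon form, not the paper's confirmed loss).

    embeddings: list of style vectors, one per document (assumed non-zero).
    author_ids: list of author labels aligned with `embeddings`.
    Anchors with no same-author document in the batch are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    per_anchor = []
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and author_ids[j] == author_ids[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other document.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        # Mean negative log-probability of picking a same-author document.
        per_anchor.append(-sum(sims[j] - log_denom for j in positives) / len(positives))
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


# Toy batch: two documents per author, embeddings already clustered by author.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(toy_embeddings, ["a", "a", "b", "b"])
misaligned = supcon_loss(toy_embeddings, ["a", "b", "a", "b"])
```

In this toy batch, the loss is lower when the author labels agree with the geometry of the embeddings (`aligned`) than when they cut across it (`misaligned`), which is the signal a style encoder would be trained to minimize under this assumption.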