Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

Abstract

Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods often struggle with content-style entanglement, where models learn spurious correlations between authors' writing styles and topics, leading to poor generalization across domains. To address this challenge, we propose the Explainable Authorship Variational Autoencoder (EAVAE), a novel framework that explicitly disentangles style from content through architectural separation-by-design. EAVAE first pretrains style encoders using supervised contrastive learning on diverse authorship data, then fine-tunes with a Variational Autoencoder (VAE) architecture using separate encoders for style and content representations. Disentanglement is enforced through a novel discriminator that not only distinguishes whether pairs of style/content representations belong to the same or different authors/content sources, but also generates a natural language explanation for its decision, simultaneously mitigating confounding information and enhancing interpretability. Extensive experiments demonstrate the effectiveness of EAVAE. On authorship attribution, we achieve state-of-the-art performance on various datasets, including Amazon Reviews, PAN21, and HRS. For AI-generated text detection, EAVAE excels in few-shot learning on the M4 dataset. Code and data repositories are available online at https://github.com/hieum98/avae and https://huggingface.co/collections/Hieuman/document-level-authorship-datasets.
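The pretraining stage mentioned above relies on supervised contrastive learning over author labels. As a rough illustration only, the sketch below computes a generic supervised contrastive loss in plain Python: embeddings from the same author are treated as positives and all other items in the batch as negatives. The function names, the temperature value of 0.1, and the toy two-dimensional embeddings are assumptions made for this example; the abstract does not specify the exact loss formulation, encoder, or hyperparameters used by EAVAE.

```python
import math
from typing import Hashable, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    author_ids: Sequence[Hashable],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Generic supervised contrastive loss over a batch of style embeddings.

    For each anchor, items with the same author id are positives; every other
    item in the batch appears in the softmax denominator. Anchors without any
    positive are skipped. Returns 0.0 if no anchor has a positive.
    """
    if len(embeddings) != len(author_ids):
        raise ValueError("embeddings and author_ids must have the same length")

    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0

    for i in range(n):
        positives = [
            j for j in range(n) if j != i and author_ids[j] == author_ids[i]
        ]
        if not positives:
            continue
        # Temperature-scaled similarities to every other item in the batch.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1

    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy batch: two authors whose embeddings form two tight clusters.
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(batch, ["a", "a", "b", "b"]))
```

With labels that match the clusters, the loss is close to zero; assigning labels that cut across the clusters makes it much larger, which is the signal that pushes same-author texts together in the embedding space.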

Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI
