LLM Anonymization Against Agentic Re-identification

Abstract

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details carry the text's downstream analytic value. Existing defenses remove explicit identifiers, perturb text for formal privacy, or test rewritten text against inference models without web access, which leaves the operating region between resistance to agentic web-search re-identification and utility retention underexplored. We introduce AURA (Anonymization with Utility-Retention Adaptation), an LLM-powered mask-reconstruct framework that decouples privacy localization from utility-preserving reconstruction and selects candidates using adversarial privacy checks and utility-retention checks. We evaluate AURA on real-user interview transcripts, using re-identification attacks carried out by web-search agents together with a utility evaluation based on interviewee-profile facts, codebook facts, and the joint contextual utility grid. Our results show that AURA improves the privacy-utility frontier in two ways: its adaptive privacy scope strengthens resistance to agentic re-identification, and its mask-reconstruct anonymization method better preserves contextual utility under a fixed privacy scope.