LLM Anonymization Against Agentic Re-identification

Abstract

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details carry much of the text's downstream analytic value. Existing defenses remove explicit identifiers, perturb text for formal privacy guarantees, or test rewritten text against inference models without web access, leaving the operating region between resistance to agentic web-search re-identification and utility retention underexplored. We introduce AURA (Anonymization with Utility-Retention Adaptation), an LLM-powered mask-reconstruct framework that decouples privacy localization from utility-preserving reconstruction and selects candidates using adversarial privacy checks and utility-retention checks. We evaluate AURA on real-user interview transcripts using re-identification attacks carried out by web-search agents, along with a utility evaluation based on interviewee-profile facts, codebook facts, and the joint contextual utility grid. Our results show that AURA improves the privacy-utility frontier in two ways: an adaptive privacy scope strengthens resistance to agentic re-identification, and the mask-reconstruct anonymization method better preserves contextual utility under a fixed privacy scope.
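The mask-reconstruct idea with candidate selection can be illustrated with a minimal sketch. This is not AURA's implementation: the abstract describes LLM-driven localization and reconstruction and a web-search agent as the adversary, whereas the sketch below swaps in plain string matching for both checks. Every name here (`mask_spans`, `reconstruct`, `privacy_risk`, `utility`, `select_candidate`, the example sentence, and the threshold) is a hypothetical assumption introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

MASK = "[MASK]"  # hypothetical mask token


@dataclass
class Candidate:
    text: str
    risk: float     # toy stand-in for an adversarial re-identification score
    utility: float  # toy stand-in for a utility-retention score


def mask_spans(text: str, sensitive: Sequence[str]) -> str:
    """Privacy localization stand-in: replace each listed span with a mask."""
    for span in sensitive:
        text = text.replace(span, MASK)
    return text


def reconstruct(masked: str, fillers: Sequence[str]) -> str:
    """Reconstruction stand-in: fill masks left to right with the fillers."""
    for filler in fillers:
        masked = masked.replace(MASK, filler, 1)
    return masked


def privacy_risk(text: str, sensitive: Sequence[str]) -> float:
    """Assumed check: fraction of sensitive spans still present verbatim."""
    leaked = sum(1 for s in sensitive if s.lower() in text.lower())
    return leaked / len(sensitive)


def utility(text: str, facts: Sequence[str]) -> float:
    """Assumed check: fraction of utility facts still recoverable by keyword."""
    kept = sum(1 for f in facts if f.lower() in text.lower())
    return kept / len(facts)


def select_candidate(cands: List[Candidate], max_risk: float) -> Optional[Candidate]:
    """Keep candidates within the risk budget, then pick the most useful one."""
    safe = [c for c in cands if c.risk <= max_risk]
    return max(safe, key=lambda c: c.utility) if safe else None


# Invented example sentence, sensitive spans, and utility facts.
source = "I work as a nurse at St. Mary Hospital in Leeds and use the app daily."
sensitive = ["St. Mary Hospital", "Leeds"]
facts = ["nurse", "hospital", "daily"]

masked = mask_spans(source, sensitive)

# Three hypothetical reconstructions: one leaks, one generalizes, one over-redacts.
filler_options = [
    ["St. Mary Hospital", "a city"],
    ["a regional hospital", "a northern city"],
    ["an organization", "somewhere"],
]

candidates = []
for fillers in filler_options:
    rewritten = reconstruct(masked, fillers)
    candidates.append(
        Candidate(rewritten, privacy_risk(rewritten, sensitive), utility(rewritten, facts))
    )

best = select_candidate(candidates, max_risk=0.0)
print(best.text if best else "no candidate met the privacy budget")
```

In this toy setting the generalizing candidate wins: it passes the privacy check while keeping the workplace type that the over-redacted candidate loses. A real adversarial check would have to account for cross-referencing of weak contextual cues, which verbatim matching cannot capture.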