Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization

Abstract

Anonymizing textual documents is a highly context-sensitive problem: the appropriate balance between privacy protection and utility preservation varies with the data domain, privacy objectives, and downstream application. However, existing anonymization methods rely on static, manually designed strategies that lack the flexibility to adjust to diverse requirements and often fail to generalize across domains. We introduce adaptive text anonymization, a new task formulation in which anonymization strategies are automatically adapted to specific privacy-utility requirements. We propose a framework for task-specific prompt optimization that automatically constructs anonymization instructions for language models, enabling adaptation to different privacy goals, domains, and downstream usage patterns. To evaluate our approach, we present a benchmark spanning five datasets with diverse domains, privacy constraints, and utility objectives. Across all evaluated settings, our framework consistently achieves a better privacy-utility trade-off than existing baselines, while remaining computationally efficient and effective on open-source language models, with performance comparable to larger closed-source models. Additionally, we show that our method can discover novel anonymization strategies that explore different points along the privacy-utility trade-off frontier.