Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization
February 24, 2026
Authors: Gabriel Loiseau, Damien Sileo, Damien Riquet, Maxime Meyer, Marc Tommasi
cs.AI
Abstract
Anonymizing textual documents is a highly context-sensitive problem: the appropriate balance between privacy protection and utility preservation varies with the data domain, privacy objectives, and downstream application. However, existing anonymization methods rely on static, manually designed strategies that lack the flexibility to adjust to diverse requirements and often fail to generalize across domains. We introduce adaptive text anonymization, a new task formulation in which anonymization strategies are automatically adapted to specific privacy-utility requirements. We propose a framework for task-specific prompt optimization that automatically constructs anonymization instructions for language models, enabling adaptation to different privacy goals, domains, and downstream usage patterns. To evaluate our approach, we present a benchmark spanning five datasets with diverse domains, privacy constraints, and utility objectives. Across all evaluated settings, our framework consistently achieves a better privacy-utility trade-off than existing baselines, while remaining computationally efficient and effective on open-source language models, with performance comparable to larger closed-source models. Additionally, we show that our method can discover novel anonymization strategies that explore different points along the privacy-utility trade-off frontier.
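The core idea of adapting an anonymization instruction to a target privacy-utility balance can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not the paper's implementation: the keyword-based `privacy_score` and `utility_score` stand in for the real evaluators, and `alpha` plays the role of the desired point on the trade-off frontier.

```python
# Toy sketch of task-specific prompt optimization for anonymization.
# All function names and scoring rules are illustrative assumptions,
# not the framework described in the paper.

def privacy_score(prompt: str) -> float:
    # Stand-in evaluator: reward instructions that ask the language model
    # to remove or generalize identifying information.
    keywords = ("remove", "generalize", "redact")
    p = prompt.lower()
    return sum(k in p for k in keywords) / len(keywords)

def utility_score(prompt: str) -> float:
    # Stand-in evaluator: reward instructions that ask to preserve
    # task-relevant content for the downstream application.
    keywords = ("preserve", "keep", "meaning")
    p = prompt.lower()
    return sum(k in p for k in keywords) / len(keywords)

def trade_off(prompt: str, alpha: float = 0.5) -> float:
    # Scalarized privacy-utility objective; alpha selects the target
    # balance point along the trade-off frontier.
    return alpha * privacy_score(prompt) + (1 - alpha) * utility_score(prompt)

def optimize(candidates: list[str], alpha: float = 0.5) -> str:
    # Select the candidate anonymization instruction that best
    # satisfies the requested trade-off.
    return max(candidates, key=lambda p: trade_off(p, alpha))

candidates = [
    "Redact all names.",
    "Remove and generalize identifiers; preserve the meaning of the text.",
    "Keep the text unchanged.",
]
best = optimize(candidates, alpha=0.5)
```

In the actual framework the candidate instructions would be proposed and refined automatically and scored by running the language model on benchmark data; varying `alpha` corresponds to exploring different points along the privacy-utility frontier.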