Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization

Abstract

Anonymizing textual documents is a highly context-sensitive problem: the appropriate balance between privacy protection and utility preservation varies with the data domain, privacy objectives, and downstream application. However, existing anonymization methods rely on static, manually designed strategies that lack the flexibility to adjust to diverse requirements and often fail to generalize across domains. We introduce **adaptive text anonymization**, a new task formulation in which anonymization strategies are automatically adapted to specific privacy-utility requirements. We propose a framework for task-specific prompt optimization that automatically constructs anonymization instructions for language models, enabling adaptation to different privacy goals, domains, and downstream usage patterns. To evaluate our approach, we present a benchmark spanning five datasets with diverse domains, privacy constraints, and utility objectives. Across all evaluated settings, our framework consistently achieves a better privacy-utility trade-off than existing baselines, while remaining computationally efficient and effective on open-source language models, with performance comparable to larger closed-source models. Additionally, we show that our method can discover novel anonymization strategies that explore different points along the privacy-utility trade-off frontier.
