TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization

Abstract

Authorship obfuscation aims to disguise the identity of an author by altering the writing style, vocabulary, syntax, and other linguistic features of a text. This alteration must balance privacy and utility. Strong obfuscation techniques can effectively hide the author's identity, but they often degrade the quality of the text and its usefulness for its intended purpose. Conversely, maintaining high utility tends to provide insufficient privacy, making it easier for an adversary to de-anonymize the author. Achieving a good trade-off between these two conflicting objectives is therefore crucial. In this paper, we propose TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization, a new unsupervised authorship obfuscation method that optimizes the privacy-utility trade-off by regenerating the entire text with its downstream utility in mind. Our approach uses policy optimization as a fine-tuning paradigm for small language models, so that texts are rewritten to conceal author identity while preserving downstream task utility. We show that our approach substantially reduces the accuracy of attackers while preserving utility. We make our code and models publicly available.