When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing
August 14, 2025
Authors: Mahdi Dhaini, Stephen Meisenbacher, Ege Erdogan, Florian Matthes, Gjergji Kasneci
cs.AI
Abstract
In the study of trustworthy Natural Language Processing (NLP), a number of important research fields have emerged, including explainability and privacy. While research interest in both explainable and privacy-preserving NLP has increased considerably in recent years, the intersection of the two remains under-investigated. This leaves a considerable gap in our understanding of whether achieving both explainability and privacy is possible, or whether the two are at odds with each other. In this work, we conduct an empirical investigation into the privacy-explainability trade-off in the context of NLP, guided by the popular overarching methods of Differential Privacy (DP) and Post-hoc Explainability. Our findings reveal an intricate relationship between privacy and explainability, shaped by a number of factors, including the nature of the downstream task and the choice of text privatization and explainability methods. In doing so, we highlight the potential for privacy and explainability to co-exist, and we summarize our findings in a collection of practical recommendations for future work at this important intersection.
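
As background (included here for reference, not quoted from the paper): the standard ε-Differential Privacy guarantee (Dwork et al., 2006), on which text privatization methods of the kind studied here build, states that a randomized mechanism M satisfies ε-DP if, for all adjacent inputs x and x' (e.g., two texts differing in a single record or token) and every set of outputs S:

    Pr[M(x) ∈ S] ≤ exp(ε) · Pr[M(x') ∈ S]

Smaller values of ε correspond to stronger privacy, since the mechanism's output distribution then reveals less about which of the two adjacent inputs was used.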