When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing
August 14, 2025
Authors: Mahdi Dhaini, Stephen Meisenbacher, Ege Erdogan, Florian Matthes, Gjergji Kasneci
cs.AI
Abstract
In the study of trustworthy Natural Language Processing (NLP), a number of
important research fields have emerged, including explainability and
privacy. While research interest in both
explainable and privacy-preserving NLP has increased considerably in recent
years, there remains a lack of investigation at the intersection of the two.
This leaves a considerable gap in our understanding of whether achieving
both explainability and privacy is possible, or whether the two are at
odds with each other. In this work, we conduct an empirical investigation into
the privacy-explainability trade-off in the context of NLP, guided by the
popular overarching methods of Differential Privacy (DP) and Post-hoc
Explainability. Our findings include a view into the intricate relationship
between privacy and explainability, which is formed by a number of factors,
including the nature of the downstream task and the choice of text
privatization and explainability methods. In doing so, we highlight the potential
for privacy and explainability to co-exist, and we summarize our findings in a
collection of practical recommendations for future work at this important
intersection.