Federated Computation of ROC and PR Curves

Abstract

Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are fundamental tools for evaluating machine learning classifiers: they show the trade-off between true positive rate and false positive rate (ROC) or between precision and recall (PR) as the decision threshold varies. In Federated Learning (FL), where data are distributed across multiple clients, computing these curves is challenging because of privacy and communication constraints. Specifically, the server cannot access the raw prediction scores and class labels from which ROC and PR curves are computed in a centralized setting. In this paper, we propose a novel method for approximating ROC and PR curves in a federated setting by estimating quantiles of the prediction score distribution under distributed differential privacy. We provide theoretical bounds on the Area Error (AE) between the true and estimated curves, which make explicit the trade-offs between approximation accuracy, privacy, and communication cost. Empirical results on real-world datasets show that our method achieves high approximation accuracy with minimal communication and strong privacy guarantees, making it practical for privacy-preserving model evaluation in federated systems.
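
To make the quantile idea concrete, the following is a minimal, non-private, single-machine sketch of how an ROC curve might be approximated from per-class score quantiles alone. It is an illustration under stated assumptions, not the paper's algorithm: the quantiles are computed directly on pooled synthetic scores as a stand-in for the distributed, differentially private estimates, and all function names (empirical_quantiles, cdf_from_quantiles, roc_from_quantiles, area_under) are hypothetical.

```python
import bisect
import random


def empirical_quantiles(scores, num_buckets):
    """Scores at levels 0, 1/Q, ..., 1 (non-private stand-in for a private estimate)."""
    s = sorted(scores)
    return [s[(k * (len(s) - 1)) // num_buckets] for k in range(num_buckets + 1)]


def cdf_from_quantiles(q, t):
    """Estimate P(score <= t) by linear interpolation through the quantiles q."""
    num_buckets = len(q) - 1
    if t <= q[0]:
        return 0.0
    if t >= q[-1]:
        return 1.0
    i = bisect.bisect_right(q, t) - 1  # q[i] <= t < q[i + 1]
    frac = (t - q[i]) / (q[i + 1] - q[i])
    return (i + frac) / num_buckets


def roc_from_quantiles(pos_q, neg_q):
    """Approximate ROC points (FPR, TPR) using negative-class quantiles as thresholds."""
    num_buckets = len(neg_q) - 1
    points = [(0.0, 0.0)]
    for k in range(num_buckets, -1, -1):  # thresholds from high to low
        fpr = 1.0 - k / num_buckets       # fraction of negatives above the threshold
        tpr = 1.0 - cdf_from_quantiles(pos_q, neg_q[k])
        points.append((fpr, tpr))
    points.append((1.0, 1.0))
    return points


def area_under(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Synthetic scores (assumption: illustration only, not data from the paper).
rng = random.Random(0)
pos_scores = [rng.gauss(1.0, 1.0) for _ in range(2000)]
neg_scores = [rng.gauss(0.0, 1.0) for _ in range(2000)]

Q = 32  # number of quantile buckets; more buckets means more communication
roc_points = roc_from_quantiles(empirical_quantiles(pos_scores, Q),
                                empirical_quantiles(neg_scores, Q))
approx_auc = area_under(roc_points)

# Exact AUC for comparison: fraction of (positive, negative) pairs ranked correctly.
sorted_neg = sorted(neg_scores)
exact_auc = sum(bisect.bisect_left(sorted_neg, p) for p in pos_scores) / (
    len(pos_scores) * len(neg_scores))
print(f"approximate AUC = {approx_auc:.4f}, exact AUC = {exact_auc:.4f}")
```

In this sketch only Q + 1 numbers per class are needed to draw the curve, which is the intuition behind trading approximation accuracy against communication; the privacy mechanism and the PR-curve case are deliberately left out.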
