Federated Computation of ROC and PR Curves

Abstract

Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are fundamental tools for evaluating machine learning classifiers. They expose the trade-off between true positive rate and false positive rate (ROC) or between precision and recall (PR). In Federated Learning (FL), however, data is distributed across multiple clients, and privacy and communication constraints make these curves hard to compute: the server cannot access the raw prediction scores and class labels from which ROC and PR curves are computed in a centralized setting. In this paper, we propose a novel method for approximating ROC and PR curves in a federated setting by estimating quantiles of the prediction score distribution under distributed differential privacy. We provide theoretical bounds on the Area Error (AE) between the true and estimated curves, which make explicit the trade-offs between approximation accuracy, privacy, and communication cost. Empirical results on real-world datasets show that our method achieves high approximation accuracy with minimal communication and strong privacy guarantees, making it practical for privacy-preserving model evaluation in federated systems.
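To make the quantile idea concrete, here is a minimal sketch of how an ROC curve might be rebuilt from a small number of per-class score quantiles. It is an illustration only, not the paper's algorithm: the quantiles are computed exactly on pooled synthetic data, whereas the federated method would estimate them across clients under distributed differential privacy. All function names, the synthetic Gaussian scores, and the choice of 20 quantiles are assumptions made for this example.

```python
import bisect
import random


def quantiles(scores, q):
    """Return q + 1 empirical quantiles at levels 0, 1/q, ..., 1."""
    s = sorted(scores)
    n = len(s)
    return [s[min(n - 1, (k * n) // q)] for k in range(q + 1)]


def cdf_from_quantiles(qs, t):
    """Piecewise-linear CDF estimate at t from evenly spaced quantiles qs."""
    q = len(qs) - 1
    if t <= qs[0]:
        return 0.0
    if t >= qs[-1]:
        return 1.0
    k = bisect.bisect_right(qs, t) - 1  # qs[k] <= t < qs[k + 1]
    frac = (t - qs[k]) / (qs[k + 1] - qs[k])
    return (k + frac) / q


def approx_roc(pos_q, neg_q):
    """Approximate ROC points (FPR, TPR) using only the two quantile lists."""
    thresholds = sorted(set(pos_q) | set(neg_q), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = 1.0 - cdf_from_quantiles(neg_q, t)
        tpr = 1.0 - cdf_from_quantiles(pos_q, t)
        points.append((fpr, tpr))
    points.append((1.0, 1.0))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def exact_auc(pos, neg):
    """Exact AUC from raw scores (Mann-Whitney statistic, ties count 1/2)."""
    neg_sorted = sorted(neg)
    total = 0.0
    for p in pos:
        lo = bisect.bisect_left(neg_sorted, p)
        hi = bisect.bisect_right(neg_sorted, p)
        total += lo + 0.5 * (hi - lo)
    return total / (len(pos) * len(neg))


# Synthetic scores (assumption): positives score higher on average.
random.seed(0)
pos_scores = [random.gauss(1.0, 1.0) for _ in range(2000)]
neg_scores = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Only these short summaries would need to reach the server,
# here computed without any privacy noise.
pos_q = quantiles(pos_scores, 20)
neg_q = quantiles(neg_scores, 20)

roc_points = approx_roc(pos_q, neg_q)
approx_auc = trapezoid_area(roc_points)
true_auc = exact_auc(pos_scores, neg_scores)
print(f"approximate AUC: {approx_auc:.4f}, exact AUC: {true_auc:.4f}")
```

In this sketch, the gap between the two printed values plays a role loosely analogous to the Area Error discussed above. Increasing the number of quantiles should generally tighten the approximation at the price of more communication, and adding privacy noise to the quantile estimates would be expected to widen it.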
