Sample Selection Using Multi-Task Autoencoders in Federated Learning with Non-IID Data

Abstract

Federated learning is a machine learning paradigm in which multiple devices collaboratively train a model under the supervision of a central server while ensuring data privacy. However, its performance is often hindered by redundant, malicious, or abnormal samples, leading to model degradation and inefficiency. To overcome these issues, we propose novel sample selection methods for image classification that employ a multitask autoencoder to estimate sample contributions through loss and feature analysis. Our approach incorporates unsupervised outlier detection, using one-class support vector machine (OCSVM), isolation forest (IF), and adaptive loss threshold (AT) methods managed by a central server to filter noisy samples on clients. We also propose a multi-class deep support vector data description (SVDD) loss controlled by a central server to enhance feature-based sample selection. We validate our methods on the CIFAR10 and MNIST datasets across varying numbers of clients, non-IID distributions, and noise levels up to 40%. The results show significant accuracy improvements with loss-based sample selection, with gains of up to 7.02% on CIFAR10 with OCSVM and up to 1.83% on MNIST with AT. Additionally, our federated SVDD loss further improves feature-based sample selection, yielding accuracy gains of up to 0.99% on CIFAR10 with OCSVM. These results show the effectiveness of our methods in improving model accuracy across various client counts and noise conditions.
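As a rough illustration of the two ideas summarized above, the following Python sketch filters samples by per-sample loss and computes a per-class SVDD-style loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define the AT rule, so the mean-plus-k-standard-deviations threshold used here is an assumption, and the per-class-center form of the SVDD loss is likewise an assumed extension of the usual one-class Deep SVDD objective. The names `select_by_loss`, `multiclass_svdd_loss`, `k`, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def select_by_loss(losses, k=2.0):
    """Return indices of samples whose loss is at most mean + k * std.

    Assumed thresholding rule for illustration only; the abstract does
    not specify how the adaptive loss threshold (AT) is computed.
    """
    threshold = mean(losses) + k * pstdev(losses)
    return [i for i, loss in enumerate(losses) if loss <= threshold]


def multiclass_svdd_loss(features, labels, centers):
    """Mean squared Euclidean distance from each feature to its class center.

    Assumed multi-class form: one center per class label, in place of
    the single center used by one-class Deep SVDD.
    """
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total / len(features)


# Hypothetical per-sample autoencoder losses on one client;
# the sample at index 5 plays the role of a noisy sample.
losses = [0.10, 0.12, 0.11, 0.09, 0.13, 0.95, 0.10, 0.11]
kept = select_by_loss(losses, k=2.0)
print(kept)  # indices of the samples retained for local training

# Hypothetical 2-D features with two class centers.
features = [[0.0, 0.0], [2.0, 0.0]]
labels = [0, 1]
centers = {0: [0.0, 0.0], 1: [1.0, 0.0]}
print(multiclass_svdd_loss(features, labels, centers))
```

In this sketch the threshold rule could be swapped for an unsupervised detector such as OCSVM or IF fitted on the same per-sample losses; how the central server coordinates the detector or the class centers across clients is not described in the abstract and is left out here.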