

Sample Selection Using Multi-Task Autoencoders in Federated Learning with Non-IID Data

April 28, 2026
Authors: Emre Ardıç, Yakup Genç
cs.AI

Abstract

Federated learning is a machine learning paradigm in which multiple devices collaboratively train a model under the coordination of a central server while preserving data privacy. However, its performance is often hindered by redundant, malicious, or abnormal samples, which lead to model degradation and inefficiency. To overcome these issues, we propose novel sample selection methods for image classification, employing a multi-task autoencoder to estimate sample contributions through loss and feature analysis. Our approach incorporates unsupervised outlier detection, using one-class support vector machine (OCSVM), isolation forest (IF), and adaptive loss threshold (AT) methods managed by a central server to filter noisy samples on clients. We also propose a multi-class deep support vector data description (SVDD) loss, controlled by the central server, to enhance feature-based sample selection. We validate our methods on the CIFAR10 and MNIST datasets across varying numbers of clients, non-IID distributions, and noise levels of up to 40%. The results show significant accuracy improvements from loss-based sample selection, with gains of up to 7.02% on CIFAR10 with OCSVM and 1.83% on MNIST with AT. Additionally, our federated SVDD loss further improves feature-based sample selection, yielding accuracy gains of up to 0.99% on CIFAR10 with OCSVM. These results demonstrate the effectiveness of our methods in improving model accuracy across varying client counts and noise conditions.
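The abstract names three client-side filters (OCSVM, IF, AT) driven by per-sample autoencoder losses. The following is a minimal illustrative sketch of that selection step, not the paper's implementation: the function name, the contamination parameter, and the mean-plus-k-std adaptive-threshold rule are assumptions.

```python
# Illustrative client-side, loss-based sample filtering. Per-sample losses are
# assumed to come from a locally evaluated multi-task autoencoder.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest

def filter_by_loss(losses: np.ndarray, method: str = "ocsvm",
                   contamination: float = 0.4) -> np.ndarray:
    """Return a boolean keep-mask over samples, given per-sample loss values."""
    x = losses.reshape(-1, 1)  # detectors expect a 2-D feature matrix
    if method == "ocsvm":
        detector = OneClassSVM(nu=contamination)  # nu bounds the outlier fraction
        keep = detector.fit_predict(x) == 1       # +1 = inlier, -1 = outlier
    elif method == "if":
        detector = IsolationForest(contamination=contamination, random_state=0)
        keep = detector.fit_predict(x) == 1
    elif method == "at":
        # Adaptive threshold (assumed rule): keep losses below mean + 1 * std;
        # the paper's actual thresholding statistic may differ.
        keep = losses < losses.mean() + losses.std()
    else:
        raise ValueError(f"unknown method: {method}")
    return keep
```

In this sketch the server would only choose the method and its parameter; fitting happens on each client's own losses, so no raw data leaves the device.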
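For the feature-based path, the multi-class deep SVDD loss pulls each embedding toward a per-class center; "controlled by a central server" is read here as the server distributing those centers to clients. The exact formulation below is an assumption, sketching a common multi-class generalization of the deep SVDD objective:

```python
# Assumed multi-class deep SVDD loss: mean squared distance of each embedding
# to the center of its own class. Names and shapes are illustrative.
import torch

def multiclass_svdd_loss(features: torch.Tensor,   # (N, D) embeddings
                         labels: torch.Tensor,      # (N,) integer class labels
                         centers: torch.Tensor      # (C, D) server-provided centers
                         ) -> torch.Tensor:
    diffs = features - centers[labels]   # each sample vs. its class center
    return diffs.pow(2).sum(dim=1).mean()
```

Per-sample distances to the class centers could then feed the same OCSVM/IF/AT detectors sketched above, which would realize the feature-based sample selection the abstract describes.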