FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
June 12, 2025
Authors: Yao Zhang, Hewei Gao, Haokun Chen, Weiguo Li, Yunpu Ma, Volker Tresp
cs.AI
Abstract
Multimodal Large Language Models (MLLMs) excel in tasks like multimodal reasoning and cross-modal retrieval but face deployment challenges in real-world scenarios due to distributed multimodal data and strict privacy requirements. Federated Learning (FL) offers a solution by enabling collaborative model training without centralizing data. However, realizing FL for MLLMs presents significant challenges, including high computational demands, limited client capacity, substantial communication costs, and heterogeneous client data. Existing FL methods assume client-side deployment of full models, an assumption that breaks down for large-scale MLLMs due to their massive size and communication demands. To address these limitations, we propose FedNano, the first FL framework that centralizes the LLM on the server while introducing NanoEdge, a lightweight module for client-specific adaptation. NanoEdge employs modality-specific encoders, connectors, and trainable NanoAdapters with low-rank adaptation. This design eliminates the need to deploy the LLM on clients, reducing client-side storage by 95% and limiting communication overhead to only 0.01% of the model parameters. By transmitting only compact NanoAdapter updates, FedNano handles heterogeneous client data and resource constraints while preserving privacy. Experiments demonstrate that FedNano outperforms prior FL baselines, bridging the gap between MLLM scale and FL feasibility, and enabling scalable, decentralized multimodal AI systems.
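
The abstract describes two concrete mechanisms: a small trainable low-rank adapter on each client (the only learnable, transmitted component, since the LLM, encoders, and connectors stay frozen or server-side), and FedAvg-style server aggregation over those adapter parameters alone. The following is a minimal PyTorch sketch of both ideas under stated assumptions; the names `NanoAdapter` and `fedavg_nanoadapters`, the rank and dimension defaults, and the size-weighted averaging are illustrative, not the paper's released implementation.

```python
# Minimal sketch (not the authors' code) of a LoRA-style low-rank adapter
# and server-side FedAvg over adapter parameters only. All names and
# hyperparameters here are illustrative assumptions.

import torch
import torch.nn as nn


class NanoAdapter(nn.Module):
    """Low-rank residual adapter: y = x + scale * (x A) B, with A: d x r,
    B: r x d, and r << d, so it holds ~2*d*r parameters (~65K for
    d=4096, r=8)."""

    def __init__(self, hidden_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Linear(hidden_dim, rank, bias=False)  # A: d -> r
        self.up = nn.Linear(rank, hidden_dim, bias=False)    # B: r -> d
        self.scale = alpha / rank
        nn.init.zeros_(self.up.weight)  # start as an identity mapping

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.up(self.down(x))


def fedavg_nanoadapters(client_states, client_sizes):
    """Server-side FedAvg over adapter state_dicts only; the frozen LLM,
    encoders, and connectors are never transmitted."""
    total = float(sum(client_sizes))
    return {
        key: sum(state[key] * (n / total)
                 for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }


# One toy federated round: two clients fine-tune local adapters, then the
# server averages only those adapter weights and broadcasts them back.
clients = [NanoAdapter(hidden_dim=4096, rank=8) for _ in range(2)]
merged = fedavg_nanoadapters(
    [c.state_dict() for c in clients], client_sizes=[1200, 800]
)
for c in clients:
    c.load_state_dict(merged)
```

As a rough plausibility check on the reported 0.01% figure (not a number taken from the paper): a rank-8 adapter at hidden size 4096 holds about 65K parameters, so roughly ten such insertion points total ~0.65M parameters, on the order of 0.01% of a 7B-parameter LLM.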