FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
June 12, 2025
Authors: Yao Zhang, Hewei Gao, Haokun Chen, Weiguo Li, Yunpu Ma, Volker Tresp
cs.AI
Abstract
Multimodal Large Language Models (MLLMs) excel in tasks like multimodal reasoning and cross-modal retrieval but face deployment challenges in real-world scenarios due to distributed multimodal data and strict privacy requirements. Federated Learning (FL) offers a solution by enabling collaborative model training without centralizing data. However, realizing FL for MLLMs presents significant challenges, including high computational demands, limited client capacity, substantial communication costs, and heterogeneous client data. Existing FL methods assume client-side deployment of full models, an assumption that breaks down for large-scale MLLMs due to their massive size and communication demands. To address these limitations, we propose FedNano, the first FL framework that centralizes the LLM on the server while introducing NanoEdge, a lightweight module for client-specific adaptation. NanoEdge employs modality-specific encoders, connectors, and trainable NanoAdapters with low-rank adaptation. This design eliminates the need to deploy the LLM on clients, reducing client-side storage by 95% and limiting communication overhead to only 0.01% of the model parameters. By transmitting only compact NanoAdapter updates, FedNano handles heterogeneous client data and resource constraints while preserving privacy. Experiments demonstrate that FedNano outperforms prior FL baselines, bridging the gap between MLLM scale and FL feasibility, and enabling scalable, decentralized multimodal AI systems.
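
The abstract describes two concrete mechanisms: a small trainable low-rank adapter on each client (the only learnable, transmitted component, since the LLM, encoders, and connectors stay frozen or server-side), and FedAvg-style server aggregation over those adapter parameters alone. The following is a minimal PyTorch sketch of both ideas under stated assumptions; the names `NanoAdapter` and `fedavg_nanoadapters`, the rank and dimension defaults, and the size-weighted averaging are illustrative, not the paper's released implementation.

```python
# Minimal sketch (not the authors' code) of a LoRA-style low-rank adapter
# and server-side FedAvg over adapter parameters only. All names and
# hyperparameters here are illustrative assumptions.

import torch
import torch.nn as nn


class NanoAdapter(nn.Module):
    """Low-rank residual adapter: y = x + scale * (x A) B, with A: d x r,
    B: r x d, and r << d, so it holds ~2*d*r parameters (~65K for
    d=4096, r=8)."""

    def __init__(self, hidden_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Linear(hidden_dim, rank, bias=False)  # A: d -> r
        self.up = nn.Linear(rank, hidden_dim, bias=False)    # B: r -> d
        self.scale = alpha / rank
        nn.init.zeros_(self.up.weight)  # start as an identity mapping

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.up(self.down(x))


def fedavg_nanoadapters(client_states, client_sizes):
    """Server-side FedAvg over adapter state_dicts only; the frozen LLM,
    encoders, and connectors are never transmitted."""
    total = float(sum(client_sizes))
    return {
        key: sum(state[key] * (n / total)
                 for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }


# One toy federated round: two clients fine-tune local adapters, then the
# server averages only those adapter weights and broadcasts them back.
clients = [NanoAdapter(hidden_dim=4096, rank=8) for _ in range(2)]
merged = fedavg_nanoadapters(
    [c.state_dict() for c in clients], client_sizes=[1200, 800]
)
for c in clients:
    c.load_state_dict(merged)
```

As a rough plausibility check on the reported 0.01% figure (not a number taken from the paper): a rank-8 adapter at hidden size 4096 holds about 65K parameters, so roughly ten such insertion points total ~0.65M parameters, on the order of 0.01% of a 7B-parameter LLM.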