FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models

Abstract

Multimodal Large Language Models (MLLMs) excel at tasks such as multimodal reasoning and cross-modal retrieval, but they face deployment challenges in real-world scenarios because multimodal data is distributed and privacy requirements are strict. Federated Learning (FL) offers a solution by enabling collaborative model training without centralizing data. However, realizing FL for MLLMs presents significant challenges, including high computational demands, limited client capacity, substantial communication costs, and heterogeneous client data. Existing FL methods assume client-side deployment of full models, an assumption that breaks down for large-scale MLLMs due to their massive size and communication demands. To address these limitations, we propose FedNano, the first FL framework that centralizes the LLM on the server while introducing NanoEdge, a lightweight module for client-specific adaptation. NanoEdge employs modality-specific encoders, connectors, and trainable NanoAdapters with low-rank adaptation. This design eliminates the need to deploy the LLM on clients, reducing client-side storage by 95% and limiting communication overhead to only 0.01% of the model parameters. By transmitting only compact NanoAdapter updates, FedNano handles heterogeneous client data and resource constraints while preserving privacy. Experiments demonstrate that FedNano outperforms prior FL baselines, bridging the gap between MLLM scale and FL feasibility and enabling scalable, decentralized multimodal AI systems.
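
To make the idea of transmitting only compact low-rank adapter updates concrete, the following is a minimal sketch and not the FedNano implementation. It assumes a generic low-rank pair A (rank x d_in) and B (d_out x rank) and a plain FedAvg-style weighted average on the server; the abstract does not specify the aggregation rule, so that choice, along with the function names and the sizes (width 4096, rank 8), is hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a low-rank pair A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def weighted_average(
    updates: List[Dict[str, Matrix]], num_samples: List[int]
) -> Dict[str, Matrix]:
    """FedAvg-style average of adapter matrices, weighted by client sample counts.

    Assumption: plain weighted averaging, used here only for illustration.
    """
    total = sum(num_samples)
    averaged: Dict[str, Matrix] = {}
    for name in updates[0]:
        rows = len(updates[0][name])
        cols = len(updates[0][name][0])
        averaged[name] = [
            [
                sum(u[name][r][c] * n for u, n in zip(updates, num_samples)) / total
                for c in range(cols)
            ]
            for r in range(rows)
        ]
    return averaged


# Hypothetical sizes: one dense 4096 x 4096 layer versus a rank-8 adapter.
adapter_params = lora_param_count(4096, 4096, rank=8)
dense_params = 4096 * 4096
fraction = adapter_params / dense_params  # share of one layer, not of a whole model

# Two toy clients send only their small adapter matrices to the server.
client_a = {"A": [[1.0, 2.0]], "B": [[0.0], [4.0]]}
client_b = {"A": [[3.0, 6.0]], "B": [[2.0], [0.0]]}
global_update = weighted_average([client_a, client_b], num_samples=[1, 3])
```

The fraction computed here is relative to a single dense layer of the assumed size, so it is not comparable to the 0.01% figure above, which is stated relative to the model parameters.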
