

Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

October 14, 2024
Authors: Guorui Zheng, Xidong Wang, Juhao Liang, Nuo Chen, Yuping Zheng, Benyou Wang
cs.AI

Abstract

Adapting medical Large Language Models to local languages can reduce barriers to accessing healthcare services, but data scarcity remains a significant challenge, particularly for low-resource languages. To address this, we first construct a high-quality medical dataset and conduct an analysis to ensure its quality. To leverage the generalization capability of multilingual LLMs and efficiently scale to more resource-constrained languages, we explore the internal information flow of LLMs from a multilingual perspective using Mixture of Experts (MoE) modularity. Technically, we propose a novel MoE routing method that employs language-specific experts and cross-lingual routing. Inspired by circuit theory, our routing analysis reveals a "Spread Out in the End" information-flow mechanism: earlier layers concentrate cross-lingual information flow, while later layers exhibit language-specific divergence. This insight directly led to the development of the Post-MoE architecture, which applies sparse routing only in the later layers while keeping the other layers dense. Experimental results demonstrate that this approach enhances the generalization of multilingual models to other languages while preserving interpretability. Finally, to efficiently scale the model to 50 languages, we introduce the concept of language family experts, drawing on linguistic priors, which enables scaling the number of languages without adding additional parameters.
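
The abstract does not include code, so the following minimal PyTorch sketch (not the authors' implementation) only illustrates the two architectural ideas it describes: routing tokens to a shared expert per language family, and the Post-MoE layout, in which this sparse routing is applied only in the last few layers while earlier layers stay dense. All class names, dimensions, the language-to-family table, and the choice of two MoE layers are illustrative assumptions; the paper's full routing method also involves cross-lingual routing, which this sketch omits.

```python
# Minimal sketch of "language family experts" + a Post-MoE layer stack.
# Assumptions (not from the paper): module names, sizes, and the
# LANGUAGE_TO_FAMILY mapping below are all illustrative.
import torch
import torch.nn as nn

# Hypothetical grouping of languages into families (a linguistic prior):
# new languages from a known family reuse that family's expert, so the
# number of languages can grow without adding parameters.
LANGUAGE_TO_FAMILY = {
    "en": "germanic", "de": "germanic",
    "es": "romance", "fr": "romance", "pt": "romance",
    "zh": "sino_tibetan", "ar": "afro_asiatic", "sw": "niger_congo",
}
FAMILIES = sorted(set(LANGUAGE_TO_FAMILY.values()))


class FeedForward(nn.Module):
    """Standard transformer FFN, used both as a dense sub-layer and as one expert."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class LanguageFamilyMoE(nn.Module):
    """Sparse sub-layer: tokens are routed to the expert of their language family."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.experts = nn.ModuleDict(
            {fam: FeedForward(d_model, d_hidden) for fam in FAMILIES}
        )

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        family = LANGUAGE_TO_FAMILY[lang]  # route by linguistic prior
        return self.experts[family](x)


class PostMoEStack(nn.Module):
    """Toy stack of FFN sub-layers: dense early layers (cross-lingual flow),
    sparse language-family routing only in the last `num_moe_layers` layers."""
    def __init__(self, num_layers=12, num_moe_layers=2, d_model=256, d_hidden=1024):
        super().__init__()
        self.dense_layers = nn.ModuleList(
            [FeedForward(d_model, d_hidden) for _ in range(num_layers - num_moe_layers)]
        )
        self.moe_layers = nn.ModuleList(
            [LanguageFamilyMoE(d_model, d_hidden) for _ in range(num_moe_layers)]
        )

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        for layer in self.dense_layers:   # earlier layers: shared across languages
            x = x + layer(x)
        for layer in self.moe_layers:     # later layers: language-specific divergence
            x = x + layer(x, lang)
        return x


if __name__ == "__main__":
    model = PostMoEStack()
    hidden = torch.randn(1, 8, 256)           # (batch, seq_len, d_model)
    print(model(hidden, lang="sw").shape)      # torch.Size([1, 8, 256])
```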
