Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
October 14, 2024
Authors: Guorui Zheng, Xidong Wang, Juhao Liang, Nuo Chen, Yuping Zheng, Benyou Wang
cs.AI
Abstract
Adapting medical Large Language Models to local languages can reduce barriers
to accessing healthcare services, but data scarcity remains a significant
challenge, particularly for low-resource languages. To address this, we first
construct a high-quality medical dataset and conduct analysis to ensure its
quality. In order to leverage the generalization capability of multilingual
LLMs to efficiently scale to more resource-constrained languages, we explore
the internal information flow of LLMs from a multilingual perspective using
Mixture of Experts (MoE) modularity. Technically, we propose a novel MoE
routing method that employs language-specific experts and cross-lingual
routing. Inspired by circuit theory, our routing analysis revealed a "Spread Out
in the End" information flow mechanism: while earlier layers concentrate
cross-lingual information flow, later layers exhibit language-specific
divergence. This insight directly led to the development of the Post-MoE
architecture, which applies sparse routing only in the later layers while
keeping the other layers dense. Experimental results demonstrate that this approach
enhances the generalization of multilingual models to other languages while
preserving interpretability. Finally, to efficiently scale the model to 50
languages, we introduce the concept of language family experts, drawing on
linguistic priors, which enables scaling the number of languages without adding
parameters.
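The abstract describes two concrete architectural ideas: sparse routing confined to the later layers (Post-MoE) and one expert per language family rather than per language. The following is a minimal, hypothetical PyTorch sketch of how these could fit together; class names (LanguageFamilyMoE, Block, PostMoEStack), the top-1 routing rule, and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of "Post-MoE" with language-family experts.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class LanguageFamilyMoE(nn.Module):
    """Sparse FFN with one expert per language family (not per language),
    so adding languages within a family does not add parameters."""

    def __init__(self, d_model: int, d_ff: int, num_families: int = 7):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_families)
        )
        self.router = nn.Linear(d_model, num_families)  # token -> family scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = torch.softmax(self.router(x), dim=-1)   # (B, S, F)
        top1 = gates.argmax(dim=-1)                     # hard top-1 routing
        out = torch.zeros_like(x)
        for f, expert in enumerate(self.experts):
            mask = (top1 == f).unsqueeze(-1).to(x.dtype)
            # Dense dispatch for clarity; real MoE kernels only run routed tokens.
            out = out + mask * gates[..., f:f + 1] * expert(x)
        return out


class Block(nn.Module):
    """Pre-norm transformer block; the FFN is dense or a language-family MoE."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, sparse: bool):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.ffn = (LanguageFamilyMoE(d_model, d_ff) if sparse else
                    nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                  nn.Linear(d_ff, d_model)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        return x + self.ffn(self.norm2(x))


class PostMoEStack(nn.Module):
    """Post-MoE: dense FFNs everywhere except the last `k_sparse` layers,
    where tokens are sparsely routed to language-family experts."""

    def __init__(self, n_layers: int = 12, k_sparse: int = 2,
                 d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.layers = nn.ModuleList(
            Block(d_model, n_heads, d_ff, sparse=(i >= n_layers - k_sparse))
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x
```

In this sketch only the last `k_sparse` blocks route tokens, mirroring the abstract's finding that language-specific divergence appears in later layers; the expert count is fixed by the number of language families, so covering more of the 50 languages leaves the parameter count unchanged.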