

Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

October 14, 2024
Authors: Guorui Zheng, Xidong Wang, Juhao Liang, Nuo Chen, Yuping Zheng, Benyou Wang
cs.AI

Abstract

Adapting medical Large Language Models to local languages can reduce barriers to accessing healthcare services, but data scarcity remains a significant challenge, particularly for low-resource languages. To address this, we first construct a high-quality medical dataset and conduct an analysis to ensure its quality. To leverage the generalization capability of multilingual LLMs and efficiently scale to more resource-constrained languages, we explore the internal information flow of LLMs from a multilingual perspective using Mixture of Experts (MoE) modularity. Technically, we propose a novel MoE routing method that employs language-specific experts and cross-lingual routing. Inspired by circuit theory, our routing analysis reveals a "Spread Out in the End" information flow mechanism: earlier layers concentrate cross-lingual information flow, while later layers exhibit language-specific divergence. This insight directly led to the development of the Post-MoE architecture, which applies sparse routing only in the later layers while keeping the other layers dense. Experimental results demonstrate that this approach enhances the generalization of multilingual models to other languages while preserving interpretability. Finally, to efficiently scale the model to 50 languages, we introduce the concept of language family experts, drawing on linguistic priors, which enables scaling the number of languages without adding parameters.
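The abstract's two key ideas, sparse routing confined to the later layers (Post-MoE) and experts indexed by language family rather than by individual language, can be sketched in a few lines of PyTorch. The sketch below is a minimal illustration under assumptions, not the authors' released implementation: the module names (`PostMoEBlock`, `LanguageFamilyMoE`), the layer count, hidden sizes, and the hard assignment of each sequence to a single family expert are all invented for the example, whereas the paper's actual routing combines language-specific experts with cross-lingual routing.

```python
# Minimal sketch (assumptions, not the paper's code): early layers keep a shared
# dense FFN; only the last few layers replace it with a small set of
# language-family experts. The family id is supplied as a linguistic prior, so
# new languages map onto existing experts without adding parameters.
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
    def forward(self, x):
        return self.net(x)

class LanguageFamilyMoE(nn.Module):
    """Sparse FFN: each sequence is routed to the expert of its language family."""
    def __init__(self, d_model: int, d_ff: int, num_families: int):
        super().__init__()
        self.experts = nn.ModuleList(FeedForward(d_model, d_ff) for _ in range(num_families))
    def forward(self, x, family_id):               # x: (B, T, D), family_id: (B,)
        out = torch.zeros_like(x)
        for f, expert in enumerate(self.experts):
            mask = family_id == f                  # route whole sequences by family prior
            if mask.any():
                out[mask] = expert(x[mask])
        return out

class PostMoEBlock(nn.Module):
    def __init__(self, d_model, d_ff, num_families, sparse: bool):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.ffn = LanguageFamilyMoE(d_model, d_ff, num_families) if sparse else FeedForward(d_model, d_ff)
        self.sparse = sparse
    def forward(self, x, family_id):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x)
        x = x + (self.ffn(h, family_id) if self.sparse else self.ffn(h))
        return x

# Post-MoE: only the last two of eight layers use sparse language-family routing.
blocks = nn.ModuleList(
    PostMoEBlock(d_model=256, d_ff=1024, num_families=4, sparse=(i >= 6)) for i in range(8)
)
x = torch.randn(2, 16, 256)                        # (batch, tokens, hidden)
family_id = torch.tensor([0, 3])                   # hypothetical family ids, e.g. 0 = Romance
for blk in blocks:
    x = blk(x, family_id)
```

The design point the sketch tries to capture is that the dense early layers carry the cross-lingual information flow, sparsity appears only where the paper's analysis finds language-specific divergence, and because experts are keyed to language families a 51st language simply reuses an existing expert.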
