Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

Abstract

Adapting medical large language models (LLMs) to local languages can reduce barriers to accessing healthcare services, but data scarcity remains a significant challenge, particularly for low-resource languages. To address this, we first construct a high-quality medical dataset and analyze it to verify its quality. To leverage the generalization capability of multilingual LLMs and scale efficiently to more resource-constrained languages, we use the modularity of Mixture of Experts (MoE) to explore the internal information flow of LLMs from a multilingual perspective. Technically, we propose a novel MoE routing method that employs language-specific experts and cross-lingual routing. Inspired by circuit theory, our routing analysis reveals a "Spread Out in the End" information flow mechanism: earlier layers concentrate cross-lingual information flow, while later layers exhibit language-specific divergence. This insight directly led to the Post-MoE architecture, which applies sparse routing only in the later layers and keeps the other layers dense. Experimental results demonstrate that this approach enhances the generalization of multilingual models to other languages while preserving interpretability. Finally, to efficiently scale the model to 50 languages, we introduce the concept of language family experts, which draws on linguistic priors and enables scaling the number of languages without adding parameters.
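The two structural ideas in the abstract, sparse routing only in the later layers and experts shared per language family, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `LANGUAGE_FAMILY` table, the function names, and the layer counts are hypothetical, and the hard language-to-expert lookup stands in for the learned cross-lingual routing, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical language-to-family table (an assumption for illustration;
# the grouping actually used for the 50 languages is not given in the abstract).
LANGUAGE_FAMILY: Dict[str, str] = {
    "en": "germanic",
    "de": "germanic",
    "fr": "romance",
    "es": "romance",
    "hi": "indo-aryan",
    "bn": "indo-aryan",
}


@dataclass(frozen=True)
class LayerSpec:
    index: int
    kind: str  # "dense" or "moe"


def post_moe_plan(num_layers: int, num_moe_layers: int) -> List[LayerSpec]:
    """Mark only the last `num_moe_layers` layers as sparsely routed."""
    if not 0 <= num_moe_layers <= num_layers:
        raise ValueError("num_moe_layers must be between 0 and num_layers")
    first_moe = num_layers - num_moe_layers
    return [
        LayerSpec(i, "moe" if i >= first_moe else "dense")
        for i in range(num_layers)
    ]


def family_expert_ids(language_family: Dict[str, str]) -> Dict[str, int]:
    """Assign one expert id per family, so languages in a family share an expert."""
    families = sorted(set(language_family.values()))
    return {family: idx for idx, family in enumerate(families)}


def expert_for(lang: str, language_family: Dict[str, str],
               expert_ids: Dict[str, int]) -> int:
    """Look up the shared family expert for a language (hard assignment)."""
    return expert_ids[language_family[lang]]
```

In this sketch, adding a language to a family that already exists extends the lookup table but leaves the number of experts unchanged, which is the sense in which the number of languages can grow without adding parameters.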
