Specialized Language Models with Cheap Inference from Limited Domain Data

Abstract

Large language models have emerged as versatile tools, but they are difficult to apply to tasks that lack a large inference budget or a large in-domain training set. This work formalizes these constraints and distinguishes four important variables: the pretraining budget (for training before the target domain is known), the specialization budget (for training after the target domain is known), the inference budget, and the in-domain training set size. Across these settings, we compare different approaches from the machine learning literature. When inference cost is the limiting factor, we find better alternatives to the standard practice of training very large vanilla transformer models. In particular, we show that hyper-networks and mixtures of experts achieve better perplexity for large pretraining budgets, while small models trained on importance-sampled datasets are attractive for large specialization budgets.
