Specialized Language Models with Cheap Inference from Limited Domain Data

February 2, 2024
Authors: David Grangier, Angelos Katharopoulos, Pierre Ablin, Awni Hannun
cs.AI

Abstract

Large language models have emerged as a versatile tool but are challenging to apply to tasks lacking large inference budgets and large in-domain training sets. This work formalizes these constraints and distinguishes four important variables: the pretraining budget (for training before the target domain is known), the specialization budget (for training after the target domain is known), the inference budget, and the in-domain training set size. Across these settings, we compare different approaches from the machine learning literature. Limited by inference cost, we find better alternatives to the standard practice of training very large vanilla transformer models. In particular, we show that hyper-networks and mixture of experts have better perplexity for large pretraining budgets, while small models trained on importance sampled datasets are attractive for large specialization budgets.
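The second finding (small models trained on importance-sampled datasets) can be made concrete with a minimal sketch. The snippet below weights a generic corpus toward a target domain using a Moore-Lewis-style cross-entropy difference between an in-domain and a generic scoring model, here approximated with smoothed unigram counts; the helper names (`unigram_model`, `importance_sample`) and the toy corpora are illustrative assumptions, not the paper's actual pipeline.

```python
import math
import random
from collections import Counter

def unigram_model(docs):
    """Fit token counts for an add-one-smoothed unigram model."""
    counts = Counter(tok for doc in docs for tok in doc.split())
    return counts, sum(counts.values())

def per_token_logprob(doc, counts, total, vocab_size):
    """Average per-token log-probability of doc under the smoothed model."""
    toks = doc.split()
    lp = sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in toks)
    return lp / max(len(toks), 1)

def importance_sample(generic_docs, domain_docs, k, seed=0):
    """Draw k generic documents with weight exp(score), where score is the
    in-domain minus generic log-likelihood gap (higher = more domain-like)."""
    vocab = {tok for doc in generic_docs + domain_docs for tok in doc.split()}
    d_counts, d_total = unigram_model(domain_docs)
    g_counts, g_total = unigram_model(generic_docs)
    scores = [
        per_token_logprob(doc, d_counts, d_total, len(vocab))
        - per_token_logprob(doc, g_counts, g_total, len(vocab))
        for doc in generic_docs
    ]
    weights = [math.exp(s) for s in scores]
    rng = random.Random(seed)
    return rng.choices(generic_docs, weights=weights, k=k)

# Toy usage: bias selection toward medical-sounding documents.
generic = [
    "stock prices rose sharply today",
    "the patient received a second dose",
    "football season starts next week",
    "clinical trial results were reported",
]
domain = ["the patient was given a dose", "trial results for the new drug"]
print(importance_sample(generic, domain, k=2))
```

In a realistic setup the scoring models would be neural language models rather than unigram counts, and the sampled subset would be what the specialization budget is spent on, but the weighting idea is the same.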