Composable Function-preserving Expansions for Transformer Architectures

Abstract

Training state-of-the-art neural networks is costly in both compute and time. Model scale is recognized as a critical factor in achieving and improving state-of-the-art performance. Increasing the scale of a neural network normally requires restarting from scratch by randomly initializing all of the model's parameters, because the change in architecture does not allow a straightforward transfer of knowledge from smaller models. In this work, we propose six composable transformations that incrementally increase the size of transformer-based neural networks while preserving functionality, so that model capacity can be expanded as needed. For each transformation, we provide a proof of exact function preservation under minimal initialization constraints. The proposed methods may enable efficient training pipelines for larger and more powerful models by progressively expanding the architecture throughout training.
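The abstract does not spell out the six transformations, so the following is only a minimal sketch of what exact function preservation can look like for one plausible case: widening the hidden layer of a two-layer MLP block. It assumes that the new output-side weights are initialized to zero while the new input-side weights are arbitrary. The helper `expand_hidden`, the ReLU activation, and all dimensions are hypothetical choices made for illustration and are not taken from the paper.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp(x, W_in, b_in, W_out, b_out):
    # Two-layer MLP: W_out @ relu(W_in @ x + b_in) + b_out
    h = relu([a + b for a, b in zip(matvec(W_in, x), b_in)])
    return [a + b for a, b in zip(matvec(W_out, h), b_out)]


def expand_hidden(W_in, b_in, W_out, extra, rng):
    # Hypothetical helper: add `extra` hidden units to the MLP.
    # Assumption: new input-side rows and biases may be arbitrary (random
    # here), while the new output-side columns are zero, so the added units
    # contribute nothing to the output right after the expansion.
    d_in = len(W_in[0])
    new_W_in = W_in + [[rng.gauss(0, 0.02) for _ in range(d_in)]
                       for _ in range(extra)]
    new_b_in = b_in + [rng.gauss(0, 0.02) for _ in range(extra)]
    new_W_out = [row + [0.0] * extra for row in W_out]
    return new_W_in, new_b_in, new_W_out


rng = random.Random(0)
d_model, d_hidden = 4, 8
W_in = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_hidden)]
b_in = [rng.gauss(0, 1) for _ in range(d_hidden)]
W_out = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_model)]
b_out = [rng.gauss(0, 1) for _ in range(d_model)]
x = [rng.gauss(0, 1) for _ in range(d_model)]

y_before = mlp(x, W_in, b_in, W_out, b_out)
W_in2, b_in2, W_out2 = expand_hidden(W_in, b_in, W_out, extra=4, rng=rng)
y_after = mlp(x, W_in2, b_in2, W_out2, b_out)

# The expanded network computes the same function on this input.
print(y_after == y_before)
```

Because only the output-side weights of the new units are constrained, the input-side weights remain free to be initialized in whatever way helps later training, which is one reading of "minimal initialization constraints".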
