PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation

Abstract

The recent trend in large language models (LLMs) is to scale up both model size (i.e., the number of parameters) and dataset size to achieve better generative ability, as demonstrated by much prior work such as the well-known GPT and Llama. However, large models often incur massive computational costs that practical applications cannot afford. Meanwhile, how to construct a strong model architecture for LLMs is rarely discussed. We first analyze state-of-the-art language model architectures and observe the feature collapse problem. Based on our theoretical analysis, we propose that nonlinearity, which is usually studied in convolutional neural networks for vision tasks, is also very important for language models. We then introduce a series-informed activation function whose additional computation is negligible, and further use an augmented shortcut to enhance the model's nonlinearity. Through carefully designed ablations, we demonstrate that the proposed approach significantly enhances model nonlinearity, and we present a new efficient model architecture for modern LLMs, namely PanGu-π. Experiments using the same dataset and training strategy then compare PanGu-π with state-of-the-art LLMs. The results show that PanGu-π-7B achieves performance comparable to that of the benchmarks with about a 10% inference speed-up, and that PanGu-π-1B achieves state-of-the-art performance in terms of both accuracy and efficiency. In addition, we have deployed PanGu-π-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application. The results show that YunShan surpasses other models of similar scale on benchmarks.
