

PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation

December 27, 2023
作者: Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao
cs.AI

Abstract

The recent trend in large language models (LLMs) is to increase the scale of both the model (i.e., the number of parameters) and the dataset to achieve better generative ability, as demonstrated by a large body of work such as the famous GPT and Llama. However, large models often incur massive computational costs that practical applications cannot afford, and how to construct a strong model architecture for LLMs is rarely discussed. We first analyze state-of-the-art language model architectures and observe the feature collapse problem. Based on a theoretical analysis, we propose that nonlinearity, which has mostly been studied in convolutional neural networks for vision tasks, is also very important for language models. We then introduce a series informed activation function with negligible extra computation, and further use an augmented shortcut to enhance the model's nonlinearity. Carefully designed ablations demonstrate that the proposed approach significantly enhances model nonlinearity; thus, we present a new efficient model architecture for building modern LLMs, namely PanGu-π. Experiments using the same dataset and training strategy then compare PanGu-π with state-of-the-art LLMs. The results show that PanGu-π-7B achieves performance comparable to that of the benchmarks with about a 10% inference speed-up, and PanGu-π-1B achieves state-of-the-art performance in terms of both accuracy and efficiency. In addition, we have deployed PanGu-π-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical applications. The results show that YunShan surpasses other models of similar scale on benchmarks.
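The abstract describes a "series informed activation function" that adds nonlinearity at negligible cost. As a rough illustration of the idea, a series activation sums several shifted and scaled copies of a base activation; the sketch below is a minimal NumPy version, not the paper's implementation, and the choice of GELU as the base function, the uniform coefficients, and the fixed shifts are all illustrative assumptions (in practice the coefficients and shifts would be learnable parameters):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def series_activation(x, coeffs, shifts):
    """Series of shifted copies of a base activation:
        f(x) = sum_i coeffs[i] * gelu(x + shifts[i])
    Each extra term costs only an elementwise shift, scale, and add,
    which is negligible next to the surrounding matrix multiplies,
    yet the sum is a 'more nonlinear' function than a single activation.
    """
    return sum(a * gelu(x + b) for a, b in zip(coeffs, shifts))

x = np.linspace(-3.0, 3.0, 7)
n = 2  # number of extra members on each side of the series (assumed)
coeffs = np.ones(2 * n + 1) / (2 * n + 1)   # uniform here; learnable in practice
shifts = np.linspace(-1.0, 1.0, 2 * n + 1)  # fixed here; learnable in practice
y = series_activation(x, coeffs, shifts)
print(y.shape)  # same shape as the input
```

With a single term (`coeffs=[1.0]`, `shifts=[0.0]`) the series degenerates to the plain base activation, which makes the added capacity easy to ablate against a standard baseline.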