Optimized Network Architectures for Large Language Model Training with Billions of Parameters
July 22, 2023
Authors: Weiyang Wang, Manya Ghobadi, Kayvon Shakeri, Ying Zhang, Naader Hasani
cs.AI
Abstract
This paper challenges the well-established paradigm for building any-to-any
networks for training Large Language Models (LLMs). We show that LLMs exhibit a
unique communication pattern where only small groups of GPUs require
high-bandwidth any-to-any communication within them to achieve near-optimal
training performance. Across these groups of GPUs, the communication is
insignificant, sparse, and homogeneous. We propose a new network architecture
that closely resembles the communication requirement of LLMs. Our architecture
partitions the cluster into sets of GPUs interconnected with non-blocking
any-to-any high-bandwidth interconnects that we call HB domains. Across the HB
domains, the network only connects GPUs with communication demands. We call
this network a "rail-only" connection, and show that our proposed architecture
reduces the network cost by up to 75% compared to the state-of-the-art
any-to-any Clos networks without compromising the performance of LLM training.
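
To make the topology idea concrete, below is a minimal Python sketch, not taken from the paper, that counts the logical cross-domain GPU pairs a full any-to-any fabric must support versus a rail-only design in which only GPUs with the same local rank (the same "rail") are connected across HB domains. The domain count, domain size, and function names are assumptions chosen for illustration; the paper's 75% figure refers to network equipment cost, not to this raw pair count.

```python
# Minimal sketch (assumed parameters, not the paper's code): compare the number
# of logical cross-domain GPU pairs in a full any-to-any fabric against a
# rail-only design, where only GPUs sharing a local rank ("rail") are linked
# across HB domains.

def any_to_any_cross_pairs(num_domains: int, domain_size: int) -> int:
    """GPU pairs that cross HB-domain boundaries in a full any-to-any fabric."""
    total = num_domains * domain_size
    all_pairs = total * (total - 1) // 2
    intra_pairs = num_domains * domain_size * (domain_size - 1) // 2
    return all_pairs - intra_pairs

def rail_only_cross_pairs(num_domains: int, domain_size: int) -> int:
    """GPU pairs connected across domains when only same-rank GPUs (rails) are linked."""
    return domain_size * num_domains * (num_domains - 1) // 2

if __name__ == "__main__":
    num_domains, domain_size = 16, 8  # assumed: 16 HB domains of 8 GPUs each
    full = any_to_any_cross_pairs(num_domains, domain_size)
    rail = rail_only_cross_pairs(num_domains, domain_size)
    print(f"any-to-any cross-domain pairs: {full}")
    print(f"rail-only cross-domain pairs:  {rail}")
    print(f"connectivity reduction: {1 - rail / full:.1%}")
```

With the assumed sizes, the rail-only layout retains only a small fraction of the cross-domain connectivity, which is what lets the design drop most inter-domain switching capacity while the high-bandwidth any-to-any traffic stays inside each HB domain.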