Optimized Network Architectures for Large Language Model Training with Billions of Parameters

Abstract

This paper challenges the well-established paradigm of building any-to-any networks for training Large Language Models (LLMs). We show that LLMs exhibit a unique communication pattern: to achieve near-optimal training performance, only small groups of GPUs require high-bandwidth any-to-any communication among themselves. Across these groups of GPUs, the communication is insignificant, sparse, and homogeneous. We propose a new network architecture that closely matches the communication requirements of LLMs. Our architecture partitions the cluster into sets of GPUs interconnected with non-blocking any-to-any high-bandwidth interconnects, which we call HB domains. Across the HB domains, the network connects only those GPUs that have communication demands. We call this network a "rail-only" connection, and show that our proposed architecture reduces the network cost by up to 75% compared to state-of-the-art any-to-any Clos networks without compromising the performance of LLM training.
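The connectivity described above can be illustrated with a minimal sketch. It assumes, as a reading not spelled out in the abstract, that a "rail" groups the GPUs holding the same local index in each HB domain, and that GPUs are numbered consecutively so that domain and rail follow from integer division and remainder. The function names (`hb_domain`, `rail`, `directly_connected`, `count_connected_pairs`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def hb_domain(gpu: int, domain_size: int) -> int:
    """Index of the HB domain that holds this GPU (assumed consecutive numbering)."""
    return gpu // domain_size


def rail(gpu: int, domain_size: int) -> int:
    """Local index of the GPU inside its HB domain; assumed to identify its rail."""
    return gpu % domain_size


def directly_connected(a: int, b: int, domain_size: int) -> bool:
    """True if two distinct GPUs share an HB domain or sit on the same rail."""
    if a == b:
        return False
    same_domain = hb_domain(a, domain_size) == hb_domain(b, domain_size)
    same_rail = rail(a, domain_size) == rail(b, domain_size)
    return same_domain or same_rail


def count_connected_pairs(num_gpus: int, domain_size: int) -> tuple[int, int]:
    """Return (pairs connected under the rail-only sketch, pairs in any-to-any)."""
    if num_gpus % domain_size != 0:
        raise ValueError("num_gpus must be a multiple of domain_size")
    pairs = list(combinations(range(num_gpus), 2))
    connected = sum(directly_connected(a, b, domain_size) for a, b in pairs)
    return connected, len(pairs)


if __name__ == "__main__":
    connected, total = count_connected_pairs(num_gpus=16, domain_size=4)
    print(f"rail-only sketch: {connected} of {total} GPU pairs directly connected")
```

With 16 GPUs in HB domains of 4, only pairs inside a domain or on a shared rail count as directly connected, and every other cross-domain pair is left out. The pair count only illustrates the sparsity of the topology; it is not a model of the network cost reported in the abstract.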
