

Optimized Network Architectures for Large Language Model Training with Billions of Parameters

July 22, 2023
Authors: Weiyang Wang, Manya Ghobadi, Kayvon Shakeri, Ying Zhang, Naader Hasani
cs.AI

Abstract

This paper challenges the well-established paradigm for building any-to-any networks for training Large Language Models (LLMs). We show that LLMs exhibit a unique communication pattern where only small groups of GPUs require high-bandwidth any-to-any communication within them, to achieve near-optimal training performance. Across these groups of GPUs, the communication is insignificant, sparse, and homogeneous. We propose a new network architecture that closely resembles the communication requirement of LLMs. Our architecture partitions the cluster into sets of GPUs interconnected with non-blocking any-to-any high-bandwidth interconnects that we call HB domains. Across the HB domains, the network only connects GPUs with communication demands. We call this network a "rail-only" connection, and show that our proposed architecture reduces the network cost by up to 75% compared to the state-of-the-art any-to-any Clos networks without compromising the performance of LLM training.
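The sketch below is a minimal illustration (not the authors' code) of the connectivity pattern the abstract describes: full any-to-any connectivity inside each HB domain, and, across HB domains, direct paths only between GPUs that sit on the same "rail". The function and parameter names (rail_only_links, any_to_any_links, num_domains, domain_size) are assumptions made for this example, and the pair counts stand in for connectivity only, not for the paper's actual switch- and transceiver-level cost model.

# Illustrative sketch (assumptions, not the paper's implementation):
# enumerate which GPU pairs have a direct network path in a rail-only
# design versus a full any-to-any cluster. GPUs are labeled
# (hb_domain, local_rank).

from itertools import combinations


def rail_only_links(num_domains: int, domain_size: int):
    """GPU pairs reachable directly under a rail-only design."""
    links = set()
    # Intra-domain: non-blocking any-to-any inside each HB domain.
    for d in range(num_domains):
        for a, b in combinations(range(domain_size), 2):
            links.add(((d, a), (d, b)))
    # Inter-domain: only same-local-rank GPUs (one "rail" per rank)
    # are connected across HB domains.
    for r in range(domain_size):
        for d1, d2 in combinations(range(num_domains), 2):
            links.add(((d1, r), (d2, r)))
    return links


def any_to_any_links(num_domains: int, domain_size: int):
    """Every GPU pair, as in a full any-to-any Clos fabric (for comparison)."""
    gpus = [(d, r) for d in range(num_domains) for r in range(domain_size)]
    return set(combinations(gpus, 2))


if __name__ == "__main__":
    rail = rail_only_links(num_domains=4, domain_size=8)
    full = any_to_any_links(num_domains=4, domain_size=8)
    print(f"rail-only direct pairs: {len(rail)}, any-to-any pairs: {len(full)}")

For 4 HB domains of 8 GPUs each, the sketch reports 160 directly connected pairs versus 496 in the fully connected case, which conveys (in simplified form) why dropping the unused cross-domain connectivity can cut network cost substantially, as the paper quantifies with its up-to-75% result.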