

Balancing Pipeline Parallelism with Vocabulary Parallelism

November 8, 2024
Authors: Man Tsung Yeung, Penghui Qi, Min Lin, Xinyi Wan
cs.AI

Abstract

Pipeline parallelism is widely used to scale the training of transformer-based large language models, and many works have sought to improve its throughput and memory footprint. In this paper, we address a frequently overlooked issue: the vocabulary layers can cause imbalanced computation and memory usage across pipeline stages, worsening pipeline bubbles and the memory bottleneck. To tackle this, we partition the vocabulary layers evenly across pipeline devices and group the computation into pipeline passes. To reduce the activation memory overhead, we propose several algorithms that reduce the number of communication barriers within the vocabulary layers. Additionally, we use a generalizable method to integrate Vocabulary Parallelism with existing pipeline schedules. By combining these techniques, our methods effectively balance computation and parameter memory, with only a small constant activation memory overhead. Notably, when combined with activation-memory-balanced schedules such as V-Half, our approach achieves perfect balance in both memory and computation. Extensive evaluations demonstrate that our method achieves computation and memory balance regardless of vocabulary size, yielding a 5% to 51% improvement in throughput over naive approaches while significantly reducing peak memory usage, especially in large-vocabulary scenarios. Our implementation is open-sourced at https://github.com/sail-sg/VocabularyParallelism .
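The abstract does not spell out the reduction algorithms, so the following is only a minimal single-process NumPy sketch of the general idea behind vocabulary parallelism, assuming a standard softmax cross-entropy output layer. The output embedding is split evenly across `num_devices` shards, each shard computes only its slice of the logits, and the per-shard (max, sum-exp, target-logit) triples are merged with an associative, online-softmax-style rule so that, in principle, one reduction could replace several separate synchronization points. This is not the authors' algorithm from the repository. The names `partition_vocab`, `local_stats`, `merge`, and `vocab_parallel_cross_entropy` are hypothetical, and the Python loop stands in for a collective all-reduce across devices.

```python
import numpy as np


def partition_vocab(vocab_size: int, num_devices: int) -> list[tuple[int, int]]:
    """Split [0, vocab_size) into contiguous shards whose sizes differ by at most 1."""
    if num_devices > vocab_size:
        raise ValueError("need at least one vocabulary entry per device")
    base, rem = divmod(vocab_size, num_devices)
    bounds, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < rem else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def local_stats(hidden, weight_shard, start, targets):
    """Work done on one (hypothetical) device for its vocabulary slice."""
    logits = hidden @ weight_shard.T                 # [tokens, shard_size]
    m = logits.max(axis=1)                           # local per-token max
    s = np.exp(logits - m[:, None]).sum(axis=1)      # local sum-exp, shifted by local max
    # The target logit is contributed only by the shard that owns the target id.
    owned = (targets >= start) & (targets < start + weight_shard.shape[0])
    t = np.zeros(hidden.shape[0])
    rows = np.nonzero(owned)[0]
    t[rows] = logits[rows, targets[rows] - start]
    return m, s, t


def merge(a, b):
    """Associative combine of two (max, sum-exp, target) triples."""
    m = np.maximum(a[0], b[0])
    s = a[1] * np.exp(a[0] - m) + b[1] * np.exp(b[0] - m)
    return m, s, a[2] + b[2]


def vocab_parallel_cross_entropy(hidden, weight, targets, num_devices):
    shards = partition_vocab(weight.shape[0], num_devices)
    stats = [local_stats(hidden, weight[lo:hi], lo, targets) for lo, hi in shards]
    acc = stats[0]
    for st in stats[1:]:          # stands in for a single all-reduce
        acc = merge(acc, st)
    m, s, t = acc
    return np.log(s) + m - t      # logsumexp(logits) - logit[target]


def reference_cross_entropy(hidden, weight, targets):
    """Unpartitioned cross-entropy for comparison."""
    logits = hidden @ weight.T
    m = logits.max(axis=1)
    lse = np.log(np.exp(logits - m[:, None]).sum(axis=1)) + m
    return lse - logits[np.arange(len(targets)), targets]


rng = np.random.default_rng(0)
hidden = rng.standard_normal((8, 16))        # 8 tokens, hidden size 16
weight = rng.standard_normal((1003, 16))     # vocabulary of 1003 entries
targets = rng.integers(0, 1003, size=8)
loss = vocab_parallel_cross_entropy(hidden, weight, targets, num_devices=4)
assert np.allclose(loss, reference_cross_entropy(hidden, weight, targets))
```

Because `merge` is associative, the triples could be combined in any order (for example by a tree all-reduce), and each shard holds roughly `vocab_size / num_devices` rows of the embedding, which is the balancing effect the abstract describes. The sketch covers only the forward loss; a real implementation would also need the backward pass and its scheduling within pipeline passes.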
