LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
May 25, 2024
Authors: Xinyu Zhou, Boris Knyazev, Alexia Jolicoeur-Martineau, Jie Fu
cs.AI
Abstract
A good initialization of deep learning models is essential, since it helps them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes predicting good initial parameters all the more necessary. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vision models. Unfortunately, predicting the parameters of very wide networks relies on copying small chunks of parameters multiple times, and supporting full prediction requires an extremely large number of parameters, which greatly hinders adoption in practice. To address this limitation, we propose LoGAH (Low-rank GrAph Hypernetworks), a GHN with a low-rank parameter decoder that scales to significantly wider networks without the excessive increase in parameters required by previous attempts. LoGAH allows us to predict the parameters of neural networks as large as 774 million parameters in a memory-efficient manner. We show that vision and language models (i.e., ViT and GPT-2) initialized with LoGAH achieve better performance than those initialized randomly or with existing hypernetworks. Furthermore, we show promising transfer learning results: LoGAH can be trained on small datasets and the predicted parameters used to initialize models for larger tasks. The code is available at https://github.com/Blackzxy/LoGAH.
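The memory saving behind a low-rank parameter decoder can be illustrated with a toy sketch (this is not the authors' implementation; all names, shapes, and the rank value are hypothetical). Instead of emitting a full d_out × d_in weight chunk directly, the decoder emits two rank-r factors whose product reconstructs the chunk, shrinking the decoder's output from d_out·d_in values to r·(d_out + d_in):

```python
# Toy illustration of the low-rank idea (hypothetical shapes, not LoGAH code):
# a dense d_out x d_in chunk costs d_out * d_in values, while rank-r factors
# A (d_out x r) and B (r x d_in) cost only r * (d_out + d_in).

def full_chunk_size(d_out, d_in):
    """Number of values needed to emit a dense weight chunk."""
    return d_out * d_in

def lowrank_chunk_size(d_out, d_in, r):
    """Number of values needed to emit the two rank-r factors instead."""
    return r * (d_out + d_in)

def reconstruct(A, B):
    """Multiply factors A (d_out x r) and B (r x d_in) back into a
    dense d_out x d_in weight chunk, using plain nested lists."""
    r = len(A[0])
    d_in = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(d_in)]
            for i in range(len(A))]

# Example: a 1024 x 1024 chunk decoded at rank 32.
full = full_chunk_size(1024, 1024)        # 1,048,576 values
low = lowrank_chunk_size(1024, 1024, 32)  # 65,536 values
print(full, low, full // low)             # the low-rank form is 16x smaller
```

The ratio grows with width: for a fixed rank, doubling d_out and d_in quadruples the dense cost but only doubles the low-rank cost, which is why such a decoder can reach very wide (e.g. GPT-2-scale) layers without a proportional blow-up in decoder parameters.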