

LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters

May 25, 2024
Authors: Xinyu Zhou, Boris Knyazev, Alexia Jolicoeur-Martineau, Jie Fu
cs.AI

Abstract

A good initialization of deep learning models is essential, since it helps them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes accurate prediction of initial parameters increasingly desirable. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vision models. Unfortunately, predicting the parameters of very wide networks relies on copying small chunks of parameters multiple times, and requires an extremely large number of parameters to support full prediction, which greatly hinders its adoption in practice. To address this limitation, we propose LoGAH (Low-rank GrAph Hypernetworks), a GHN with a low-rank parameter decoder that scales to significantly wider networks without requiring as excessive an increase in parameters as previous attempts. LoGAH allows us to predict the parameters of 774-million-parameter neural networks in a memory-efficient manner. We show that vision and language models (i.e., ViT and GPT-2) initialized with LoGAH achieve better performance than those initialized randomly or with existing hypernetworks. Furthermore, we show promising transfer learning results: LoGAH can be trained on small datasets and the predicted parameters used to initialize models for larger tasks. The code is available at https://github.com/Blackzxy/LoGAH .
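The abstract does not spell out the decoder architecture, but the core idea of a low-rank parameter decoder can be sketched as follows: instead of emitting a full d×k weight chunk for each target layer, the hypernetwork emits two low-rank factors A (d×r) and B (r×k) whose product forms the predicted chunk, shrinking the decoder's output head from d·k values to r·(d+k). The sketch below is illustrative only (all names, shapes, and the linear decoder map are assumptions, not the paper's exact design):

```python
import numpy as np

def lowrank_decode(node_embedding, d, k, r, rng):
    """Illustrative low-rank parameter decoder (not LoGAH's exact
    architecture): map a graph-node embedding to two factors
    A (d x r) and B (r x k), and return their product as the
    predicted d x k weight matrix."""
    h = node_embedding.size
    # Hypothetical decoder weights; a real GHN would learn these jointly.
    W_a = rng.standard_normal((h, d * r)) / np.sqrt(h)
    W_b = rng.standard_normal((h, r * k)) / np.sqrt(h)
    A = (node_embedding @ W_a).reshape(d, r)
    B = (node_embedding @ W_b).reshape(r, k)
    return A @ B  # predicted weight chunk, rank at most r

rng = np.random.default_rng(0)
z = rng.standard_normal(64)  # embedding of one layer's node in the compute graph
W = lowrank_decode(z, d=1024, k=1024, r=32, rng=rng)
print(W.shape)  # (1024, 1024)
# Output-head size: r*(d+k) = 65,536 values per chunk, versus
# d*k = 1,048,576 for a full-rank decoder of the same target shape.
```

The memory saving is what lets the predicted width grow: the decoder's cost scales with r·(d+k) rather than d·k, so widening the target network no longer requires a quadratically larger hypernetwork head.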

