
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters

May 25, 2024
作者: Xinyu Zhou, Boris Knyazev, Alexia Jolicoeur-Martineau, Jie Fu
cs.AI

Abstract

A good initialization of deep learning models is essential, since it can help them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes predicting good initial parameters all the more necessary nowadays. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vision models. Unfortunately, predicting the parameters of very wide networks relies on copying small chunks of parameters multiple times, and supporting full prediction requires an extremely large number of parameters, which greatly hinders adoption in practice. To address this limitation, we propose LoGAH (Low-rank GrAph Hypernetworks), a GHN with a low-rank parameter decoder that scales to significantly wider networks without requiring as excessive an increase in parameters as previous attempts. LoGAH allows us to predict the parameters of 774-million-parameter neural networks in a memory-efficient manner. We show that vision and language models (i.e., ViT and GPT-2) initialized with LoGAH achieve better performance than those initialized randomly or with existing hypernetworks. Furthermore, we show promising transfer learning results: LoGAH can be trained on small datasets and its predicted parameters used to initialize models for larger tasks. The code is available at https://github.com/Blackzxy/LoGAH .
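The core idea behind the low-rank decoder can be sketched in a few lines. A minimal illustration (assumed shapes and function names for exposition only, not LoGAH's actual implementation): instead of emitting a full width × width weight matrix per layer, the decoder emits two low-rank factors whose product reconstructs the weight, shrinking the decoder's output head roughly by a factor of width / (2 · rank).

```python
import numpy as np

def decoder_output_sizes(width, rank):
    """Values the decoder must emit for one width x width weight matrix,
    with a full head vs. a low-rank head (factors A: width x rank,
    B: rank x width). Illustrative assumption, not LoGAH's exact decoder."""
    full = width * width
    low_rank = 2 * width * rank
    return full, low_rank

def predict_weight(a, b):
    """Reconstruct the full weight matrix from the two predicted factors."""
    return a @ b

width, rank = 1024, 32
full, low = decoder_output_sizes(width, rank)
print(full, low)  # 1048576 vs 65536: a ~16x smaller decoder output head

# Reconstructing one weight matrix from (hypothetical) predicted factors:
a = np.random.randn(width, rank) / np.sqrt(rank)
b = np.random.randn(rank, width)
w = predict_weight(a, b)
print(w.shape)  # (1024, 1024)
```

Because the savings grow linearly with the target width at fixed rank, this is what lets the hypernetwork scale to much wider target networks without its own parameter count exploding.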
