Instruction-Guided Autoregressive Neural Network Parameter Generation
April 2, 2025
Authors: Soro Bedionita, Bruno Andreis, Song Chong, Sung Ju Hwang
cs.AI
Abstract
Learning to generate neural network parameters conditioned on task
descriptions and architecture specifications is pivotal for advancing model
adaptability and transfer learning. Existing methods, especially those based on
diffusion models, suffer from limited scalability to large architectures,
rigidity in handling varying network depths, and disjointed parameter
generation that undermines inter-layer coherence. In this work, we propose IGPG
(Instruction Guided Parameter Generation), an autoregressive framework that
unifies parameter synthesis across diverse tasks and architectures. IGPG
leverages a VQ-VAE and an autoregressive model to generate neural network
parameters, conditioned on task instructions, dataset, and architecture
details. By autoregressively generating tokens of neural network weights, IGPG
ensures inter-layer coherence and enables efficient adaptation across models
and datasets. Operating at the token level, IGPG effectively captures complex
parameter distributions aggregated from a broad spectrum of pretrained models.
Extensive experiments on multiple vision datasets demonstrate that IGPG
consolidates diverse pretrained models into a single, flexible generative
framework. The synthesized parameters achieve competitive or superior
performance relative to state-of-the-art methods, especially in terms of
scalability and efficiency when applied to large architectures. These results
underscore IGPG's potential as a powerful tool for pretrained weight retrieval,
model selection, and rapid task-specific fine-tuning.
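
To make the idea of operating on weight tokens concrete, here is a minimal sketch of the vector-quantization step only. It is not the authors' implementation. The chunk size, the random (untrained) codebook, the toy layer shapes, and the helper names `chunk_weights`, `quantize`, and `dequantize` are all assumptions made for illustration. A real VQ-VAE would learn the codebook together with encoder and decoder networks, and an autoregressive model would then be trained on the resulting token sequences.

```python
import numpy as np

def chunk_weights(flat, chunk_size):
    """Zero-pad a flat weight vector and split it into fixed-size chunks."""
    pad = (-len(flat)) % chunk_size
    padded = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    return padded.reshape(-1, chunk_size), pad

def quantize(chunks, codebook):
    """Map each chunk to the index of its nearest codebook vector (L2 distance)."""
    # Squared distances, shape (num_chunks, codebook_size).
    dists = ((chunks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def dequantize(tokens, codebook, pad):
    """Look up codebook vectors for the tokens and drop the padding."""
    flat = codebook[tokens].reshape(-1)
    return flat[: len(flat) - pad] if pad else flat

rng = np.random.default_rng(0)

# Hypothetical toy network: two weight matrices and one bias vector.
layers = [rng.normal(size=(4, 3)), rng.normal(size=(3,)), rng.normal(size=(2, 3))]
flat = np.concatenate([w.ravel() for w in layers])  # 21 scalars in layer order

# Assumed codebook: 16 random vectors of length 4 (a trained VQ-VAE would learn these).
codebook = rng.normal(size=(16, 4))

chunks, pad = chunk_weights(flat, chunk_size=4)
tokens = quantize(chunks, codebook)       # one integer token per chunk
recon = dequantize(tokens, codebook, pad)  # lossy reconstruction of the weights

print("tokens:", tokens)
print("reconstruction error:", float(np.mean((flat - recon) ** 2)))
```

Because the layers are flattened in order, the token sequence runs through the network layer by layer, which is what lets a sequence model condition later tokens on earlier ones.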