Instruction-Guided Autoregressive Neural Network Parameter Generation

Abstract

Learning to generate neural network parameters conditioned on task descriptions and architecture specifications is pivotal for advancing model adaptability and transfer learning. Existing methods, especially those based on diffusion models, suffer from limited scalability to large architectures, rigidity in handling varying network depths, and disjointed parameter generation that undermines inter-layer coherence. In this work, we propose IGPG (Instruction Guided Parameter Generation), an autoregressive framework that unifies parameter synthesis across diverse tasks and architectures. IGPG leverages a VQ-VAE and an autoregressive model to generate neural network parameters, conditioned on task instructions, dataset, and architecture details. By autoregressively generating tokens of neural network weights, IGPG ensures inter-layer coherence and enables efficient adaptation across models and datasets. Operating at the token level, IGPG effectively captures complex parameter distributions aggregated from a broad spectrum of pretrained models. Extensive experiments on multiple vision datasets demonstrate that IGPG consolidates diverse pretrained models into a single, flexible generative framework. The synthesized parameters achieve competitive or superior performance relative to state-of-the-art methods, especially in terms of scalability and efficiency when applied to large architectures. These results underscore IGPG's potential as a powerful tool for pretrained weight retrieval, model selection, and rapid task-specific fine-tuning.
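To make the token-level view of weights concrete, the following is a minimal sketch of how a weight tensor could be mapped to discrete tokens and back. It is not the authors' implementation. The learned VQ-VAE encoder and decoder are replaced here by a plain nearest-neighbour lookup against a random codebook, and the autoregressive model that would predict the tokens is omitted. The function names, chunk size, and codebook shape are all hypothetical.

```python
import numpy as np

def tokenize_weights(weights, codebook, chunk_size):
    """Map a weight tensor to codebook indices (assumed nearest-neighbour rule)."""
    flat = np.asarray(weights, dtype=np.float64).ravel()
    pad = (-flat.size) % chunk_size  # zero-pad so the length divides evenly
    chunks = np.pad(flat, (0, pad)).reshape(-1, chunk_size)
    # Squared Euclidean distance from every chunk to every codebook entry.
    dists = ((chunks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1), flat.size

def detokenize_weights(tokens, codebook, n_weights):
    """Look tokens up in the codebook and trim the padding."""
    return codebook[tokens].ravel()[:n_weights]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # hypothetical: 16 codes of dimension 4
layer = rng.normal(size=(3, 6))      # hypothetical layer with 18 weights

tokens, n = tokenize_weights(layer, codebook, chunk_size=4)
approx = detokenize_weights(tokens, codebook, n).reshape(layer.shape)
```

In this sketch, `tokens` is the discrete sequence an autoregressive model would be trained to predict given the conditioning information, and `approx` is the lossy reconstruction obtained by decoding those tokens. With a random codebook the reconstruction is poor; the point is only the shape of the pipeline.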
