Rethinking the shape convention of an MLP

Abstract

Multi-layer perceptrons (MLPs) conventionally follow a narrow-wide-narrow design where skip connections operate at the input/output dimensions while processing occurs in expanded hidden spaces. We challenge this convention by proposing wide-narrow-wide (Hourglass) MLP blocks where skip connections operate at expanded dimensions while residual computation flows through narrow bottlenecks. This inversion leverages higher-dimensional spaces for incremental refinement while maintaining computational efficiency through parameter-matched designs. Implementing Hourglass MLPs requires an initial projection to lift input signals to expanded dimensions. We propose that this projection can remain fixed at random initialization throughout training, enabling efficient training and inference implementations. We evaluate both architectures on generative tasks over popular image datasets, characterizing performance-parameter Pareto frontiers through systematic architectural search. Results show that Hourglass architectures consistently achieve superior Pareto frontiers compared to conventional designs. As parameter budgets increase, optimal Hourglass configurations favor deeper networks with wider skip connections and narrower bottlenecks, a scaling pattern distinct from conventional MLPs. Our findings suggest reconsidering skip connection placement in modern architectures, with potential applications extending to Transformers and other residual networks.
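The wide-narrow-wide block described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the ReLU activation, the Gaussian initialization, the omission of biases and of any output projection, and every name and dimension (make_matrix, residual_block, d_wide, d_narrow, and so on) are assumptions chosen for illustration. It shows a fixed random lift followed by residual blocks whose skip path stays at the wide dimension, and checks that one such block has the same weight count as a conventional narrow-wide-narrow block.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random Gaussian matrix as a list of rows (assumed init, scale 1/sqrt(cols))."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(w, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, w_in, w_out):
    """Compute x + W_out relu(W_in x); the skip path keeps the dimension of x."""
    h = relu(matvec(w_in, x))
    return [a + b for a, b in zip(x, matvec(w_out, h))]


def block_params(skip_dim, inner_dim):
    """Weights in one block (biases omitted): two skip_dim x inner_dim matrices."""
    return 2 * skip_dim * inner_dim


rng = random.Random(0)
d_in, d_wide, d_narrow, depth = 8, 32, 4, 3  # illustrative sizes only

# Fixed random lift: sampled once and, in a training loop, never updated.
lift = make_matrix(d_wide, d_in, rng)

# Trainable hourglass blocks: wide -> narrow -> wide.
blocks = [
    (make_matrix(d_narrow, d_wide, rng), make_matrix(d_wide, d_narrow, rng))
    for _ in range(depth)
]


def hourglass_forward(x):
    """Lift the input to d_wide, then refine it through the residual blocks."""
    z = matvec(lift, x)
    for w_in, w_out in blocks:
        z = residual_block(z, w_in, w_out)
    return z


z = hourglass_forward([1.0] * d_in)

# Parameter matching: a conventional block with skip dimension 8 and hidden
# dimension 16 has as many weights as an hourglass block with skip 32, bottleneck 4.
conventional = block_params(skip_dim=8, inner_dim=16)
hourglass = block_params(skip_dim=d_wide, inner_dim=d_narrow)
```

The same residual_block function covers the conventional design when the skip dimension is the narrow one, so under these assumptions the two architectures differ only in which dimension carries the skip connection.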
