Squeeze-Release: Iterative Pruning with Exact Structural Minimization

Abstract

Unstructured pruning produces sparse weight tensors, but standard implementations leave tensor shapes unchanged, so the deployed model is no smaller than it was before pruning. We present an exact structural rewrite, which we call minimization, that converts a masked network into a smaller dense network with the same forward function up to floating-point rounding. The Squeeze-Release cycle iterates pruning and minimization, with an intermediate release step that re-enables the exact-zero positions inside the compacted tensors as small calibrated noise, turning otherwise wasted capacity back into trainable parameters. Successive cycles use that capacity to find structural redundancy that a single pass cannot reach. We additionally introduce CompensatedLayerNorm, a function-preserving replacement for LayerNorm that extends minimization to channel reduction across LayerNorm-equipped residual streams. At comparable accuracy, Squeeze-Release makes the deployable network 39x smaller than the unpruned model on a fully connected network and 14.8x smaller on a modern CNN (ConvNeXt-Tiny). We also prove that the rewrite can be extended to transformer architectures.
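
To make the idea of minimization concrete, the following is a minimal sketch of the simplest case we can think of: a two-layer ReLU MLP in which masking has zeroed entire rows of the first weight matrix or entire columns of the second. It is an illustration under those assumptions, not the implementation used in this work; the names `minimize_mlp` and `forward` are hypothetical. A hidden unit with no incoming weights emits the constant relu(b1), which can be folded into the next layer's bias, and a hidden unit with no outgoing weights can be dropped outright.

```python
import numpy as np

def minimize_mlp(W1, b1, W2, b2):
    """Shrink y = W2 @ relu(W1 @ x + b1) + b2 by removing redundant hidden units.

    Illustrative only: assumes a two-layer ReLU MLP with masked (exact-zero) weights.
    """
    dead_in = ~W1.any(axis=1)    # no incoming weights: unit outputs the constant relu(b1)
    dead_out = ~W2.any(axis=0)   # no outgoing weights: unit never reaches the output
    # Fold the constant output of input-dead units into the next layer's bias.
    b2_new = b2 + W2[:, dead_in] @ np.maximum(b1[dead_in], 0.0)
    keep = ~(dead_in | dead_out)
    return W1[keep], b1[keep], W2[:, keep], b2_new

def forward(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4)); b1 = rng.normal(size=8)
W2 = rng.normal(size=(3, 8)); b2 = rng.normal(size=3)
W1[[1, 5]] = 0.0       # hidden units 1 and 5 lose all incoming weights
W2[:, [2, 5]] = 0.0    # hidden units 2 and 5 lose all outgoing weights

small = minimize_mlp(W1, b1, W2, b2)
x = rng.normal(size=4)
same = np.allclose(forward(x, W1, b1, W2, b2), forward(x, *small))
```

In this toy setting the hidden width drops from 8 to 5 while the two forward passes agree up to floating-point rounding, which is the property the abstract describes for the general rewrite.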