I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

Abstract

Despite the scalable performance of vision transformers (ViTs), their dense computational costs in both training and inference undermine their position in industrial applications. Post-training quantization (PTQ), which tunes ViTs with a tiny dataset and runs them in a low-bit format, addresses the cost issue well but suffers larger performance drops at lower bit widths. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) quantization inefficiency in the prevalent log2 quantizer for post-Softmax activations; (2) a rugged and magnified loss landscape under coarse-grained quantization granularity for post-LayerNorm activations. I&S-ViT then addresses these issues by introducing: (1) a novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve both an inclusive domain representation and an accurate distribution approximation; (2) a three-stage smooth optimization strategy (SOS) that amalgamates the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT's superiority over existing PTQ methods for ViTs, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
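
To make the SULQ idea more concrete, the following is a minimal sketch of a plain log2 quantizer next to a shift-then-uniform variant for values in [0, 1], such as post-Softmax activations. The abstract only states that SULQ combines a shift mechanism with uniform quantization; the exact formulation below, the function names `log2_quantize` and `shift_uniform_log2_quantize`, and the parameters `eta`, `lo`, and `hi` are illustrative assumptions, not the authors' implementation.

```python
import math


def log2_quantize(x: float, bits: int) -> float:
    """Plain log2 quantizer: round -log2(x) to an integer code, then invert."""
    max_code = 2 ** bits - 1
    if x <= 0.0:
        code = max_code  # log2(0) is undefined; map zero to the smallest level
    else:
        code = min(max(round(-math.log2(x)), 0), max_code)
    return 2.0 ** (-code)  # dequantized value, always a power of two


def shift_uniform_log2_quantize(
    x: float, bits: int, eta: float, lo: float = 0.0, hi: float = 1.0
) -> float:
    """Assumed shift-uniform-log2 scheme (hypothetical formulation).

    1. Shift the input by eta > 0 so that the log is defined at x = 0.
    2. Uniformly quantize -log2(x + eta) over the range induced by [lo, hi].
    3. Invert the log and remove the shift.
    """
    max_code = 2 ** bits - 1
    t_min = -math.log2(hi + eta)  # log-domain value of the largest input
    t_max = -math.log2(lo + eta)  # log-domain value of the smallest input
    scale = (t_max - t_min) / max_code  # uniform step in the log domain
    t = -math.log2(x + eta)
    code = min(max(round((t - t_min) / scale), 0), max_code)
    t_hat = t_min + code * scale  # dequantized log-domain value
    return 2.0 ** (-t_hat) - eta


if __name__ == "__main__":
    for x in (0.0, 0.05, 0.3, 0.8, 1.0):
        plain = log2_quantize(x, bits=3)
        shifted = shift_uniform_log2_quantize(x, bits=3, eta=0.1)
        print(f"x={x:.2f}  log2={plain:.4f}  shift-uniform-log2={shifted:.4f}")
```

In this toy setting with 3 bits and `eta=0.1`, the plain quantizer can only return powers of two, so 0.8 is rounded up to 1.0, while the shifted variant spreads its eight levels over the whole interval, including an exact zero. The value of `eta` here is arbitrary; how it should be chosen is outside what the abstract specifies.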
