I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
November 16, 2023
Authors: Yunshan Zhong, Jiawei Hu, Mingbao Lin, Mengzhao Chen, Rongrong Ji
cs.AI
Abstract
Despite the scalable performance of vision transformers (ViTs), their dense
computational costs (training and inference) undermine their position in
industrial applications. Post-training quantization (PTQ), which tunes ViTs
with a tiny dataset and runs them in a low-bit format, addresses the cost issue
well but suffers larger performance drops at lower bit-widths. In this paper, we
introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an
inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of
ViTs: (1) Quantization inefficiency in the prevalent log2 quantizer for
post-Softmax activations; (2) Rugged and magnified loss landscape in
coarse-grained quantization granularity for post-LayerNorm activations. Then,
I&S-ViT addresses these issues by introducing: (1) A novel shift-uniform-log2
quantizer (SULQ) that incorporates a shift mechanism followed by uniform
quantization to achieve both an inclusive domain representation and accurate
distribution approximation; (2) A three-stage smooth optimization strategy
(SOS) that amalgamates the strengths of channel-wise and layer-wise
quantization to enable stable learning. Comprehensive evaluations across
diverse vision tasks validate I&S-ViT's superiority over existing PTQ methods
for ViTs, particularly in low-bit scenarios. For instance, I&S-ViT elevates the
performance of 3-bit ViT-B by an impressive 50.68%.
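
To make the quantizer idea concrete, here is a minimal sketch of a plain log2 quantizer next to one possible reading of "a shift mechanism followed by uniform quantization". The abstract gives no formulas, so everything here is an assumption for illustration: the function names, the shift parameter `eta`, the input range `[0, x_max]`, and the exact quantize/dequantize steps. It should not be read as the authors' implementation.

```python
import math


def log2_quantize(x, bits):
    """Plain log2 quantizer for x in [0, 1]; returns the dequantized value.

    Representable values are 2**0, 2**-1, ..., 2**-(2**bits - 1).
    """
    levels = 2 ** bits - 1
    if x <= 0.0:
        q = levels  # zero is clamped to the smallest representable power of two
    else:
        q = min(max(round(-math.log2(x)), 0), levels)
    return 2.0 ** (-q)


def shift_uniform_log2_quantize(x, bits, eta, x_max=1.0):
    """Hypothetical shift-then-uniform variant (assumed form, not from the source).

    1. Shift the input by eta > 0 so that log2 is defined at x = 0.
    2. Uniformly quantize the exponent -log2(x + eta) over its full range.
    3. Dequantize and subtract the shift again.
    """
    levels = 2 ** bits - 1
    t_min = -math.log2(x_max + eta)  # exponent of the largest input
    t_max = -math.log2(eta)          # exponent of x = 0
    scale = (t_max - t_min) / levels

    x = min(max(x, 0.0), x_max)
    t = -math.log2(x + eta)
    q = min(max(round((t - t_min) / scale), 0), levels)

    t_hat = t_min + q * scale
    return 2.0 ** (-t_hat) - eta


if __name__ == "__main__":
    for value in (0.0, 0.02, 0.3, 0.75, 1.0):
        plain = log2_quantize(value, bits=3)
        shifted = shift_uniform_log2_quantize(value, bits=3, eta=2.0 ** -6)
        print(f"x={value:.2f}  log2={plain:.4f}  shift-uniform-log2={shifted:.4f}")
```

In this sketch the plain quantizer can only output powers of two, so an input of exactly 0 is mapped to the smallest power available, while the shifted variant spreads its levels over the whole exponent range and reproduces both endpoints of `[0, x_max]`. How `eta` is actually chosen, and whether the dequantized exponent is rounded for hardware efficiency, is not stated in the abstract.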