I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

Abstract

Despite the scalable performance of vision transformers (ViTs), their dense computational costs (training and inference) undermine their position in industrial applications. Post-training quantization (PTQ), which tunes ViTs with a tiny dataset and runs them in a low-bit format, addresses the cost issue well but suffers larger performance drops in lower-bit cases. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) quantization inefficiency in the prevalent log2 quantizer for post-Softmax activations; (2) a rugged and magnified loss landscape under coarse-grained quantization granularity for post-LayerNorm activations. I&S-ViT then addresses these issues by introducing: (1) a novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve both an inclusive domain representation and an accurate distribution approximation; (2) a three-stage smooth optimization strategy (SOS) that combines the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT's superiority over existing PTQ methods for ViTs, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of 3-bit ViT-B by 50.68%.
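
To make the quantizer discussion concrete, the following is a rough NumPy sketch of a plain log2 quantizer next to a "shift, then quantize uniformly in the log2 domain" variant. It is only an illustration of the idea as the abstract describes it, not the paper's exact formulation: the function names, the `eta` shift parameter, and the min/max calibration of the uniform step are all assumptions made here.

```python
import numpy as np

def log2_quantize(x, bits):
    """Plain log2 quantizer for values in (0, 1]; returns dequantized values."""
    levels = 2 ** bits - 1
    # Integer exponent codes: each code k represents the value 2^-k.
    q = np.clip(np.round(-np.log2(x)), 0, levels)
    return 2.0 ** (-q)

def shift_uniform_log2_quantize(x, bits, eta):
    """Hypothetical sketch: shift by eta, map to the log2 domain,
    quantize uniformly there, then undo the log and the shift."""
    levels = 2 ** bits - 1
    y = -np.log2(x + eta)                    # shifted log-domain values
    lo, hi = y.min(), y.max()                # assumed min/max calibration
    scale = max((hi - lo) / levels, 1e-12)   # uniform step size
    q = np.clip(np.round((y - lo) / scale), 0, levels)
    y_hat = q * scale + lo                   # dequantize in the log domain
    return 2.0 ** (-y_hat) - eta             # back to the original domain

# Example: attention-like probabilities, including an exact zero.
probs = np.array([0.0, 0.001, 0.02, 0.3, 0.679])
approx = shift_uniform_log2_quantize(probs, bits=3, eta=1 / 64)
```

In this sketch the positive shift keeps the logarithm finite even for an input of exactly zero, and the uniform step spreads the available codes over the range the data actually occupies rather than over fixed powers of two.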
