FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search

Abstract

Quantization has become a mainstream compression technique for reducing the model size, computational requirements, and energy consumption of modern deep neural networks (DNNs). With the improved numerical support in recent hardware, including multiple variants of integer and floating-point formats, mixed-precision quantization has become necessary to achieve high-quality results at low model cost. Prior mixed-precision quantization methods have performed either a post-training quantization search, which compromises accuracy, or a differentiable quantization search, which leads to high memory usage from branching. Therefore, we propose the first one-shot mixed-precision quantization search that eliminates the need for retraining in both integer and low-precision floating-point models. We evaluate our floating-point and integer quantization search (FLIQS) on multiple convolutional networks and vision transformer models to discover Pareto-optimal models. Our approach discovers models that improve upon uniform precision, manual mixed-precision, and recent integer quantization search methods. With the proposed integer quantization search, we increase the ImageNet accuracy of ResNet-18 by 1.31 percentage points and of ResNet-50 by 0.90 percentage points over previous methods at equivalent model cost. Additionally, for the first time, we explore a novel mixed-precision floating-point search and improve MobileNetV2 accuracy by up to 0.98 percentage points compared to prior state-of-the-art FP8 models. Finally, we extend FLIQS to simultaneously search a joint quantization and neural architecture space and improve ImageNet accuracy by 2.69 percentage points at similar model cost on a MobileNetV2 search space.
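To make the trade-off behind mixed-precision quantization concrete, the following is a minimal sketch and not the FLIQS method itself. It assumes symmetric uniform integer quantization and measures model cost as the total number of weight bits. The names `fake_quantize_int`, `mean_squared_error`, and `model_size_bits`, as well as the layer sizes, are hypothetical and chosen only for illustration.

```python
import random


def fake_quantize_int(values, bits):
    """Quantize to a signed integer grid, then map back to floats.

    Assumption: symmetric uniform quantization with one scale per tensor.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    scale = max_abs / qmax
    # Round to the nearest integer level and clip to the representable range.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_squared_error(a, b):
    """Average squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def model_size_bits(layer_params, layer_bits):
    """Toy model cost: parameter count times bit-width, summed over layers."""
    return sum(n * b for n, b in zip(layer_params, layer_bits))


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Lower precision is cheaper but introduces more quantization error.
err4 = mean_squared_error(weights, fake_quantize_int(weights, 4))
err8 = mean_squared_error(weights, fake_quantize_int(weights, 8))

# Hypothetical three-layer model: uniform 8-bit vs. a mixed 8/4/8 assignment.
layer_params = [1000, 4000, 2000]
uniform_cost = model_size_bits(layer_params, [8, 8, 8])
mixed_cost = model_size_bits(layer_params, [8, 4, 8])
```

In this toy setting, a quantization search would amount to choosing the per-layer bit-widths passed to `model_size_bits` so that accuracy stays high while cost stays low. The abstract does not specify the cost model or the quantizer, so both are assumptions here.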
