

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge

March 20, 2025
Authors: Xuan Shen, Weize Ma, Jing Liu, Changdi Yang, Rui Ding, Quanyi Wang, Henghui Ding, Wei Niu, Yanzhi Wang, Pu Zhao, Jun Lin, Jiuxiang Gu
cs.AI

Abstract

Monocular Depth Estimation (MDE) has emerged as a pivotal task in computer vision, supporting numerous real-world applications. However, deploying accurate depth estimation models on resource-limited edge devices, especially Application-Specific Integrated Circuits (ASICs), is challenging because of the models' high computational and memory demands. Recent advances in foundation models for depth estimation deliver impressive results but further amplify the difficulty of deployment on ASICs. To address this, we propose QuartDepth, which applies post-training quantization to MDE models and pairs it with hardware acceleration for ASICs. Our approach quantizes both weights and activations to 4-bit precision, reducing model size and computation cost. To mitigate the resulting performance degradation, we introduce an activation polishing and compensation algorithm, applied before and after activation quantization, together with a weight reconstruction method that minimizes weight quantization error. Furthermore, we design a flexible, programmable hardware accelerator that supports kernel fusion and customized instruction programmability, improving throughput and efficiency. Experimental results demonstrate that our framework achieves competitive accuracy while enabling fast inference and higher energy efficiency on ASICs, bridging the gap between high-performance depth estimation and practical edge-device applicability. Code: https://github.com/shawnricecake/quart-depth
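The core step named in the abstract, mapping weights and activations to 4-bit integers, can be illustrated with a minimal sketch. It assumes a generic symmetric, round-to-nearest INT4 scheme with a single scale per tensor. The helper names (`quantize_int4`, `pack_int4`, `packed_size_bytes`, and the rest) are hypothetical. The sketch does not implement QuartDepth's activation polishing and compensation, its weight reconstruction, or its accelerator kernels.

```python
# Minimal sketch of symmetric, round-to-nearest 4-bit (INT4) quantization.
# Assumption: a generic uniform scheme with one scale per tensor. This is NOT
# QuartDepth's algorithm: activation polishing/compensation and weight
# reconstruction are not implemented here.
from typing import List, Tuple

QMIN, QMAX = -8, 7  # range of a signed 4-bit (two's complement) integer


def quantize_int4(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed 4-bit integers sharing a single scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    # Round to nearest, then clamp into the representable INT4 range.
    q = [max(QMIN, min(QMAX, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integers and the scale."""
    return [x * scale for x in q]


def max_abs_error(original: List[float], recon: List[float]) -> float:
    """Largest per-element reconstruction error."""
    return max(abs(a - b) for a, b in zip(original, recon))


def pack_int4(q: List[int]) -> bytes:
    """Store two 4-bit values per byte (low nibble first)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data: bytes, count: int) -> List[int]:
    """Inverse of pack_int4: sign-extend each nibble back to an int."""
    vals = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals[:count]


def packed_size_bytes(num_values: int, bits: int) -> int:
    """Bytes needed to store num_values at the given bit width."""
    return (num_values * bits + 7) // 8


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.07, -0.21]
    q, scale = quantize_int4(weights)
    recon = dequantize(q, scale)
    print("int4:", q, "scale:", round(scale, 4))
    print("max error:", max_abs_error(weights, recon), "bound:", scale / 2)
    print("packed bytes:", len(pack_int4(q)))
```

Storing 4-bit integers instead of 32-bit floats cuts tensor storage by 8x, before counting the scales. With round-to-nearest, each value's error is at most half the scale, so a single large outlier, which inflates the shared scale, coarsens the grid for every other value in the tensor.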
