QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
March 20, 2025
Authors: Xuan Shen, Weize Ma, Jing Liu, Changdi Yang, Rui Ding, Quanyi Wang, Henghui Ding, Wei Niu, Yanzhi Wang, Pu Zhao, Jun Lin, Jiuxiang Gu
cs.AI
Abstract
Monocular Depth Estimation (MDE) has emerged as a pivotal task in computer
vision, supporting numerous real-world applications. However, deploying
accurate depth estimation models on resource-limited edge devices, especially
Application-Specific Integrated Circuits (ASICs), is challenging due to the
high computational and memory demands. Recent advancements in foundational
depth estimation deliver impressive results but further amplify the difficulty
of deployment on ASICs. To address this, we propose QuartDepth, which adopts
post-training quantization to quantize MDE models with hardware acceleration
for ASICs. Our approach quantizes both weights and activations to
4-bit precision, reducing the model size and computation cost. To mitigate the
resulting performance degradation, we introduce an activation polishing and
compensation algorithm applied before and after activation quantization, as
well as a weight reconstruction method for minimizing errors in weight quantization.
Furthermore, we design a flexible and programmable hardware accelerator by
supporting kernel fusion and customized instruction programmability, enhancing
throughput and efficiency. Experimental results demonstrate that our framework
achieves competitive accuracy while enabling fast inference and higher energy
efficiency on ASICs, bridging the gap between high-performance depth estimation
and practical edge-device applicability. Code:
https://github.com/shawnricecake/quart-depth
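
To make the 4-bit setting concrete, the following is a minimal sketch of plain symmetric per-tensor quantization to signed 4-bit integers. It is an illustration only and is not the QuartDepth method: it includes no activation polishing, compensation, or weight reconstruction, and the function names `quantize_int4` and `dequantize`, the sample weights, and the per-tensor scaling choice are all assumptions made for this example.

```python
from typing import List, Tuple


def quantize_int4(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed 4-bit codes in [-8, 7].

    Hypothetical helper for illustration; not taken from the QuartDepth code.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to code 7; use 1.0 for an all-zero tensor.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp to the signed 4-bit range.
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map 4-bit codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Made-up example weights (assumption, not data from the paper).
weights = [0.70, -0.30, 0.10, 0.02, -0.70]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)

# For in-range values the rounding error is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

With only 16 representable levels, small values such as 0.02 collapse to zero. This kind of rounding error is what the polishing, compensation, and reconstruction steps described in the abstract are meant to reduce.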