RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs
February 5, 2026
Authors: Youngcheon You, Banseok Lee, Minseop Choi, Seonyoung Kim, Hyochan Chong, Changdong Kim, Youngmin Kim, Dongkyu Kim
cs.AI
Abstract
Efficient deployment of large language models (LLMs) requires extreme quantization, forcing a critical trade-off between low-bit efficiency and performance. Residual binarization enables hardware-friendly, matmul-free inference by stacking binary (±1) layers, but is plagued by pathological feature co-adaptation. We identify a key failure mode, which we term inter-path adaptation: during quantization-aware training (QAT), parallel residual binary paths learn redundant features, degrading the error-compensation structure and limiting the expressive capacity of the model. While prior work relies on heuristic workarounds (e.g., path freezing) that constrain the solution space, we propose RaBiT, a novel quantization framework that resolves co-adaptation by algorithmically enforcing a residual hierarchy. Its core mechanism sequentially derives each binary path from a single shared full-precision weight, which ensures that every path corrects the error of the preceding one. This process is stabilized by a robust initialization that prioritizes functional preservation over mere weight approximation. RaBiT redefines the 2-bit accuracy-efficiency frontier: it achieves state-of-the-art performance, rivals even hardware-intensive Vector Quantization (VQ) methods, and delivers a 4.49× inference speed-up over full-precision models on an RTX 4090.
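The residual structure the abstract describes can be illustrated with a minimal sketch of greedy residual binarization: a full-precision weight matrix is approximated as a sum of scaled ±1 matrices, where each binary path fits the residual left by the paths before it. This is the general technique, not RaBiT's exact training procedure; the function names and per-tensor scaling choice here are illustrative assumptions.

```python
import numpy as np

def residual_binarize(W, num_paths=2):
    """Approximate W as sum_k alpha_k * B_k, with B_k in {-1, +1}.

    Each path binarizes the residual error of the previous paths,
    so later paths correct earlier ones -- the residual hierarchy
    the abstract refers to. A sketch, not RaBiT's actual algorithm.
    """
    residual = W.astype(np.float64).copy()
    paths = []
    for _ in range(num_paths):
        B = np.sign(residual)
        B[B == 0] = 1.0                  # map sign(0) to +1 by convention
        alpha = np.abs(residual).mean()  # L2-optimal per-tensor scale for B=sign(R)
        paths.append((alpha, B))
        residual -= alpha * B            # next path fits what this one missed
    return paths

def reconstruct(paths):
    """Dequantize: sum the scaled binary paths back into one matrix."""
    return sum(alpha * B for alpha, B in paths)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
for k in (1, 2, 3):
    approx = reconstruct(residual_binarize(W, k))
    err = np.linalg.norm(W - approx) / np.linalg.norm(W)
    print(f"{k} path(s): relative error {err:.3f}")
```

Stacking two such paths yields the 2-bit setting discussed in the abstract: inference needs only sign flips and additions per path (no full-precision matmul), and each added path strictly reduces the approximation error of the previous ones.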