Binary and Ternary Natural Language Generation
June 2, 2023
Authors: Zechun Liu, Barlas Oguz, Aasish Pappu, Yangyang Shi, Raghuraman Krishnamoorthi
cs.AI
Abstract
Ternary and binary neural networks enable multiplication-free computation and
promise multiple orders of magnitude efficiency gains over full-precision
networks if implemented on specialized hardware. However, since both the
parameter and the output space are highly discretized, such networks have
proven very difficult to optimize. The difficulties are compounded for the
class of transformer text generation models due to the sensitivity of the
attention operation to quantization and the noise-compounding effects of
autoregressive decoding in the high-cardinality output space. We approach the
problem with a mix of statistics-based quantization for the weights and elastic
quantization of the activations and demonstrate the first ternary and binary
transformer models on the downstream tasks of summarization and machine
translation. Our ternary BART base achieves an R1 score of 41 on the
CNN/DailyMail benchmark, which is merely 3.9 points behind the full model while
being 16x more efficient. Our binary model, while less accurate, achieves a
highly non-trivial score of 35.6. For machine translation, we achieve BLEU
scores of 21.7 and 17.6 on the WMT16 En-Ro benchmark, compared with a full
precision mBART model score of 26.8. We also compare our approach in the 8-bit
activation setting, where our ternary and even binary weight models can match
or outperform the best existing 8-bit weight models in the literature. Our code
and models are available at:
https://github.com/facebookresearch/Ternary_Binary_Transformer
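To make the "statistics-based quantization for the weights" concrete, below is a minimal NumPy sketch of the standard statistics-based ternarization and binarization schemes (in the style of Ternary Weight Networks and Binary Weight Networks). The function names, the 0.7 threshold factor, and the per-tensor scaling are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def binarize(w):
    # BWN-style binarization (illustrative): weights become {-alpha, +alpha},
    # with the per-tensor scale alpha set to the mean absolute weight.
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

def ternarize(w, delta_scale=0.7):
    # TWN-style ternarization (illustrative): weights below a statistics-based
    # threshold are zeroed; the survivors share one scale alpha, so matrix
    # products reduce to additions/subtractions plus one multiply per tensor.
    delta = delta_scale * np.abs(w).mean()   # threshold from weight statistics
    mask = np.abs(w) > delta                 # weights that stay non-zero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

w = np.array([[0.9, -0.05, 0.4],
              [-0.7, 0.02, -0.3]])
print(ternarize(w))  # entries drawn from {-alpha, 0, +alpha}
print(binarize(w))   # entries drawn from {-alpha, +alpha}
```

Because every weight takes one of at most three values times a shared scale, a matrix-vector product needs no weight multiplications at all, which is the source of the efficiency gains the abstract claims on specialized hardware.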