Binary and Ternary Natural Language Generation

Abstract

Ternary and binary neural networks enable multiplication-free computation and promise efficiency gains of multiple orders of magnitude over full-precision networks when implemented on specialized hardware. However, because both the parameter space and the output space are highly discretized, such networks have proven very difficult to optimize. The difficulties are compounded for transformer text generation models by the sensitivity of the attention operation to quantization and by the noise-compounding effects of autoregressive decoding in a high-cardinality output space. We approach the problem with a mix of statistics-based quantization for the weights and elastic quantization for the activations, and demonstrate the first ternary and binary transformer models on the downstream tasks of summarization and machine translation. Our ternary BART-base model achieves a ROUGE-1 (R1) score of 41 on the CNN/DailyMail benchmark, only 3.9 points behind the full-precision model while being 16x more efficient. Our binary model, while less accurate, achieves a highly non-trivial score of 35.6. For machine translation, we achieve BLEU scores of 21.7 (ternary) and 17.6 (binary) on the WMT16 En-Ro benchmark, compared with 26.8 for a full-precision mBART model. We also evaluate our approach in the 8-bit activation setting, where our ternary and even binary weight models match or outperform the best existing 8-bit weight models in the literature. Our code and models are available at: https://github.com/facebookresearch/Ternary_Binary_Transformer
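The abstract names two ingredients, statistics-based weight quantization and elastic activation quantization, plus the claim that low-bit weights allow multiplication-free computation. The sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. The ternary threshold Δ = 0.7 · mean(|w|) (TWN-style), the binary scale α = mean(|w|) (BWN-style), and the elastic quantizer form x̂ = α · round(clip((x − β)/α, q_min, q_max)) are common choices assumed here, and the function names ternarize, binarize, elastic_quantize and ternary_dot are hypothetical. In training, α and β of the activation quantizer would be learnable (e.g. via a straight-through estimator); only the forward mapping is shown.

```python
from typing import List, Tuple


def ternarize(w: List[float], delta_ratio: float = 0.7) -> Tuple[List[int], float]:
    """Statistics-based ternary weights (assumed TWN-style rule, hypothetical name).

    Threshold delta = delta_ratio * mean(|w|). Weights whose magnitude exceeds
    delta map to +1/-1; the rest map to 0. The shared scale alpha is the mean
    magnitude of the weights that survived the threshold.
    """
    mean_abs = sum(abs(v) for v in w) / len(w)
    delta = delta_ratio * mean_abs
    codes = [0 if abs(v) <= delta else (1 if v > 0 else -1) for v in w]
    kept = [abs(v) for v, c in zip(w, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


def binarize(w: List[float]) -> Tuple[List[int], float]:
    """Binary weights: sign(w) with scale mean(|w|) (assumed BWN-style rule)."""
    alpha = sum(abs(v) for v in w) / len(w)
    codes = [1 if v >= 0 else -1 for v in w]
    return codes, alpha


def elastic_quantize(x: List[float], alpha: float, beta: float,
                     q_min: int, q_max: int) -> List[float]:
    """Forward pass of an elastic activation quantizer (hypothetical form).

    alpha (scale) and beta (offset) would be learned during training; here
    they are plain inputs. Note: Python's round() rounds halves to even.
    """
    out = []
    for v in x:
        q = round((v - beta) / alpha)      # nearest integer level
        q = max(q_min, min(q_max, q))      # clip to the allowed level range
        out.append(alpha * q)              # map the level back to a real value
    return out


def ternary_dot(codes: List[int], alpha: float, x: List[float]) -> float:
    """Dot product with ternary weights using only additions/subtractions,
    followed by a single multiplication by the shared scale alpha."""
    acc = 0.0
    for c, v in zip(codes, x):
        if c == 1:
            acc += v
        elif c == -1:
            acc -= v
        # c == 0: the weight is zero, so nothing is accumulated
    return alpha * acc


if __name__ == "__main__":
    w = [0.9, -0.05, -0.6, 0.1]
    codes, alpha = ternarize(w)
    print(codes, alpha)  # [1, 0, -1, 0] 0.75
    print(ternary_dot(codes, alpha, [1.0, 2.0, 3.0, 4.0]))  # -1.5
    print(binarize(w)[0])  # [1, -1, -1, 1]
    print(elastic_quantize([0.2, 0.9, -0.3], alpha=0.25, beta=0.0, q_min=-1, q_max=1))
```

In this sketch the ternary dot product needs only one multiplication per output (by α), which is the source of the multiplication-free efficiency the abstract refers to; dedicated hardware would additionally pack the 2-bit codes rather than store Python integers.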
