NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

Abstract

The performance of neural networks improves when more parameters are used. However, model size is constrained by the on-device memory available during training and inference. Techniques such as quantization can alleviate this constraint, but they suffer from performance degradation. In this work, we introduce NeuZip, a new weight compression scheme based on the entropy of floating-point numbers in neural networks. With NeuZip, we achieve memory-efficient training and inference without sacrificing performance. Notably, we reduce the memory footprint of training a Llama-3 8B model from 31 GB to less than 16 GB while keeping the training dynamics fully unchanged. In inference, our method reduces memory usage by more than half while maintaining near-lossless performance. Our code is publicly available.
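
To give a rough sense of why the entropy of floating-point numbers matters for compression, the sketch below measures the empirical Shannon entropy of the 8-bit exponent field of float32 values. It is only an illustration under stated assumptions, not NeuZip's implementation: the weights are hypothetical Gaussian samples (standard deviation 0.02) standing in for trained parameters, and the helper names `exponent_bits` and `shannon_entropy` are made up for this example.

```python
import math
import random
import struct
from collections import Counter


def exponent_bits(x: float) -> int:
    """Return the 8-bit exponent field of x when stored as IEEE 754 float32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF


def shannon_entropy(symbols) -> float:
    """Empirical Shannon entropy of a symbol stream, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumption: Gaussian samples as a stand-in for real trained weights.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(20_000)]

# The exponent field is 8 bits wide; an entropy well below 8 bits per value
# means an entropy coder could store it in fewer bits without any loss.
exponent_entropy = shannon_entropy(exponent_bits(w) for w in weights)
print(f"exponent entropy: {exponent_entropy:.2f} bits (field width: 8 bits)")
```

If the measured entropy is well below the 8-bit field width, a lossless entropy coder can shrink that part of each value. Whether real model weights behave like this toy distribution is an assumption that would need to be checked on actual checkpoints.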
