NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
October 28, 2024
Authors: Yongchang Hao, Yanshuai Cao, Lili Mou
cs.AI
Abstract
The performance of neural networks improves when more parameters are used.
However, the model sizes are constrained by the available on-device memory
during training and inference. Although applying techniques like quantization
can alleviate the constraint, they suffer from performance degradation. In this
work, we introduce NeuZip, a new weight compression scheme based on the entropy
of floating-point numbers in neural networks. With NeuZip, we are able to
achieve memory-efficient training and inference without sacrificing
performance. Notably, we significantly reduce the memory footprint of training
a Llama-3 8B model from 31GB to less than 16GB, while keeping the training
dynamics fully unchanged. In inference, our method can reduce memory usage by
more than half while maintaining near-lossless performance. Our code is
publicly available.
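The core observation behind entropy-based weight compression is that trained network weights cluster near zero, so the exponent fields of their floating-point representations are highly non-uniform and can be stored losslessly in far fewer bits. The sketch below is a hypothetical illustration of that idea (it is not the authors' implementation): it extracts the 8-bit exponent field from simulated float32 weights and compresses it with zlib as a stand-in for a proper entropy coder.

```python
import zlib
import numpy as np

# Simulated weights: trained networks typically have small, zero-centered
# weights, so the exponent distribution is sharply peaked (low entropy).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float32)

# Reinterpret the raw bits and pull out the 8-bit exponent field
# (float32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits).
bits = weights.view(np.uint32)
exponents = ((bits >> 23) & 0xFF).astype(np.uint8)

# Losslessly compress the exponent bytes; zlib here stands in for the
# entropy coder, which exploits the skewed exponent distribution.
raw = exponents.tobytes()
compressed = zlib.compress(raw, level=9)
print(f"exponent bytes: {len(raw)}, compressed: {len(compressed)}")
```

Because the compression is lossless, decompressing and reassembling the bit fields reproduces the weights exactly, which is consistent with the abstract's claim that training dynamics remain fully unchanged.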