
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

October 28, 2024
Authors: Yongchang Hao, Yanshuai Cao, Lili Mou
cs.AI

Abstract

The performance of neural networks improves when more parameters are used. However, model sizes are constrained by the available on-device memory during training and inference. Although techniques such as quantization can alleviate this constraint, they suffer from performance degradation. In this work, we introduce NeuZip, a new weight-compression scheme based on the entropy of floating-point numbers in neural networks. With NeuZip, we achieve memory-efficient training and inference without sacrificing performance. Notably, we reduce the memory footprint of training a Llama-3 8B model from 31 GB to less than 16 GB while keeping the training dynamics fully unchanged. In inference, our method reduces memory usage by more than half while maintaining near-lossless performance. Our code is publicly available.
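
The abstract does not detail the compression mechanics, but its key premise, that the floating-point weights of neural networks carry low entropy, is easy to check. The sketch below is a minimal illustration and not the authors' implementation: it splits synthetic bfloat16 weights into exponent and mantissa bytes and compresses each with zlib as a generic stand-in for a lossless entropy coder; the Gaussian weight distribution, torch, and numpy are all assumptions of this sketch.

```python
# Minimal sketch, NOT the authors' implementation: illustrates the entropy
# observation NeuZip builds on. Assumptions: bfloat16 weights drawn from a
# Gaussian (as typical initializations produce), and zlib as a generic
# stand-in for a lossless entropy coder.
import zlib

import numpy as np
import torch

# Synthetic near-zero-mean weights, ~1M values.
weights = torch.randn(1 << 20, dtype=torch.bfloat16)

# Reinterpret each bfloat16 as a raw 16-bit pattern:
# 1 sign bit | 8 exponent bits | 7 mantissa bits.
bits = weights.view(torch.int16).numpy().view(np.uint16)
exponent = ((bits >> 7) & 0xFF).astype(np.uint8)
mantissa = (bits & 0x7F).astype(np.uint8)

def compressed_fraction(a: np.ndarray) -> float:
    """Size of the losslessly compressed bytes relative to the raw bytes."""
    raw = a.tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

# Exponents cluster around a few values because weights concentrate near
# zero, so a lossless coder shrinks them substantially; the 7 mantissa bits
# are near-uniform, so their bytes barely compress.
print(f"exponent bytes: {compressed_fraction(exponent):.2f} of raw size")
print(f"mantissa bytes: {compressed_fraction(mantissa):.2f} of raw size")
```

Read this way, the abstract's claims are consistent: losslessly compressing the low-entropy component leaves training dynamics fully unchanged, while the inference setting trades near-lossless quality for more than a 2x memory reduction.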