NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
October 28, 2024
Authors: Yongchang Hao, Yanshuai Cao, Lili Mou
cs.AI
Abstract
The performance of neural networks improves when more parameters are used.
However, the model sizes are constrained by the available on-device memory
during training and inference. Although applying techniques like quantization
can alleviate the constraint, they suffer from performance degradation. In this
work, we introduce NeuZip, a new weight compression scheme based on the entropy
of floating-point numbers in neural networks. With NeuZip, we are able to
achieve memory-efficient training and inference without sacrificing
performance. Notably, we significantly reduce the memory footprint of training
a Llama-3 8B model from 31GB to less than 16GB, while keeping the training
dynamics fully unchanged. In inference, our method can reduce memory usage by
more than half while maintaining near-lossless performance. Our code is
publicly available.
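The core observation behind entropy-based weight compression is that trained network weights cluster near zero, so the exponent fields of their floating-point representations are highly non-uniform and can be stored losslessly in far fewer bits. The sketch below is a hypothetical illustration of that idea (it is not the authors' implementation): it extracts the 8-bit exponent field from simulated float32 weights and compresses it with zlib as a stand-in for a proper entropy coder.

```python
import zlib
import numpy as np

# Simulated weights: trained networks typically have small, zero-centered
# weights, so the exponent distribution is sharply peaked (low entropy).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float32)

# Reinterpret the raw bits and pull out the 8-bit exponent field
# (float32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits).
bits = weights.view(np.uint32)
exponents = ((bits >> 23) & 0xFF).astype(np.uint8)

# Losslessly compress the exponent bytes; zlib here stands in for the
# entropy coder, which exploits the skewed exponent distribution.
raw = exponents.tobytes()
compressed = zlib.compress(raw, level=9)
print(f"exponent bytes: {len(raw)}, compressed: {len(compressed)}")
```

Because the compression is lossless, decompressing and reassembling the bit fields reproduces the weights exactly, which is consistent with the abstract's claim that training dynamics remain fully unchanged.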