The Unreasonable Ineffectiveness of the Deeper Layers
March 26, 2024
Authors: Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
cs.AI
Abstract
We empirically study a simple layer-pruning strategy for popular families of
open-weight pretrained LLMs, finding minimal degradation of performance on
different question-answering benchmarks until after a large fraction (up to
half) of the layers are removed. To prune these models, we identify the optimal
block of layers to prune by considering similarity across layers; then, to
"heal" the damage, we perform a small amount of finetuning. In particular, we
use parameter-efficient finetuning (PEFT) methods, specifically quantization
and Low Rank Adapters (QLoRA), such that each of our experiments can be
performed on a single A100 GPU. From a practical perspective, these results
suggest that layer pruning methods can complement other PEFT strategies to
further reduce computational resources of finetuning on the one hand, and can
improve the memory and latency of inference on the other hand. From a
scientific perspective, the robustness of these LLMs to the deletion of layers
implies either that current pretraining methods are not properly leveraging the
parameters in the deeper layers of the network or that the shallow layers play
a critical role in storing knowledge.
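
The pruning recipe described in the abstract can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors' released code: it assumes a Llama-style Hugging Face checkpoint whose decoder blocks live at model.model.layers, uses a placeholder calibration text and an illustrative block size n, and scores redundancy with the angular distance between the hidden states entering an n-layer block and those leaving it. The subsequent QLoRA "healing" finetune is omitted.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint and calibration data (illustrative assumptions).
model_name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

n = 8                                      # number of consecutive layers to drop
texts = ["Example calibration text ..."]   # small calibration set (placeholder)

# Accumulate the angular distance between hs[l] (input to decoder layer l) and
# hs[l + n] (output of the block of layers l .. l+n-1), averaged over tokens.
num_layers = model.config.num_hidden_layers
dist = torch.zeros(num_layers - n)
with torch.no_grad():
    for text in texts:
        ids = tok(text, return_tensors="pt")
        hs = model(**ids, output_hidden_states=True).hidden_states  # length num_layers + 1
        for l in range(num_layers - n):
            a, b = hs[l].float(), hs[l + n].float()
            cos = torch.nn.functional.cosine_similarity(a, b, dim=-1).clamp(-1.0, 1.0)
            dist[l] += torch.arccos(cos).mean() / torch.pi

start = int(dist.argmin())  # block whose input and output are most similar
print(f"Dropping layers {start} .. {start + n - 1}")

# Remove the block; a short QLoRA finetune would then "heal" the damage.
# (In practice, per-layer bookkeeping such as self_attn.layer_idx may also
# need renumbering, depending on the transformers version.)
kept = [m for i, m in enumerate(model.model.layers) if not (start <= i < start + n)]
model.model.layers = torch.nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)

With the block removed, the shallower model can be quantized and finetuned with low-rank adapters so that, as the abstract notes, each experiment fits on a single A100 GPU.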