
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models

March 10, 2025
作者: Xun Liang, Hanyu Wang, Huayi Lai, Simin Niu, Shichao Song, Jiawei Yang, Jihao Zhao, Feiyu Xiong, Bo Tang, Zhiyu Li
cs.AI

Abstract

Large Language Models have achieved remarkable success across various natural language processing tasks, yet their high computational cost during inference remains a major bottleneck. This paper introduces Sparse Expert Activation Pruning (SEAP), a training-free pruning method that selectively retains task-relevant parameters to reduce inference overhead. Inspired by the clustering patterns of hidden states and activations in LLMs, SEAP identifies task-specific expert activation patterns and prunes the model while preserving task performance and enhancing computational efficiency. Experimental results demonstrate that SEAP significantly reduces computational overhead while maintaining competitive accuracy. Notably, at 50% pruning, SEAP surpasses both WandA and FLAP by over 20%, and at 20% pruning, it incurs only a 2.2% performance drop compared to the dense model. These findings highlight SEAP's scalability and effectiveness, making it a promising approach for optimizing large-scale LLMs.
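The abstract outlines the core recipe: estimate how strongly each hidden unit activates on a small task-specific calibration set, then prune away the least-activated fraction at a chosen sparsity level. The sketch below is a minimal, generic illustration of that idea rather than the paper's actual implementation; the function names, the `model` / `calib_batches` / `layer_name` placeholders, and the mean-absolute-activation scoring rule are all assumptions standing in for SEAP's specific criterion.

```python
import torch

def activation_importance(model, calib_batches, layer_name):
    """Mean absolute activation per hidden unit of one module, averaged over a
    small task-specific calibration set. Illustrative scoring only; SEAP's
    actual criterion may differ."""
    stats = {"sum": None, "count": 0}

    def hook(_module, _inputs, output):
        # output: (batch, seq_len, hidden) -> per-unit mean |activation|
        act = output.detach().abs().mean(dim=(0, 1))
        stats["sum"] = act if stats["sum"] is None else stats["sum"] + act
        stats["count"] += 1

    handle = dict(model.named_modules())[layer_name].register_forward_hook(hook)
    with torch.no_grad():
        for batch in calib_batches:
            model(**batch)  # batch: tokenized inputs drawn from the target task
    handle.remove()
    return stats["sum"] / stats["count"]

def keep_mask(scores, sparsity=0.5):
    """Boolean mask that keeps the top-(1 - sparsity) fraction of units by score."""
    k = int(scores.numel() * (1 - sparsity))
    keep = torch.topk(scores, k).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[keep] = True
    return mask
```

A mask like this would then be applied to the corresponding rows and columns of the adjacent projection matrices (zeroing or physically removing them), which is where the training-free inference savings described in the abstract would come from.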
