Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

May 26, 2025
Authors: Peijie Dong, Zhenheng Tang, Xiang Liu, Lujun Li, Xiaowen Chu, Bo Li
cs.AI

Abstract

Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks focus only on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), overlooking agentic capabilities: workflow generation, tool use/function calling, long-context understanding, and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression affects the agentic abilities of LLMs. ACBench spans (1) 12 tasks across 4 capabilities (e.g., WorfBench for workflow generation, Needle-in-Haystack for long-context retrieval), (2) quantization (GPTQ, AWQ) and pruning (Wanda, SparseGPT) methods, and (3) 15 models, including small (Gemma-2B), standard (Qwen2.5 7B-32B), and distilled reasoning LLMs (DeepSeek-R1-Distill). Our experiments reveal compression tradeoffs: 4-bit quantization preserves workflow generation and tool use (1%-3% drop) but degrades real-world application accuracy by 10%-15%. To systematize the analysis, we introduce ERank, Top-k Ranking Correlation, and Energy. ACBench provides actionable insights for optimizing LLM compression in agentic scenarios. Code is available at https://github.com/pprp/ACBench.
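
As a rough illustration of what "4-bit quantization" means at the level of a single weight group, the sketch below applies plain symmetric round-to-nearest quantization. This is an assumption made for illustration only: it is not GPTQ or AWQ (both of which use calibration data to reduce quantization error), and the function names `quantize_rtn_4bit`, `dequantize`, and `max_abs_error` are hypothetical, not part of ACBench.

```python
from typing import List, Tuple


def quantize_rtn_4bit(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one weight group.

    Illustrative baseline only (hypothetical helper, not the GPTQ/AWQ
    algorithms evaluated in the benchmark).
    """
    qmax = 7  # signed 4-bit integers cover [-8, 7]; the symmetric scheme uses [-7, 7]
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One floating-point scale is shared by the whole group.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer code and clamp to the 4-bit range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def max_abs_error(original: List[float], restored: List[float]) -> float:
    """Largest per-weight reconstruction error introduced by quantization."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    group = [0.70, -0.32, 0.10, 0.02, -0.69]
    codes, scale = quantize_rtn_4bit(group)
    restored = dequantize(codes, scale)
    print("codes:", codes)
    print("scale:", scale)
    print("max abs error:", max_abs_error(group, restored))
```

Each weight is stored as a 4-bit code plus a shared scale, so the reconstruction error per weight is bounded by roughly half the scale. How such small per-weight errors accumulate into the task-level drops reported above is what the benchmark measures empirically.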
