Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

May 26, 2025
Authors: Peijie Dong, Zhenheng Tang, Xiang Liu, Lujun Li, Xiaowen Chu, Bo Li
cs.AI

Abstract

Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks focus only on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), overlooking agentic capabilities: workflow generation, tool use/function calling, long-context understanding, and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression affects the agentic abilities of LLMs. ACBench spans (1) 12 tasks across 4 capabilities (e.g., WorfBench for workflow generation, Needle-in-Haystack for long-context retrieval), (2) quantization (GPTQ, AWQ) and pruning (Wanda, SparseGPT), and (3) 15 models, including small (Gemma-2B), standard (Qwen2.5 7B-32B), and distilled reasoning LLMs (DeepSeek-R1-Distill). Our experiments reveal a compression tradeoff: 4-bit quantization preserves workflow generation and tool use (a 1%-3% drop) but degrades real-world application accuracy by 10%-15%. We introduce three analysis tools, ERank, Top-k Ranking Correlation, and Energy, to systematize the analysis. ACBench provides actionable insights for optimizing LLM compression in agentic scenarios. The code is available at https://github.com/pprp/ACBench.
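
The abstract names Top-k Ranking Correlation and Energy without defining them, so the sketch below is only an illustration of how such comparisons between an original and a compressed model might look. Everything in it is an assumption rather than the ACBench implementation: the top-k measure is taken to be the Jaccard overlap of the k highest-scoring token indices, the energy score is taken to be the negative log-sum-exp of the logits, and compression is imitated with plain symmetric round-to-nearest 4-bit quantization, which is far simpler than GPTQ or AWQ. The helper names and the toy weights are hypothetical.

```python
import math


def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization with one scale per list.

    A stand-in for real methods such as GPTQ or AWQ (assumption).
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax
    # Clamp to the signed integer range, then map back to floats.
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def topk_indices(logits, k):
    """Indices of the k largest logits, as a set."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return set(order[:k])


def topk_jaccard(logits_a, logits_b, k):
    """Jaccard overlap of the two top-k index sets (assumed definition)."""
    a, b = topk_indices(logits_a, k), topk_indices(logits_b, k)
    return len(a & b) / len(a | b)


def energy(logits):
    """Negative log-sum-exp of the logits, computed stably (assumed definition)."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


if __name__ == "__main__":
    # Hypothetical output layer: 5 "tokens", hidden size 3.
    head = [
        [0.90, -0.20, 0.10],
        [0.10, 0.80, -0.30],
        [-0.50, 0.40, 0.70],
        [0.30, 0.30, 0.30],
        [-0.70, -0.10, 0.20],
    ]
    hidden = [1.0, 0.5, -0.25]

    full_logits = matvec(head, hidden)
    quant_logits = matvec([quantize_rtn(row) for row in head], hidden)

    print("top-3 overlap:", topk_jaccard(full_logits, quant_logits, 3))
    print("energy shift:", energy(quant_logits) - energy(full_logits))
```

An overlap near 1 and an energy shift near 0 would indicate that the quantized layer ranks tokens much like the original on this input; on a real model the same comparison would be made on logits collected from both models over an evaluation set.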
