Shifting AI Efficiency From Model-Centric to Data-Centric Compression
May 25, 2025
Authors: Xuyang Liu, Zichen Wen, Shaobo Wang, Junjie Chen, Zhishan Tao, Yubo Wang, Xiangqi Jin, Chang Zou, Yiyu Wang, Chenfei Liao, Xu Zheng, Honggang Chen, Weijia Li, Xuming Hu, Conghui He, Linfeng Zhang
cs.AI
Abstract
The rapid advancement of large language models (LLMs) and multi-modal LLMs
(MLLMs) has historically relied on model-centric scaling through increasing
parameter counts from millions to hundreds of billions to drive performance
gains. However, as we approach hardware limits on model size, the dominant
computational bottleneck has fundamentally shifted to the quadratic cost of
self-attention over long token sequences, now driven by ultra-long text
contexts, high-resolution images, and extended videos. In this position paper,
we argue that the focus of research for efficient AI is shifting from
model-centric compression to data-centric compression. We position token
compression as the new frontier, improving AI efficiency by reducing the
number of tokens during model training or inference. Through comprehensive
analysis, we first examine recent developments in long-context AI across
various domains and establish a unified mathematical framework for existing
model efficiency strategies, demonstrating why token compression represents a
crucial paradigm shift in addressing long-context overhead. Subsequently, we
systematically review the research landscape of token compression, analyzing
its fundamental benefits and identifying its compelling advantages across
diverse scenarios. Furthermore, we provide an in-depth analysis of current
challenges in token compression research and outline promising future
directions. Ultimately, our work aims to offer a fresh perspective on AI
efficiency, synthesize existing research, and catalyze innovative developments
to address the challenges that increasing context lengths pose to the AI
community's advancement.
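
To make the scaling argument concrete, a standard back-of-the-envelope cost comparison (our illustration, using textbook transformer estimates rather than a formula from the paper) for one layer with sequence length n and hidden size d:

```latex
% Approximate per-layer FLOPs for a standard transformer layer
% (textbook estimates, shown for illustration; not from the paper):
%   self-attention:  F_attn ~ n^2 * d   (quadratic in sequence length n)
%   feed-forward:    F_ffn  ~ n * d^2   (linear in n)
\[
\frac{F_{\text{attn}}(n/r)}{F_{\text{attn}}(n)}
  = \frac{(n/r)^2\, d}{n^2\, d}
  = \frac{1}{r^2}
\]
% Compressing tokens n -> n/r therefore cuts the attention cost by r^2,
% whereas shrinking the model width d only helps linearly once n >> d.
```

For example, at n = 32768 and d = 4096 the n^2*d attention term already dominates the n*d^2 feed-forward term, and keeping half the tokens (r = 2) cuts attention cost by roughly 4x.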
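For intuition on what "reducing the number of tokens" can look like in practice, here is a minimal, hypothetical sketch. The helper prune_tokens, the keep_ratio parameter, and the attention-mass scoring are our illustrative choices, not a method from the paper: it simply drops the tokens that receive the least attention before later layers see them.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prune_tokens(tokens, keep_ratio=0.5):
    """Keep the top-k tokens scored by the attention mass each one
    receives from all others -- one common, training-free importance
    proxy (hypothetical helper; real methods in the survey vary)."""
    n, d = tokens.shape
    attn = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)  # (n, n)
    importance = attn.mean(axis=0)               # mass received per token
    k = max(1, int(np.ceil(keep_ratio * n)))
    keep = np.sort(np.argsort(importance)[-k:])  # top-k, original order
    return tokens[keep]

# Toy run: halving the sequence cuts the ~n^2*d attention cost of
# every subsequent layer by roughly 4x.
rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 64))
print(x.shape, "->", prune_tokens(x, keep_ratio=0.5).shape)
# (1024, 64) -> (512, 64)
```

In a real pipeline this step would sit between the layers of an LLM or MLLM, and the importance score would typically come from the model's own attention maps rather than a freshly computed similarity matrix.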