Shifting AI Efficiency From Model-Centric to Data-Centric Compression

Abstract

The rapid advancement of large language models (LLMs) and multi-modal LLMs (MLLMs) has historically relied on model-centric scaling, with parameter counts growing from millions to hundreds of billions to drive performance gains. However, as we approach hardware limits on model size, the dominant computational bottleneck has shifted to the quadratic cost of self-attention over long token sequences, now driven by ultra-long text contexts, high-resolution images, and extended videos. In this position paper, we argue that the focus of research for efficient AI is shifting from model-centric compression to data-centric compression. We position token compression as the new frontier: it improves AI efficiency by reducing the number of tokens during model training or inference. We first examine recent developments in long-context AI across various domains and establish a unified mathematical framework for existing model efficiency strategies, demonstrating why token compression represents a crucial paradigm shift in addressing long-context overhead. We then systematically review the research landscape of token compression, analyzing its fundamental benefits and identifying its advantages across diverse scenarios. Furthermore, we provide an in-depth analysis of current challenges in token compression research and outline promising future directions. Ultimately, our work aims to offer a fresh perspective on AI efficiency, synthesize existing research, and catalyze new developments that address the challenges increasing context lengths pose to the AI community's advancement.
