InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
October 2, 2024
Authors: Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
cs.AI
Abstract
Handling long input contexts remains a significant challenge for Large
Language Models (LLMs), particularly in resource-constrained environments such
as mobile devices. Our work aims to address this limitation by introducing
InfiniPot, a novel KV cache control framework designed to enable pre-trained
LLMs to manage extensive sequences within fixed memory constraints efficiently,
without requiring additional training. InfiniPot leverages Continual Context
Distillation (CCD), an iterative process that compresses and retains essential
information through novel importance metrics, effectively maintaining critical
data even without access to future context. Our comprehensive evaluations
indicate that InfiniPot significantly outperforms models trained for long
contexts in various NLP tasks, establishing its efficacy and versatility. This
work represents a substantial advancement toward making LLMs applicable to a
broader range of real-world scenarios.Summary
AI-Generated Summary
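The abstract describes CCD as an iterative loop that compresses the KV cache under a fixed memory budget, scoring entries with an importance metric and retaining only the most critical ones. A minimal sketch of that control pattern is below; the `InfiniPotCache` class, the per-entry importance scores, and the top-k retention rule are all illustrative assumptions, not the paper's actual metric or implementation.

```python
# Hypothetical sketch of a fixed-budget KV-cache controller in the spirit of
# Continual Context Distillation (CCD). The importance score is a stand-in
# proxy (e.g., accumulated attention mass), NOT the paper's metric.
from dataclasses import dataclass, field


@dataclass
class KVEntry:
    token_id: int
    importance: float  # proxy score; higher = more worth keeping


@dataclass
class InfiniPotCache:
    budget: int  # fixed memory budget: maximum number of cached entries
    entries: list = field(default_factory=list)

    def append_chunk(self, chunk):
        """Ingest a chunk of new KV entries, then distill if over budget."""
        self.entries.extend(chunk)
        if len(self.entries) > self.budget:
            self._distill()

    def _distill(self):
        # Keep the highest-importance entries while preserving their
        # original order, so the cache never exceeds the fixed budget.
        keep = sorted(self.entries, key=lambda e: e.importance,
                      reverse=True)[: self.budget]
        keep_ids = {id(e) for e in keep}
        self.entries = [e for e in self.entries if id(e) in keep_ids]


cache = InfiniPotCache(budget=4)
scores = [0.9, 0.1, 0.5, 0.8, 0.2, 0.7]
cache.append_chunk([KVEntry(i, s) for i, s in enumerate(scores)])
print([e.token_id for e in cache.entries])  # → [0, 2, 3, 5]
```

Because distillation runs each time a chunk arrives, the controller never needs access to future context: decisions are made only from scores available so far, matching the streaming setting the abstract describes.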