
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

October 2, 2024
Authors: Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
cs.AI

Abstract

Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work aims to address this limitation by introducing InfiniPot, a novel KV cache control framework designed to enable pre-trained LLMs to manage extensive sequences within fixed memory constraints efficiently, without requiring additional training. InfiniPot leverages Continual Context Distillation (CCD), an iterative process that compresses and retains essential information through novel importance metrics, effectively maintaining critical data even without access to future context. Our comprehensive evaluations indicate that InfiniPot significantly outperforms models trained for long contexts in various NLP tasks, establishing its efficacy and versatility. This work represents a substantial advancement toward making LLMs applicable to a broader range of real-world scenarios.
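The core idea described above, iteratively compressing a fixed-size KV cache ("pot") by keeping only the entries a score deems important, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`distill`, `process_stream`) and the pluggable `importance_fn` are hypothetical, and the paper's actual importance metrics (the CCD contribution) are not reproduced here.

```python
def distill(cache, scores, budget):
    """Keep the `budget` highest-scoring entries, preserving their order.

    Stand-in for the compression step of Continual Context Distillation;
    the real importance metrics are the paper's contribution.
    """
    if len(cache) <= budget:
        return cache, scores
    # Rank entries by importance, keep the top `budget`, restore stream order.
    top = sorted(sorted(range(len(cache)), key=scores.__getitem__, reverse=True)[:budget])
    return [cache[i] for i in top], [scores[i] for i in top]


def process_stream(tokens, memory_budget, importance_fn):
    """Process an unbounded token stream within a fixed-size KV 'pot'.

    Importance is computed without access to future context, mirroring
    the constraint the abstract describes.
    """
    cache, scores = [], []
    for tok in tokens:
        cache.append(tok)
        scores.append(importance_fn(tok))
        if len(cache) > memory_budget:  # pot is full: distill in place
            cache, scores = distill(cache, scores, memory_budget)
    return cache
```

For example, streaming the values `[5, 1, 9, 3, 7, 2, 8]` with a budget of 4 and the value itself as the (toy) importance score retains `[5, 9, 7, 8]`: memory stays bounded no matter how long the stream runs, while high-importance entries survive successive distillations.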

