

Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

August 28, 2024
作者: Wei Chen, Zhiyuan Li, Shuo Xin, Yihao Wang
cs.AI

Abstract

This paper presents Dolphin, a novel decoder-decoder architecture for energy-efficient processing of long contexts in language models. Our approach addresses the significant energy consumption and latency challenges inherent in on-device models. Dolphin employs a compact 0.5B parameter decoder to distill extensive contextual information into a memory embedding, substantially reducing the input length for the primary 7B parameter decoder model. Inspired by vision-language models, we repurpose the image embedding projector to encode long textual contexts, effectively treating extended context as a distinct modality. This innovative method enables processing of substantially longer contexts without the typical computational overhead associated with extended input sequences. Empirical evaluations demonstrate a 10-fold improvement in energy efficiency and a 5-fold reduction in latency compared to conventional full-length context processing methods without losing quality of the response. Our work contributes to the development of more sustainable and scalable language models for on-device applications, addressing the critical need for energy-efficient and responsive AI technologies in resource-constrained environments while maintaining the accuracy to understand long contexts. This research has implications for the broader field of natural language processing, particularly in the domain of efficient model design for resource-limited settings. By enabling more sophisticated AI capabilities on edge devices, Dolphin paves the way for advanced language processing in a wide range of applications where computational resources are at a premium. The Dolphin model is publicly available at https://huggingface.co/NexaAIDev/Dolphin.
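The core idea of the abstract — a small decoder distills a long context into a handful of memory embeddings, which a projector (repurposed from vision-language models' image projectors) maps into the primary decoder's embedding space — can be sketched in a few lines. The sketch below is a toy illustration under assumed names and dimensions (`encode_context`, `CTX_DIM`, `N_MEMORY`, and mean-pooling are stand-ins, not the released Dolphin implementation):

```python
import numpy as np

# Illustrative sketch of the decoder-decoder idea; all names and
# dimensions are assumptions, not the actual Dolphin architecture.
CTX_DIM = 64      # hidden size of the compact encoder (stands in for the 0.5B decoder)
MAIN_DIM = 128    # hidden size of the primary decoder (stands in for the 7B model)
N_MEMORY = 8      # number of memory embeddings the long context is distilled into

rng = np.random.default_rng(0)

def encode_context(context_tokens: np.ndarray) -> np.ndarray:
    """Toy stand-in for the compact decoder: compress a long token
    sequence into a fixed number of memory embeddings by chunked
    mean-pooling."""
    chunks = np.array_split(context_tokens, N_MEMORY)
    return np.stack([c.mean(axis=0) for c in chunks])   # (N_MEMORY, CTX_DIM)

# Projector, analogous to a vision-language model's image projector:
# maps memory embeddings into the primary decoder's embedding space.
W_proj = rng.standard_normal((CTX_DIM, MAIN_DIM)) * 0.02

def build_prefix(long_context: np.ndarray, query_embeds: np.ndarray) -> np.ndarray:
    memory = encode_context(long_context)        # (N_MEMORY, CTX_DIM)
    projected = memory @ W_proj                  # (N_MEMORY, MAIN_DIM)
    # the primary decoder now sees N_MEMORY context slots plus the
    # query, instead of thousands of raw context tokens
    return np.concatenate([projected, query_embeds], axis=0)

long_context = rng.standard_normal((4096, CTX_DIM))  # embeddings of a long document
query = rng.standard_normal((16, MAIN_DIM))          # query, already in main space

prefix = build_prefix(long_context, query)
print(prefix.shape)  # (24, 128): 8 memory slots + 16 query tokens
```

This is where the energy and latency savings come from: the primary model's attention cost scales with its input length, and the input shrinks from thousands of context tokens to a fixed, small number of memory slots.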

