Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

August 28, 2024
Authors: Wei Chen, Zhiyuan Li, Shuo Xin, Yihao Wang
cs.AI

Abstract

This paper presents Dolphin, a novel decoder-decoder architecture for energy-efficient processing of long contexts in language models. Our approach addresses the significant energy consumption and latency challenges inherent in on-device models. Dolphin employs a compact 0.5B parameter decoder to distill extensive contextual information into a memory embedding, substantially reducing the input length for the primary 7B parameter decoder model. Inspired by vision-language models, we repurpose the image embedding projector to encode long textual contexts, effectively treating extended context as a distinct modality. This innovative method enables processing of substantially longer contexts without the typical computational overhead associated with extended input sequences. Empirical evaluations demonstrate a 10-fold improvement in energy efficiency and a 5-fold reduction in latency compared to conventional full-length context processing methods, without loss of response quality. Our work contributes to the development of more sustainable and scalable language models for on-device applications, addressing the critical need for energy-efficient and responsive AI technologies in resource-constrained environments while maintaining the accuracy needed to understand long contexts. This research has implications for the broader field of natural language processing, particularly in the domain of efficient model design for resource-limited settings. By enabling more sophisticated AI capabilities on edge devices, Dolphin paves the way for advanced language processing in a wide range of applications where computational resources are at a premium. The Dolphin model is publicly available at https://huggingface.co/NexaAIDev/Dolphin.
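
The data flow described in the abstract (a small decoder condenses the long context into a few memory embeddings, a projector maps them into the main decoder's embedding space, and the main decoder sees only those embeddings plus the query) can be illustrated with a toy sketch. This is not the authors' implementation: the widths, the number of memory embeddings, the function names, and especially the mean-pooling stand-in for the 0.5B decoder are all assumptions chosen to keep the example self-contained.

```python
import random

# All sizes below are hypothetical toy values, not figures from the paper.
D_SMALL = 4   # hidden width of the stand-in "small decoder"
D_MAIN = 8    # embedding width of the stand-in "main decoder"
N_MEM = 3     # number of memory embeddings produced from the context


def toy_compress(context, n_mem):
    """Placeholder for the compact decoder: mean-pool the context into
    n_mem vectors. Assumes len(context) >= n_mem."""
    n = len(context)
    memory = []
    for i in range(n_mem):
        block = context[i * n // n_mem:(i + 1) * n // n_mem]
        memory.append([sum(col) / len(block) for col in zip(*block)])
    return memory


def project(memory, weight):
    """Linear projector: map each vector from D_SMALL to D_MAIN.
    weight has D_MAIN rows, each of length D_SMALL."""
    return [[sum(m * w for m, w in zip(vec, row)) for row in weight]
            for vec in memory]


def build_main_input(context, query, weight, n_mem=N_MEM):
    """Sequence the main decoder would consume: projected memory
    embeddings followed by the query embeddings."""
    return project(toy_compress(context, n_mem), weight) + query


rng = random.Random(0)
context = [[rng.gauss(0, 1) for _ in range(D_SMALL)] for _ in range(100)]
query = [[rng.gauss(0, 1) for _ in range(D_MAIN)] for _ in range(5)]
weight = [[rng.gauss(0, 1) for _ in range(D_SMALL)] for _ in range(D_MAIN)]

main_input = build_main_input(context, query, weight)
full_length = len(context) + len(query)
print(f"full-length input: {full_length} positions, "
      f"compressed input: {len(main_input)} positions")
```

In this toy setup the main decoder's input shrinks from 105 positions to 8. The shorter sequence is where the savings in the abstract would come from, although the toy numbers say nothing about the reported 10-fold and 5-fold figures.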
