Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Abstract

This paper presents Dolphin, a novel decoder-decoder architecture for energy-efficient processing of long contexts in language models. Our approach addresses the significant energy consumption and latency challenges inherent in on-device models. Dolphin uses a compact 0.5B-parameter decoder to distill extensive contextual information into a memory embedding, substantially reducing the input length for the primary 7B-parameter decoder. Inspired by vision-language models, we repurpose the image embedding projector to encode long textual contexts, effectively treating extended context as a distinct modality. This allows substantially longer contexts to be processed without the computational overhead typically associated with extended input sequences. Empirical evaluations demonstrate a 10-fold improvement in energy efficiency and a 5-fold reduction in latency compared to conventional full-length context processing, without loss of response quality. Our work contributes to more sustainable and scalable language models for on-device applications, addressing the need for energy-efficient and responsive AI in resource-constrained environments while maintaining the accuracy needed to understand long contexts. It also has implications for the broader field of natural language processing, particularly efficient model design for resource-limited settings. By enabling more sophisticated AI capabilities on edge devices, Dolphin paves the way for advanced language processing in applications where computational resources are at a premium. The Dolphin model is publicly available at https://huggingface.co/NexaAIDev/Dolphin.
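
The data flow described in the abstract can be illustrated with a toy sketch. This is not the released implementation: the dimensions, the random values, the choice of taking the last few hidden states as the memory embedding, and the helper names `project`, `compress_context`, and `build_main_input` are all assumptions made for illustration. It only shows how a projected memory embedding shortens the sequence that the main decoder has to process.

```python
import random

random.seed(0)

# Hypothetical toy sizes, NOT the real model dimensions.
D_SMALL = 4          # hidden size of the compact decoder (assumed)
D_MAIN = 6           # embedding size of the main decoder (assumed)
CONTEXT_TOKENS = 32  # length of the long context (assumed)
QUERY_TOKENS = 5     # length of the user query (assumed)
MEMORY_TOKENS = 4    # size of the memory embedding (assumed)


def random_matrix(rows, cols):
    """Build a rows x cols matrix of random floats as stand-in values."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(hidden_states, weight):
    """Linear projector: map each D_SMALL vector into the D_MAIN space."""
    return [
        [sum(h[i] * weight[i][j] for i in range(len(h))) for j in range(len(weight[0]))]
        for h in hidden_states
    ]


def compress_context(context_hidden, num_memory_tokens, weight):
    """Keep a fixed number of hidden states as the memory embedding and project them.

    Taking the last positions is an illustrative assumption only.
    """
    memory = context_hidden[-num_memory_tokens:]
    return project(memory, weight)


def build_main_input(memory_embeddings, query_embeddings):
    """The main decoder sees the memory embedding followed by the query embeddings."""
    return memory_embeddings + query_embeddings


# Stand-ins for the compact decoder's hidden states and the query embeddings.
context_hidden = random_matrix(CONTEXT_TOKENS, D_SMALL)
query_embeddings = random_matrix(QUERY_TOKENS, D_MAIN)
projector_weight = random_matrix(D_SMALL, D_MAIN)

memory_embeddings = compress_context(context_hidden, MEMORY_TOKENS, projector_weight)
main_input = build_main_input(memory_embeddings, query_embeddings)

full_length = CONTEXT_TOKENS + QUERY_TOKENS  # what the main decoder would see otherwise
compressed_length = len(main_input)
print(f"main decoder input: {compressed_length} positions instead of {full_length}")
```

With these assumed sizes the main decoder receives 9 positions rather than 37; the actual savings depend on the real context length and memory size, which the abstract does not specify.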
