FocusLLM: Scaling LLM's Context by Parallel Decoding

Abstract

Empowering LLMs with the ability to utilize useful information from a long context is crucial for many downstream applications. However, achieving long context lengths with the conventional transformer architecture requires substantial training and inference resources. In this paper, we present FocusLLM, a framework designed to extend the context length of any decoder-only LLM, enabling the model to focus on relevant information from very long sequences. FocusLLM processes long text inputs by dividing them into chunks based on the model's original context length, which alleviates the issue of attention distraction. It then appends the local context to each chunk as a prompt, extracts essential information from each chunk through a novel parallel decoding mechanism, and ultimately integrates the extracted information into the local context. FocusLLM stands out for its training efficiency and versatility: trained with an 8K input length at much lower training cost than previous methods, FocusLLM exhibits superior performance across downstream long-context tasks and maintains strong language modeling ability when handling extremely long texts, even up to 400K tokens. Our code is available at https://github.com/leezythu/FocusLLM.
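The chunk-and-prompt step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the choice of the trailing tokens as the "local context", and the chunk and local-context lengths are all assumptions made for illustration. It covers only how the per-chunk inputs might be laid out; the parallel decoding and the integration of extracted information into the local context are not shown.

```python
from typing import List, Tuple


def split_memory_and_local(tokens: List[int], local_len: int) -> Tuple[List[int], List[int]]:
    """Assumption: the last `local_len` tokens act as the local context,
    and everything before them is the long-range part to be chunked."""
    if local_len <= 0:
        raise ValueError("local_len must be positive")
    if len(tokens) <= local_len:
        return [], list(tokens)
    return list(tokens[:-local_len]), list(tokens[-local_len:])


def split_into_chunks(tokens: List[int], chunk_len: int) -> List[List[int]]:
    """Cut a token sequence into consecutive chunks of at most `chunk_len` tokens."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [tokens[i:i + chunk_len] for i in range(0, len(tokens), chunk_len)]


def build_chunk_inputs(tokens: List[int], chunk_len: int, local_len: int) -> List[List[int]]:
    """Append the same local context to every chunk (hypothetical helper).

    Each returned sequence is independent of the others, so a model could
    process them as one batch, i.e. in parallel.
    """
    memory, local = split_memory_and_local(tokens, local_len)
    return [chunk + local for chunk in split_into_chunks(memory, chunk_len)]


if __name__ == "__main__":
    toy_tokens = list(range(10))  # stand-in for token ids
    for seq in build_chunk_inputs(toy_tokens, chunk_len=3, local_len=4):
        print(seq)
```

In this toy layout, `chunk_len` would be chosen so that a chunk plus the local context still fits within the model's original context length, which is the constraint the abstract states for the chunk size.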
