A Comprehensive Survey on Long Context Language Modeling

Abstract

Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs effectively and efficiently. In this paper, we present a comprehensive survey of recent advances in long-context modeling for large language models. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively. For the first aspect, we discuss data strategies, architectural designs, and workflow approaches oriented toward long-context processing. For the second aspect, we provide a detailed examination of the infrastructure required for LCLM training and inference. For the third aspect, we present evaluation paradigms for long-context comprehension and long-form generation, as well as behavioral analysis and mechanism interpretability of LCLMs. Beyond these three key aspects, we thoroughly explore the diverse application scenarios where existing LCLMs have been deployed and outline promising future development directions. This survey provides an up-to-date review of the literature on long-context LLMs, which we hope will serve as a valuable resource for both researchers and engineers. An associated GitHub repository collecting the latest papers and repos is available at LCLM-Horizon (https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling).
