A Comprehensive Survey on Long Context Language Modeling

Abstract

Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs effectively and efficiently. In this paper, we present a comprehensive survey of recent advances in long-context modeling for large language models. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively. For the first aspect, we discuss data strategies, architectural designs, and workflow approaches oriented toward long-context processing. For the second aspect, we provide a detailed examination of the infrastructure required for LCLM training and inference. For the third aspect, we present evaluation paradigms for long-context comprehension and long-form generation, as well as behavioral analysis and mechanism interpretability of LCLMs. Beyond these three key aspects, we thoroughly explore the diverse application scenarios in which existing LCLMs have been deployed and outline promising future development directions. This survey provides an up-to-date review of the literature on long-context LLMs, which we hope will serve as a valuable resource for both researchers and engineers. An associated GitHub repository (LCLM-Horizon) collecting the latest papers and repos is available at: https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling
