Densing Law of LLMs

Abstract

Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as model size increases. However, this scaling poses great challenges for training and inference efficiency, particularly when deploying LLMs in resource-constrained environments, and the scaling trend is becoming increasingly unsustainable. This paper introduces the concept of "capacity density" as a new metric to evaluate the quality of LLMs across different scales, and describes the trend of LLMs in terms of both effectiveness and efficiency. To calculate the capacity density of a given target LLM, we first introduce a set of reference models and develop a scaling law to predict the downstream performance of these reference models based on their parameter sizes. We then define the effective parameter size of the target LLM as the parameter size a reference model would need to achieve equivalent performance, and formalize the capacity density as the ratio of the effective parameter size to the actual parameter size of the target LLM. Capacity density provides a unified framework for assessing both model effectiveness and efficiency. Our further analysis of recent open-source base LLMs reveals an empirical law (the densing law): the capacity density of LLMs grows exponentially over time. More specifically, measured on several widely used benchmarks, the capacity density of LLMs doubles approximately every three months. The law provides new perspectives to guide future LLM development, emphasizing the importance of improving capacity density to achieve optimal results with minimal computational overhead.
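As a rough illustration of the definition above, the following Python sketch fits a scaling law to a set of reference models, inverts it to obtain an effective parameter size, and divides by the actual parameter size. Everything specific here is an assumption made for the example and is not taken from the paper: the reference data points are invented, the functional form (a line in logit-score versus log-parameter space) is a stand-in for whatever scaling law the authors fit, and all function names are hypothetical. The last helper simply extrapolates a density under the reported three-month doubling time.

```python
import math
import numpy as np

# Hypothetical reference models: (parameters in billions, benchmark accuracy).
# These numbers are invented for illustration only.
REFERENCE = [(0.5, 0.30), (1.0, 0.38), (2.0, 0.47), (4.0, 0.56), (8.0, 0.64)]


def logit(p):
    """Map an accuracy in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def fit_scaling_law(reference):
    """Least-squares fit of logit(score) = a * ln(N) + b.

    This functional form is an assumption of the sketch, not the paper's law.
    """
    x = np.log([n for n, _ in reference])
    y = np.array([logit(s) for _, s in reference])
    a, b = np.polyfit(x, y, 1)  # coefficients, highest degree first
    return float(a), float(b)


def effective_params(score, a, b):
    """Invert the fitted law: reference size predicted to reach `score`."""
    return math.exp((logit(score) - b) / a)


def capacity_density(score, actual_params, a, b):
    """Ratio of effective parameter size to actual parameter size."""
    return effective_params(score, a, b) / actual_params


def projected_density(density_now, months_ahead, doubling_months=3.0):
    """Extrapolate density assuming exponential growth with a fixed doubling time."""
    return density_now * 2.0 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    a, b = fit_scaling_law(REFERENCE)
    # A hypothetical 2B-parameter target model scoring 0.56 on the same benchmark.
    print("capacity density:", capacity_density(0.56, 2.0, a, b))
    print("density after 6 months:", projected_density(1.0, 6.0))
```

Under this reading, a density above 1 means the target model matches a reference model with more parameters than it actually has, and a density below 1 means the opposite.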

