Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning

Abstract

Fine-tuning pre-trained large language models (LLMs) for downstream tasks is a critical step in the AI deployment pipeline. Reinforcement learning (RL) is arguably the most prominent fine-tuning method and has contributed to the development of many state-of-the-art LLMs. In contrast, evolution strategies (ES), which once showed performance comparable to RL on models with a few million parameters, have been neglected because of a pessimistic view of their scalability to larger models. In this work, we report the first successful attempt to scale up ES for fine-tuning the full parameters of LLMs. We show the surprising fact that ES can search efficiently over billions of parameters and outperform existing RL fine-tuning methods in multiple respects: sample efficiency, tolerance to long-horizon rewards, robustness to different base LLMs, a lower tendency toward reward hacking, and more stable performance across runs. ES therefore serves as a basis for a new direction in LLM fine-tuning beyond what current RL techniques provide. The source code is available at: https://github.com/VsonicV/es-fine-tuning-paper.
