BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

Abstract

The massive successes of large language models (LLMs) have encouraged the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA generates actions with its core LLM and interacts with environments, which lets it resolve complex tasks by conditioning on past interactions such as observations and actions. Because the investigation of LAAs is still very recent, few explorations are available. We therefore provide a comprehensive comparison of LAAs in terms of both agent architectures and LLM backbones. Additionally, we propose a new strategy, BOLAA, for orchestrating multiple LAAs: each labor LAA focuses on one type of action, and a controller manages the communication among the agents. We conduct simulations on both decision-making and multi-step reasoning environments, which comprehensively justify the capacity of LAAs. Our performance results provide quantitative suggestions for designing LAA architectures and choosing LLMs, as well as for the compatibility of the two. We release our implementation code of LAAs to the public at https://github.com/salesforce/BOLAA.
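The orchestration idea in the abstract (a controller routing each step to a labor agent that handles a single action type) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `LaborAgent` and `Controller`, the action types `search` and `click`, and the rule-based `select_by_last_observation` function are all hypothetical, and plain Python callables stand in for the core LLMs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class LaborAgent:
    """Hypothetical labor agent that only ever emits one type of action."""
    action_type: str
    # Stand-in for the core LLM: maps the interaction history to an action argument.
    policy: Callable[[List[str]], str]

    def act(self, history: List[str]) -> Tuple[str, str]:
        return self.action_type, self.policy(history)


@dataclass
class Controller:
    """Hypothetical controller: picks a labor agent per step and keeps the shared history."""
    agents: Dict[str, LaborAgent]
    # Stand-in for the selection logic: maps the history to an action type.
    select: Callable[[List[str]], str]
    history: List[str] = field(default_factory=list)

    def step(self, observation: str) -> Tuple[str, str]:
        # Record the observation so later decisions condition on past interactions.
        self.history.append(f"obs: {observation}")
        agent = self.agents[self.select(self.history)]
        action_type, argument = agent.act(self.history)
        # Record the chosen action as well.
        self.history.append(f"act: {action_type}[{argument}]")
        return action_type, argument


def select_by_last_observation(history: List[str]) -> str:
    # Assumed toy rule: search while on the search page, otherwise click.
    return "search" if "search page" in history[-1] else "click"


controller = Controller(
    agents={
        "search": LaborAgent("search", lambda history: "red shoes"),
        "click": LaborAgent("click", lambda history: "first result"),
    },
    select=select_by_last_observation,
)
```

Calling `controller.step("search page")` routes the step to the search agent, while `controller.step("results list")` routes it to the click agent. In both cases the observation and the chosen action are appended to the shared history that later steps condition on.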
