BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

Abstract

The success of large language models (LLMs) has spurred growing interest in LLM-augmented Autonomous Agents (LAAs). An LAA generates actions with its core LLM and interacts with environments, which lets it solve complex tasks by conditioning on past interactions such as observations and actions. Because research on LAAs is still very recent, few systematic explorations are available. We therefore provide a comprehensive comparison of LAAs in terms of both agent architectures and LLM backbones. We also propose BOLAA, a new strategy for orchestrating multiple LAAs in which each labor LAA focuses on one type of action and a controller manages the communication among the agents. We conduct simulations in both decision-making and multi-step reasoning environments, which comprehensively demonstrate the capabilities of LAAs. Our performance results provide quantitative suggestions for designing LAA architectures and for choosing LLMs, as well as for the compatibility of the two. We release our implementation code for LAAs to the public at https://github.com/salesforce/BOLAA.
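The orchestration idea described in the abstract (a controller routing work to labor agents that each handle one action type) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `LaborAgent` and `Controller`, the `select` callback, the toy environment, and the action types `search` and `click` are all hypothetical, and the policies are plain functions standing in for calls to a core LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types: history is a list of (action, observation) pairs,
# and a policy maps (task, history) to the argument of the next action.
# In a real LAA the policy would wrap a call to the core LLM.
History = List[Tuple[str, str]]
Policy = Callable[[str, History], str]


@dataclass
class LaborAgent:
    """An agent that specializes in a single action type."""
    action_type: str
    policy: Policy

    def act(self, task: str, history: History) -> str:
        # Format the action as "<type>[<argument>]".
        return f"{self.action_type}[{self.policy(task, history)}]"


@dataclass
class Controller:
    """Selects which labor agent acts next and keeps the shared history."""
    agents: Dict[str, LaborAgent]
    select: Callable[[str, History], str]
    history: History = field(default_factory=list)

    def step(self, task: str, env: Callable[[str], str]) -> str:
        # Pick one labor agent, let it act, and record what the
        # environment returns so later steps can condition on it.
        agent = self.agents[self.select(task, self.history)]
        action = agent.act(task, self.history)
        observation = env(action)
        self.history.append((action, observation))
        return observation


def select_agent(task: str, history: History) -> str:
    # Toy routing rule (an assumption): search first, then click.
    return "search" if not history else "click"


def toy_env(action: str) -> str:
    # Stand-in environment that simply echoes the action it received.
    return f"observed result of {action}"


controller = Controller(
    agents={
        "search": LaborAgent("search", lambda task, history: task),
        "click": LaborAgent("click", lambda task, history: "first result"),
    },
    select=select_agent,
)
controller.step("red shoes", toy_env)
controller.step("red shoes", toy_env)
```

After the two steps, `controller.history` holds one search action followed by one click action, each paired with the observation the toy environment returned. Keeping the history in the controller rather than in each agent is one possible design choice; it gives every labor agent the same view of past interactions.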
