A Zero-Shot Language Agent for Computer Control with Structured Reflection

Abstract

Large language models (LLMs) have shown an increasing capacity for planning and executing a high-level goal in a live computer environment (e.g., MiniWoB++). To perform a task, recent works often require a model to learn from trace examples of the task via either supervised learning or few-shot/many-shot prompting. Without these trace examples, it remains a challenge for an agent to autonomously learn and improve its control of a computer, which limits its ability to perform a new task. We approach this problem with a zero-shot agent that requires no expert traces. Our agent plans executable actions in a partially observed environment and makes iterative progress on a task by identifying and learning from its mistakes via self-reflection and structured thought management. On the easy tasks of MiniWoB++, we show that our zero-shot agent often outperforms recent state-of-the-art (SoTA) approaches, with more efficient reasoning. On more complex tasks, our reflective agent performs on par with the best prior models, even though previous works had the advantage of access to expert traces or additional screen information.
