InstructExcel: A Benchmark for Natural Language Instruction in Excel

Abstract

With the evolution of Large Language Models (LLMs), we can solve increasingly complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel-specific tasks provided via natural language user instructions. To do so, we introduce a new large-scale benchmark, InstructExcel, created by leveraging the 'Automate' feature in Excel to automatically generate OfficeScripts from users' actions. Our benchmark includes over 10k samples covering 170+ Excel operations across 2,000 publicly available Excel spreadsheets. Experiments across various zero-shot and few-shot settings show that InstructExcel is a hard benchmark for state-of-the-art models like GPT-4. We observe that (1) using GPT-4 over GPT-3.5, (2) providing more in-context examples, and (3) dynamic prompting can help improve performance on this benchmark.
