Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs
April 29, 2025
Authors: Paiheng Xu, Gang Wu, Xiang Chen, Tong Yu, Chang Xiao, Franck Dernoncourt, Tianyi Zhou, Wei Ai, Viswanathan Swaminathan
cs.AI
Abstract
Scripting interfaces enable users to automate tasks and customize software
workflows, but creating scripts traditionally requires programming expertise
and familiarity with specific APIs, posing barriers for many users. While Large
Language Models (LLMs) can generate code from natural language queries, runtime
code generation is severely limited due to unverified code, security risks,
longer response times, and higher computational costs. To bridge the gap, we
propose an offline simulation framework to curate a software-specific skillset,
a collection of verified scripts, by exploiting LLMs and publicly available
scripting guides. Our framework comprises two components: (1) task creation,
using top-down functionality guidance and bottom-up API synergy exploration to
generate helpful tasks; and (2) skill generation with trials, refining and
validating scripts based on execution feedback. To efficiently navigate the
extensive API landscape, we introduce a Graph Neural Network (GNN)-based link
prediction model to capture API synergy, enabling the generation of skills
involving underutilized APIs and expanding the skillset's diversity.
Experiments with Adobe Illustrator demonstrate that our framework significantly
improves automation success rates, reduces response time, and saves runtime
token costs compared to traditional runtime code generation. This is the first
attempt to use software scripting interfaces as a testbed for LLM-based
systems, highlighting the advantages of leveraging execution feedback in a
controlled environment and offering valuable insights into aligning AI
capabilities with user needs in specialized software domains.
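The "skill generation with trials" component described above can be pictured as a draft-execute-refine loop: an LLM drafts a script, the script runs against the software's scripting interface in a controlled offline environment, and execution feedback drives refinement until the script is verified or the trial budget is exhausted. The sketch below is a minimal illustration of that loop; `draft_script`, `run_in_sandbox`, `refine_script`, and `MAX_TRIALS` are invented stand-ins, not the paper's actual interfaces.

```python
from typing import Optional, Tuple

MAX_TRIALS = 3  # illustrative trial budget per task

def draft_script(task: str) -> str:
    """Stand-in for an LLM call that turns a task description into a script."""
    return f"// script for: {task}"

def run_in_sandbox(script: str) -> Tuple[bool, str]:
    """Stand-in for executing the script offline against the scripting
    interface, returning a success flag and any error output."""
    return True, ""

def refine_script(script: str, feedback: str) -> str:
    """Stand-in for an LLM call that repairs the script using the error log."""
    return script + f"\n// revised after: {feedback}"

def generate_skill(task: str) -> Optional[str]:
    """Return a verified script for `task`, or None if all trials fail."""
    script = draft_script(task)
    for _ in range(MAX_TRIALS):
        ok, feedback = run_in_sandbox(script)
        if ok:
            return script  # verified skill: add it to the skillset
        script = refine_script(script, feedback)
    return None  # no verified script found; the task is dropped
```

Because every script is validated offline, only verified skills reach users at runtime, which is what yields the reported gains in success rate, latency, and token cost over on-the-fly code generation.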