Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs
April 29, 2025
Authors: Paiheng Xu, Gang Wu, Xiang Chen, Tong Yu, Chang Xiao, Franck Dernoncourt, Tianyi Zhou, Wei Ai, Viswanathan Swaminathan
cs.AI
Abstract
Scripting interfaces enable users to automate tasks and customize software
workflows, but creating scripts traditionally requires programming expertise
and familiarity with specific APIs, posing barriers for many users. While Large
Language Models (LLMs) can generate code from natural language queries, runtime
code generation is severely limited due to unverified code, security risks,
longer response times, and higher computational costs. To bridge the gap, we
propose an offline simulation framework to curate a software-specific skillset,
a collection of verified scripts, by exploiting LLMs and publicly available
scripting guides. Our framework comprises two components: (1) task creation,
using top-down functionality guidance and bottom-up API synergy exploration to
generate helpful tasks; and (2) skill generation with trials, refining and
validating scripts based on execution feedback. To efficiently navigate the
extensive API landscape, we introduce a Graph Neural Network (GNN)-based link
prediction model to capture API synergy, enabling the generation of skills
involving underutilized APIs and expanding the skillset's diversity.
Experiments with Adobe Illustrator demonstrate that our framework significantly
improves automation success rates, reduces response time, and saves runtime
token costs compared to traditional runtime code generation. This is the first
attempt to use software scripting interfaces as a testbed for LLM-based
systems, highlighting the advantages of leveraging execution feedback in a
controlled environment and offering valuable insights into aligning AI
capabilities with user needs in specialized software domains.
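The abstract's second component, skill generation with trials, can be pictured as a draft-execute-refine loop: an LLM drafts a script for a task, the script runs offline in the software's scripting sandbox, and execution errors are fed back until the script verifies or the trial budget runs out. The sketch below illustrates that loop only; the helper names (`generate_script`, `execute`) and the stub feedback interface are assumptions for illustration, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ExecResult:
    """Outcome of one sandboxed script execution (illustrative)."""
    success: bool
    error_message: str = ""

def curate_skill(task, generate_script, execute, max_trials=3):
    """Return a verified script for `task`, or None if the trial budget runs out."""
    feedback = None
    for _ in range(max_trials):
        script = generate_script(task, feedback)  # LLM drafts, or refines using feedback
        result = execute(script)                  # run offline in the scripting sandbox
        if result.success:
            return script                         # verified: add to the skillset
        feedback = result.error_message           # feed execution errors back
    return None
```

Because every trial happens offline, only scripts that pass execution ever reach the runtime skillset, which is what lets the framework avoid the unverified-code and latency costs of runtime generation.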