Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs

Abstract

Scripting interfaces enable users to automate tasks and customize software workflows, but writing scripts traditionally requires programming expertise and familiarity with specific APIs, which poses a barrier for many users. Large Language Models (LLMs) can generate code from natural language queries, but runtime code generation is severely limited by unverified code, security risks, long response times, and high computational costs. To bridge this gap, we propose an offline simulation framework that curates a software-specific skillset, a collection of verified scripts, by exploiting LLMs and publicly available scripting guides. Our framework comprises two components: (1) task creation, which uses top-down functionality guidance and bottom-up API synergy exploration to generate helpful tasks; and (2) skill generation with trials, which refines and validates scripts based on execution feedback. To navigate the extensive API landscape efficiently, we introduce a Graph Neural Network (GNN)-based link prediction model that captures API synergy, enabling the generation of skills involving underutilized APIs and expanding the skillset's diversity. Experiments with Adobe Illustrator demonstrate that, compared to traditional runtime code generation, our framework significantly improves automation success rates, reduces response time, and saves runtime token costs. This is the first attempt to use software scripting interfaces as a testbed for LLM-based systems, highlighting the advantages of leveraging execution feedback in a controlled environment and offering valuable insights into aligning AI capabilities with user needs in specialized software domains.
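The "skill generation with trials" component can be pictured as a bounded retry loop in which each failed execution feeds its error back into the next generation attempt. The following is only a minimal sketch under stated assumptions, not the authors' implementation: `generate_skill`, `propose`, `execute`, `ExecutionResult`, and `max_trials` are hypothetical names, and the toy stand-ins (including the script strings) replace the LLM and the offline software environment so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ExecutionResult:
    ok: bool
    feedback: str  # error text or validation note from the offline run


def generate_skill(
    task: str,
    propose: Callable[[str, Optional[str]], str],
    execute: Callable[[str], ExecutionResult],
    max_trials: int = 3,
) -> Optional[str]:
    """Return a script that passed execution, or None if every trial failed."""
    feedback: Optional[str] = None
    for _ in range(max_trials):
        script = propose(task, feedback)  # hypothetical LLM call
        result = execute(script)          # hypothetical offline simulator run
        if result.ok:
            return script                 # verified, so it may enter the skillset
        feedback = result.feedback        # carry the error into the next trial
    return None


# Toy stand-ins (assumptions) so the sketch runs without an LLM or the software.
def toy_propose(task: str, feedback: Optional[str]) -> str:
    # The first attempt is wrong; the retry changes once feedback is available.
    return "doc.addLayer()" if feedback is None else "doc.layers.add()"


def toy_execute(script: str) -> ExecutionResult:
    if script == "doc.layers.add()":
        return ExecutionResult(True, "")
    return ExecutionResult(False, "addLayer is not a function")


skillset: Dict[str, str] = {}
verified = generate_skill("add a new layer", toy_propose, toy_execute)
if verified is not None:
    skillset["add a new layer"] = verified
```

Only scripts that pass execution are stored, which mirrors the abstract's definition of a skillset as a collection of verified scripts; tasks that exhaust their trials simply contribute nothing.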
