FaSTA^*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

Abstract

We develop a cost-efficient neurosymbolic agent to address challenging multi-turn image editing tasks such as "Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow." It combines fast, high-level subtask planning by large language models (LLMs) with slow, accurate tool use and local A^* search per subtask to find a cost-efficient toolpath, i.e., a sequence of calls to AI tools. To save the cost of A^* on similar subtasks, we perform inductive reasoning on previously successful toolpaths via LLMs to continuously extract and refine frequently used subroutines, and reuse them as new tools for future tasks in adaptive fast-slow planning: the higher-level subroutines are explored first, and the low-level A^* search is activated only when they fail. The reusable symbolic subroutines considerably reduce the exploration cost for the same types of subtasks applied to similar images, yielding a human-like fast-slow toolpath agent, "FaSTA^*": LLMs first attempt fast subtask planning followed by rule-based subroutine selection per subtask, which is expected to cover most tasks, while slow A^* search is triggered only for novel and challenging subtasks. By comparing with recent image editing approaches, we demonstrate that FaSTA^* is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in terms of success rate.
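The fast-slow control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tool graph, the names `a_star` and `plan_subtask`, the `check` callback, and the zero heuristic are all assumptions made for illustration. In particular, caching a successful path stands in for the LLM-based inductive subroutine mining, which the abstract does not specify in detail.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool graph: node -> list of (next_node, cost).
# Node names and costs are illustrative, not taken from the paper.
ToolGraph = Dict[str, List[Tuple[str, float]]]


def a_star(
    graph: ToolGraph,
    start: str,
    goal: str,
    heuristic: Callable[[str], float],
) -> Optional[Tuple[List[str], float]]:
    """Standard A* search; returns (toolpath, cost), or None if unreachable."""
    # Heap entries are (f = g + h, g, node, path so far).
    frontier = [(heuristic(start), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_cost.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found
        for nxt, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_g
                heapq.heappush(
                    frontier, (new_g + heuristic(nxt), new_g, nxt, path + [nxt])
                )
    return None


def plan_subtask(
    subtask: str,
    subroutines: Dict[str, List[str]],
    graph: ToolGraph,
    start: str,
    goal: str,
    check: Callable[[List[str]], bool],
) -> Tuple[List[str], str]:
    """Try a stored subroutine first (fast); fall back to A* (slow)."""
    candidate = subroutines.get(subtask)
    if candidate is not None and check(candidate):
        return candidate, "fast"
    # Assumption: a zero heuristic, which is admissible but reduces A*
    # to uniform-cost search. A real agent would use an informed estimate.
    result = a_star(graph, start, goal, lambda _node: 0.0)
    if result is None:
        raise ValueError(f"no toolpath found for subtask {subtask!r}")
    path, _cost = result
    # Simplified stand-in for subroutine mining: remember the successful path.
    subroutines[subtask] = path
    return path, "slow"
```

Calling `plan_subtask` twice for the same subtask type takes the slow path the first time and the fast path the second time, provided `check` accepts the stored path. This mirrors the claim that A^* is needed only for novel subtasks.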
