FaSTA^*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

Abstract

We develop a cost-efficient neurosymbolic agent for challenging multi-turn image editing tasks such as "Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow." It combines fast, high-level subtask planning by large language models (LLMs) with slow, accurate tool use and local A^* search per subtask to find a cost-efficient toolpath, that is, a sequence of calls to AI tools. To save the cost of A^* on similar subtasks, we perform inductive reasoning on previously successful toolpaths via LLMs to continuously extract and refine frequently used subroutines, and reuse them as new tools for future tasks in an adaptive fast-slow planning scheme: the higher-level subroutines are explored first, and the low-level A^* search is activated only when they fail. The reusable symbolic subroutines considerably reduce exploration cost on the same types of subtasks applied to similar images, yielding a human-like fast-slow toolpath agent, "FaSTA^*": LLMs first attempt fast subtask planning followed by rule-based subroutine selection per subtask, which is expected to cover most tasks, while slow A^* search is triggered only for novel and challenging subtasks. In comparisons with recent image editing approaches, we demonstrate that FaSTA^* is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in terms of success rate.
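The fast-slow control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tool graph, its costs, the `Subroutine` class, the `rule` predicate, the `execute` callback, and the `object_area` context key are all hypothetical names introduced here, and a zero heuristic stands in for whatever heuristic the actual system uses.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool graph: graph[u] = [(v, cost), ...]. Names and costs are made up.
Graph = Dict[str, List[Tuple[str, float]]]


def a_star(graph: Graph, start: str, goal: str,
           heuristic: Callable[[str], float]) -> Optional[Tuple[List[str], float]]:
    """Slow path: A* search for the cheapest toolpath from start to goal."""
    frontier = [(heuristic(start), 0.0, start, [start])]
    best_cost: Dict[str, float] = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, cost
        if cost > best_cost.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for nxt, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(nxt), new_cost, nxt, path + [nxt]),
                )
    return None


@dataclass
class Subroutine:
    """A previously mined toolpath plus a symbolic rule saying when to try it."""
    subtask_type: str
    rule: Callable[[dict], bool]
    toolpath: List[str]


def plan_subtask(subtask_type: str, context: dict, library: List[Subroutine],
                 graph: Graph, goal: str,
                 execute: Callable[[List[str], dict], bool],
                 ) -> Tuple[Optional[List[str]], str]:
    # Fast path: try stored subroutines whose rule matches the context.
    for sub in library:
        if sub.subtask_type == subtask_type and sub.rule(context):
            if execute(sub.toolpath, context):
                return sub.toolpath, "fast"
    # Slow path: fall back to A* only when no subroutine applies or succeeds.
    result = a_star(graph, "input", goal, heuristic=lambda n: 0.0)
    if result is None:
        return None, "failed"
    return result[0], "slow"


# Toy data (assumed for illustration only).
TOOL_GRAPH: Graph = {
    "input": [("detect", 1.0), ("segment", 3.0)],
    "detect": [("segment", 1.0)],
    "segment": [("recolor", 2.0)],
    "recolor": [],
}
LIBRARY = [
    Subroutine(
        subtask_type="recolor",
        rule=lambda ctx: ctx.get("object_area", 0.0) > 0.05,
        toolpath=["input", "detect", "segment", "recolor"],
    )
]
```

In this toy setup, a recolor subtask on a large object is served by the stored subroutine, while a small object (rule not satisfied) or a failed execution falls through to the A* search. A real system would also verify the output of the slow path and feed successful toolpaths back into the library, which this sketch omits.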

