FaSTA^*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

Abstract

We develop a cost-efficient neurosymbolic agent to address challenging multi-turn image editing tasks such as "Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow." It combines fast, high-level subtask planning by large language models (LLMs) with slow, accurate tool use and local A^* search per subtask to find a cost-efficient toolpath, that is, a sequence of calls to AI tools. To save the cost of A^* on similar subtasks, we perform inductive reasoning on previously successful toolpaths via LLMs to continuously extract and refine frequently used subroutines, and reuse them as new tools for future tasks in an adaptive fast-slow planning scheme: the higher-level subroutines are explored first, and the low-level A^* search is activated only when they fail. The reusable symbolic subroutines considerably reduce exploration cost on the same types of subtasks applied to similar images, yielding a human-like fast-slow toolpath agent, "FaSTA^*": fast subtask planning followed by rule-based subroutine selection per subtask is attempted by LLMs first, which is expected to cover most tasks, while slow A^* search is triggered only for novel and challenging subtasks. By comparing with recent image editing approaches, we demonstrate that FaSTA^* is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in terms of success rate.
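
The fast-slow control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tool graph, its costs, the node names, and the helpers `a_star`, `plan_subtask`, and `succeeds` are all hypothetical. The LLM-based subroutine mining and rule-based selection are replaced here by a plain dictionary cache so the example stays self-contained.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool graph: node -> [(next_node, cost)]. Names and costs are made up.
Graph = Dict[str, List[Tuple[str, float]]]


def a_star(graph: Graph, start: str, goal: str,
           heuristic: Dict[str, float]) -> Optional[Tuple[float, List[str]]]:
    """Slow path: textbook A* over the tool graph; returns (cost, path) or None."""
    frontier = [(heuristic.get(start, 0.0), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route to this node was found
        for nxt, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                priority = new_cost + heuristic.get(nxt, 0.0)
                heapq.heappush(frontier, (priority, new_cost, nxt, path + [nxt]))
    return None


def plan_subtask(subtask: str, subroutines: Dict[str, List[str]], graph: Graph,
                 heuristic: Dict[str, float],
                 succeeds: Callable[[List[str]], bool]) -> Tuple[str, List[str]]:
    """Try a stored subroutine first; fall back to A* only if none exists or it fails."""
    cached = subroutines.get(subtask)
    if cached is not None and succeeds(cached):
        return "fast", cached
    result = a_star(graph, "input", "done", heuristic)
    if result is None:
        return "failed", []
    _, path = result
    subroutines[subtask] = path  # naive stand-in for LLM-based subroutine mining
    return "slow", path


graph: Graph = {
    "input": [("detect", 1.0), ("segment", 3.0)],
    "detect": [("segment", 1.0)],
    "segment": [("recolor", 2.0)],
    "recolor": [("done", 0.0)],
}
library: Dict[str, List[str]] = {}
always_ok = lambda toolpath: True  # assumed quality check; a real agent would verify outputs

first = plan_subtask("recolor bench", library, graph, {}, always_ok)
second = plan_subtask("recolor bench", library, graph, {}, always_ok)
```

In this toy run the first call has no stored subroutine and pays for the A^* search, while the second call on the same subtask type reuses the stored toolpath without searching. An empty heuristic makes A^* behave like uniform-cost search; any admissible heuristic could be supplied instead.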

