FlowCompile: An Optimizing Compiler for Structured LLM Workflows

Abstract

Structured LLM workflows, where specialized LLM sub-agents execute according to a predefined graph, have become a powerful abstraction for solving complex tasks. Optimizing such workflows, i.e., selecting configurations for each sub-agent to balance accuracy and latency, is challenging due to the combinatorial design space over model choices, reasoning budgets, and workflow structures. Existing cost-aware methods largely treat workflow optimization as a routing problem, selecting a configuration at inference time for each query according to the accuracy-latency objective used during training. We argue that structured LLM workflows can also be optimized from a compilation perspective: before deployment, the system can globally explore the workflow design space and construct a reusable set of workflow-level configurations spanning diverse accuracy-latency trade-offs. Drawing inspiration from machine learning compilers, we introduce FlowCompile, a structured LLM workflow compiler that performs compile-time design space exploration to identify a high-quality, reusable trade-off set. FlowCompile decomposes a workflow into sub-agents, profiles each sub-agent under diverse configurations, and composes these measurements through a structure-aware proxy to estimate workflow-level accuracy and latency. It then identifies diverse high-quality configurations in a single compile-time pass, without retraining or online adaptation. Experiments across diverse workflows and challenging benchmarks show that FlowCompile consistently outperforms heuristically optimized workflow configurations and routing-based baselines, delivering up to 6.4x speedup. The compiled configuration set further serves as a reusable optimization artifact, enabling flexible deployment under varying runtime preferences and supporting downstream selection or routing.
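The compile-time pipeline described above (profile each sub-agent, compose the measurements through a proxy, keep a reusable trade-off set) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than FlowCompile's actual method: the sub-agent names, the profile numbers, and in particular the proxy, which here assumes a purely sequential chain where accuracies multiply and latencies add. The abstract does not specify the real structure-aware proxy or the search procedure, so exhaustive enumeration and a plain Pareto filter stand in for them.

```python
from itertools import product

# Hypothetical per-sub-agent profiles: config name -> (accuracy, latency in seconds).
# All names and numbers are invented for illustration.
PROFILES = {
    "planner": {"small": (0.80, 1.0), "large": (0.82, 5.0)},
    "solver": {"small": (0.70, 2.0), "large": (0.90, 6.0)},
}


def estimate_chain(assignment):
    """Assumed proxy for a sequential chain: accuracies multiply, latencies add."""
    accuracy, latency = 1.0, 0.0
    for agent, config in assignment.items():
        acc, lat = PROFILES[agent][config]
        accuracy *= acc
        latency += lat
    return accuracy, latency


def compile_tradeoff_set():
    """Enumerate every assignment and keep the non-dominated (Pareto-optimal) ones."""
    agents = list(PROFILES)
    candidates = []
    for configs in product(*(PROFILES[a] for a in agents)):
        assignment = dict(zip(agents, configs))
        candidates.append((assignment, *estimate_chain(assignment)))
    front = []
    for cand in candidates:
        # A candidate is dominated if another one is at least as accurate and
        # at least as fast, and strictly better on one of the two.
        dominated = any(
            other[1] >= cand[1] and other[2] <= cand[2]
            and (other[1] > cand[1] or other[2] < cand[2])
            for other in candidates
        )
        if not dominated:
            front.append(cand)
    return sorted(front, key=lambda c: c[2])  # fastest first


def select(front, latency_budget):
    """Pick the most accurate compiled configuration within a latency budget."""
    feasible = [c for c in front if c[2] <= latency_budget]
    return max(feasible, key=lambda c: c[1]) if feasible else None


if __name__ == "__main__":
    front = compile_tradeoff_set()
    for assignment, acc, lat in front:
        print(assignment, round(acc, 3), lat)
    print(select(front, latency_budget=8.0))
```

The set returned by `compile_tradeoff_set` is computed once and can then be queried repeatedly with different budgets through `select`, which mirrors the idea of a reusable artifact serving varying runtime preferences. Exhaustive enumeration is only viable for toy design spaces like this one; a realistic combinatorial space would need a smarter search.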