FastContext: Training an Efficient Repository Explorer for Coding Agents

Abstract

Large Language Model (LLM) coding agents have achieved strong results on software engineering tasks, yet repository exploration remains a major bottleneck: locating relevant code consumes a substantial share of the token budget and pollutes the agent's context with irrelevant snippets. In most agents, the same model both explores the repository and solves the task, so exploratory reads and searches stay in the solver's history. We present FastContext, a dedicated exploration subagent that separates repository exploration from solving. Invoked on demand, FastContext issues parallel tool calls and returns concise file paths and line ranges as focused context. FastContext is powered by specialized exploration models spanning 4B–30B parameters. We bootstrap these models from strong reference-model trajectories and refine them with task-grounded rewards for broad first-turn search, multi-turn evidence gathering, and precise citation generation. Across SWE-bench Multilingual, SWE-bench Pro, and SWE-QA, integrating FastContext into Mini-SWE-Agent improves end-to-end resolution rates by up to 5.5% while reducing coding-agent token consumption by up to 60%, with marginal overhead. These results show that repository exploration can be separated from solving and handled effectively by specialized models. Code and data: https://github.com/microsoft/fastcontext
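
To make the interface described in the abstract concrete, the following is a minimal sketch of an explorer that fans out several searches in parallel and hands back only file paths and line ranges. It is an illustration under stated assumptions, not the FastContext implementation: the names `Citation`, `grep_tool`, `merge`, and `explore`, the in-memory `REPO` dictionary, and the keyword-matching "tool" are all hypothetical, and no trained model is involved.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """A file path plus an inclusive, 1-indexed line range (hypothetical type)."""
    path: str
    start: int
    end: int


def grep_tool(repo: dict[str, str], keyword: str, margin: int = 1) -> list[Citation]:
    """Hypothetical search tool: cite each line containing `keyword`, padded by `margin` lines."""
    hits = []
    for path, text in repo.items():
        lines = text.splitlines()
        for i, line in enumerate(lines, start=1):
            if keyword in line:
                hits.append(Citation(path, max(1, i - margin), min(len(lines), i + margin)))
    return hits


def merge(citations: list[Citation]) -> list[Citation]:
    """Merge overlapping or adjacent ranges per file so the returned context stays concise."""
    merged: list[Citation] = []
    for c in sorted(citations, key=lambda c: (c.path, c.start, c.end)):
        last = merged[-1] if merged else None
        if last and last.path == c.path and c.start <= last.end + 1:
            merged[-1] = Citation(c.path, last.start, max(last.end, c.end))
        else:
            merged.append(c)
    return merged


def explore(repo: dict[str, str], keywords: list[str]) -> list[Citation]:
    """Run one search per keyword in parallel and return only merged citations."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda kw: grep_tool(repo, kw), keywords)
        return merge([c for hits in results for c in hits])


# Toy in-memory repository (assumption: stands in for real files on disk).
REPO = {
    "app/auth.py": "import hashlib\n\ndef login(user):\n    token = make_token(user)\n    return token\n",
    "app/tokens.py": "def make_token(user):\n    return hash(user)\n",
}

if __name__ == "__main__":
    for c in explore(REPO, ["make_token", "login"]):
        print(f"{c.path}:{c.start}-{c.end}")
```

In this sketch the solver would receive only the merged citations rather than the raw search output, which is the sense in which exploratory reads stay out of the solver's history.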