OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

Abstract



Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories. It has two key components: (1) a scalable task synthesis pipeline that constructs a global environment memory from exploration and then leverages it to generate diverse, grounded instructions; and (2) a policy-switching strategy for trajectory rollout, which alternates between learner and expert models to capture essential error-recovery data that is often missing in standard imitation learning. Agents trained on our data achieve competitive results across three dynamic mobile agent benchmarks: notably, our fine-tuned Qwen2.5-VL and Qwen3-VL models reach 51.7% and 64.7% success on AndroidWorld, respectively, far surpassing existing open-data approaches. Furthermore, we conduct transparent analyses of the overlap between our synthetic instructions and benchmark test sets, and verify that the performance gains stem from broad functionality coverage rather than benchmark overfitting. We release data and code at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.
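The idea behind the policy-switching rollout can be illustrated with a minimal Python sketch. The abstract does not say when control changes hands or how actions are labeled. This example therefore assumes a fixed schedule (alternating blocks of `switch_every` steps) and records the expert's action at every visited state, a DAgger-style relabeling choice. `policy_switching_rollout`, `Step`, the toy number-line environment, and both toy policies are hypothetical illustrations, not OpenMobile's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int   # hypothetical stand-in for a screen observation
Action = str  # hypothetical stand-in for a UI action
Policy = Callable[[State], Action]


@dataclass
class Step:
    state: State
    action: Action  # action actually executed in the environment
    label: Action   # expert action at this state, kept as the training target
    actor: str      # which policy was in control: "learner" or "expert"


def policy_switching_rollout(
    reset: Callable[[], State],
    env_step: Callable[[State, Action], Tuple[State, bool]],
    learner: Policy,
    expert: Policy,
    max_steps: int = 20,
    switch_every: int = 2,
) -> List[Step]:
    """Hypothetical rollout that hands control back and forth between policies.

    Control alternates in fixed blocks of `switch_every` steps (an assumed
    schedule). The expert is queried at every state, so when the learner
    drifts into an off-path state, the expert block that follows records
    a recovery segment in the trajectory.
    """
    state = reset()
    trajectory: List[Step] = []
    for t in range(max_steps):
        actor = "learner" if (t // switch_every) % 2 == 0 else "expert"
        expert_action = expert(state)
        action = learner(state) if actor == "learner" else expert_action
        trajectory.append(Step(state, action, expert_action, actor))
        state, done = env_step(state, action)
        if done:
            break
    return trajectory


# Toy environment (assumption): walk along a number line from 0 to GOAL.
GOAL = 3


def reset() -> State:
    return 0


def env_step(state: State, action: Action) -> Tuple[State, bool]:
    nxt = state + (1 if action == "+1" else -1)
    return nxt, nxt == GOAL


def expert(state: State) -> Action:
    return "+1" if state < GOAL else "-1"


def learner(state: State) -> Action:
    return "-1" if state == 1 else "+1"  # deliberately errs at state 1


traj = policy_switching_rollout(reset, env_step, learner, expert)
for s in traj:
    print(s.actor, s.state, s.action, "expert label:", s.label)
```

In this toy run the learner steps backward at state 1, and the expert block that follows walks the agent back toward the goal. Steps where `action` differs from `label` mark the learner's mistakes, and the expert-controlled steps after them are the kind of error-recovery data that pure expert demonstrations would never contain.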
