MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Abstract

Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet most existing approaches are limited to single-agent scenarios and fail to capture the complex interactions inherent in real-world multi-agent systems. We present MultiWorld, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency. We introduce the Multi-Agent Condition Module to achieve precise multi-agent controllability, and the Global State Encoder to ensure coherent observations across different views. MultiWorld supports flexible scaling of agent and view counts, and synthesizes different views in parallel for high efficiency. Experiments on multi-player game environments and multi-robot manipulation tasks demonstrate that MultiWorld outperforms baselines in video fidelity, action-following ability, and multi-view consistency. Project page: https://multi-world.github.io/
