χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies

Abstract

High-reliability, long-horizon robotic manipulation has traditionally relied on large-scale data and compute to capture complex real-world dynamics. However, we identify that the primary bottleneck to real-world robustness is not resource scale alone, but the distributional shift among the human demonstration distribution, the inductive bias learned by the policy, and the test-time execution distribution, a systematic inconsistency that causes compounding errors in multi-stage tasks. To mitigate these inconsistencies, we propose χ₀, a resource-efficient framework whose modules are designed to achieve production-level robustness in robotic manipulation. Our approach builds on three technical pillars: (i) Model Arithmetic, a weight-space merging strategy that efficiently absorbs the diverse distributions of different demonstrations, ranging from object appearance to state variations; (ii) Stage Advantage, a stage-aware advantage estimator that provides stable, dense progress signals, overcoming the numerical instability of prior non-stage approaches; and (iii) Train-Deploy Alignment, which bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. χ₀ enables two sets of dual-arm robots to collaboratively carry out long-horizon garment manipulation, spanning tasks from flattening and folding to hanging different clothes. Our method exhibits high-reliability autonomy: we are able to run the system from an arbitrary initial state for 24 consecutive hours. Experiments validate that χ₀ surpasses the state-of-the-art π₀.₅ in success rate by nearly 250%, with only 20 hours of data and 8 A100 GPUs. Code, data, and models will be released to support the community.
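As a rough illustration of what a weight-space merging strategy can look like, the sketch below takes a coefficient-weighted average of several policy checkpoints. It is a generic example under stated assumptions, not the Model Arithmetic procedure itself: the abstract does not specify how checkpoints are combined, and the function name `merge_checkpoints`, the flattened-list weight format, and the parameter names `layer.w` and `layer.b` are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical checkpoint format: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[Weights],
                      coeffs: Sequence[float]) -> Weights:
    """Return the coefficient-weighted average of several checkpoints.

    Assumption: every checkpoint shares the same parameter names and sizes,
    e.g. policies fine-tuned from one base model on different demo subsets.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(coeffs):
        raise ValueError("one coefficient per checkpoint is required")
    total = sum(coeffs)
    if total == 0:
        raise ValueError("coefficients must not sum to zero")
    norm = [c / total for c in coeffs]  # normalise so the weights sum to 1

    merged: Weights = {}
    for name, ref in checkpoints[0].items():
        acc = [0.0] * len(ref)
        for ckpt, w in zip(checkpoints, norm):
            values = ckpt[name]
            if len(values) != len(ref):
                raise ValueError(f"size mismatch for parameter {name!r}")
            for i, v in enumerate(values):
                acc[i] += w * v
        merged[name] = acc
    return merged


# Toy usage: two tiny "policies" merged with equal weight.
policy_a = {"layer.w": [1.0, 2.0], "layer.b": [0.0]}
policy_b = {"layer.w": [3.0, 4.0], "layer.b": [2.0]}
merged = merge_checkpoints([policy_a, policy_b], coeffs=[1.0, 1.0])
print(merged)  # {'layer.w': [2.0, 3.0], 'layer.b': [1.0]}
```

Averaging in weight space only makes sense when the checkpoints share an architecture and, in practice, a common initialization. The choice of coefficients is left open here because the abstract does not describe it.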
