χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies

February 9, 2026
Authors: Checheng Yu, Chonghao Sima, Gangcheng Jiang, Hai Zhang, Haoguang Mai, Hongyang Li, Huijie Wang, Jin Chen, Kaiyang Wu, Li Chen, Lirui Zhao, Modi Shi, Ping Luo, Qingwen Bu, Shijia Peng, Tianyu Li, Yibo Yuan
cs.AI

Abstract

High-reliability long-horizon robotic manipulation has traditionally relied on large-scale data and compute to understand complex real-world dynamics. However, we identify that the primary bottleneck to real-world robustness is not resource scale alone, but the distributional shift among the human demonstration distribution, the inductive bias learned by the policy, and the test-time execution distribution: a systematic inconsistency that causes compounding errors in multi-stage tasks. To mitigate these inconsistencies, we propose χ₀, a resource-efficient framework whose modules are designed to achieve production-level robustness in robotic manipulation. Our approach builds on three technical pillars: (i) Model Arithmetic, a weight-space merging strategy that efficiently absorbs the diverse distributions of different demonstrations, ranging from variations in object appearance to variations in state; (ii) Stage Advantage, a stage-aware advantage estimator that provides stable, dense progress signals, overcoming the numerical instability of prior non-stage approaches; and (iii) Train-Deploy Alignment, which bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. χ₀ enables two sets of dual-arm robots to collaboratively orchestrate long-horizon garment manipulation, spanning tasks from flattening and folding to hanging different clothes. Our method exhibits high-reliability autonomy: we are able to run the system from arbitrary initial states for 24 consecutive hours without interruption. Experiments validate that χ₀ surpasses the state-of-the-art π₀.₅ in success rate by nearly 250%, using only 20 hours of data and 8 A100 GPUs. Code, data, and models will be released to facilitate the community.
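Model Arithmetic is described only as a weight-space merging strategy. As a rough illustration of weight-space merging in general (not the authors' implementation, whose details the abstract does not give), the sketch below linearly combines several checkpoints that share one architecture, for example policies fine-tuned from a common initialization on different demonstration subsets. The function name `merge_weights`, the dict-of-arrays checkpoint format, and the normalized non-negative coefficients are all assumptions made for illustration.

```python
import numpy as np


# Hypothetical helper illustrating generic weight-space merging;
# it is NOT the chi_0 Model Arithmetic implementation.
def merge_weights(checkpoints, coeffs):
    """Return theta = sum_i c_i * theta_i, with c normalized to sum to 1.

    checkpoints: list of dicts mapping parameter name -> np.ndarray; all dicts
        must share the same names and shapes (an assumed checkpoint format).
    coeffs: one non-negative mixing weight per checkpoint.
    """
    if not checkpoints or len(checkpoints) != len(coeffs):
        raise ValueError("need a non-empty list and one coefficient per checkpoint")
    c = np.asarray(coeffs, dtype=np.float64)
    if np.any(c < 0) or c.sum() <= 0:
        raise ValueError("coefficients must be non-negative with a positive sum")
    c = c / c.sum()

    # Every checkpoint must have identical parameter names and shapes.
    ref = checkpoints[0]
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != ref.keys():
            raise ValueError("checkpoints must share parameter names")
        for name, value in ckpt.items():
            if value.shape != ref[name].shape:
                raise ValueError(f"shape mismatch for parameter {name!r}")

    # Weighted sum of each parameter tensor across all checkpoints.
    return {
        name: sum(ci * ckpt[name] for ci, ckpt in zip(c, checkpoints))
        for name in ref
    }
```

For example, `merge_weights([ckpt_a, ckpt_b], [3, 1])` returns parameters three quarters of the way toward `ckpt_a`. Linear averaging of this kind generally behaves well only when the checkpoints share an initialization; how χ₀ chooses its merging coefficients is not stated here.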
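The abstract does not say how Stage Advantage computes its progress signal. A minimal sketch of the general idea of a stage-aware progress signal, assuming each demonstration frame carries a hypothetical annotated stage index, is to normalize progress within each stage so the signal stays bounded in [0, 1] and rises steadily inside every stage, then take a forward difference as an advantage proxy. `stage_progress`, `stage_advantage`, and the `horizon` parameter are illustrative names; the actual estimator in χ₀ may be learned and may differ substantially.

```python
from typing import List


# Hypothetical stage-aware progress signal; NOT the chi_0 Stage Advantage estimator.
def stage_progress(stage_ids: List[int], num_stages: int) -> List[float]:
    """Map each timestep to a progress value in [0, 1].

    stage_ids[t] is the (assumed) annotated stage index of frame t and is
    non-decreasing over the episode. Progress is
    (stage + fraction of that stage completed) / num_stages.
    """
    T = len(stage_ids)
    progress = [0.0] * T
    t = 0
    while t < T:
        s = stage_ids[t]
        # Find the end (exclusive) of the current contiguous stage segment.
        end = t
        while end < T and stage_ids[end] == s:
            end += 1
        length = end - t
        for i in range(t, end):
            frac = (i - t + 1) / length  # fraction of this stage completed
            progress[i] = (s + frac) / num_stages
        t = end
    return progress


def stage_advantage(progress: List[float], horizon: int = 1) -> List[float]:
    """Advantage proxy A_t = progress[t + horizon] - progress[t], clipped at episode end."""
    T = len(progress)
    return [progress[min(t + horizon, T - 1)] - progress[t] for t in range(T)]
```

For a four-frame episode with stages `[0, 0, 1, 1]` and `num_stages=2`, `stage_progress` gives `[0.25, 0.5, 0.75, 1.0]`, and `stage_advantage` with `horizon=1` gives `[0.25, 0.25, 0.25, 0.0]`.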
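Temporal chunk-wise smoothing is listed as part of Train-Deploy Alignment without further detail. One common way to smooth action-chunked policies, shown here as an assumption rather than as the χ₀ procedure, is to crossfade the still-pending tail of the previous action chunk into the newly predicted chunk so that switching chunks does not cause a jump in the commanded actions. The function `blend_chunks` and its linear weighting schedule are hypothetical.

```python
import numpy as np


# Hypothetical chunk-smoothing helper; NOT the chi_0 implementation.
def blend_chunks(prev_tail: np.ndarray, new_chunk: np.ndarray) -> np.ndarray:
    """Crossfade the pending tail of the previous chunk into a new chunk.

    prev_tail: (K, action_dim) actions not yet executed from the old chunk.
    new_chunk: (H, action_dim) freshly predicted chunk with H >= K, aligned so
        new_chunk[0] refers to the same timestep as prev_tail[0].
    Over the K overlapping steps the weight on the new chunk ramps linearly
    from 1/(K+1) to K/(K+1); steps past the overlap use the new chunk as-is.
    """
    K = prev_tail.shape[0]
    if new_chunk.shape[0] < K:
        raise ValueError("new chunk must cover the overlapping steps")
    out = new_chunk.astype(np.float64).copy()
    w = np.arange(1, K + 1, dtype=np.float64) / (K + 1)  # weights on new chunk
    out[:K] = (1.0 - w)[:, None] * prev_tail + w[:, None] * new_chunk[:K]
    return out
```

With a pending tail of three zero actions and a new chunk of five ones, the blended output is `0.25, 0.5, 0.75, 1.0, 1.0`: a gradual transition instead of a step. Other schedules, such as exponentially decaying weights, would fit the same interface.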
February 14, 2026