Mixture of Horizons in Action Chunking
November 24, 2025
Authors: Dong Jing, Gang Wang, Jiaqi Liu, Weiliang Tang, Zelong Sun, Yunchao Yao, Zhenyu Wei, Yunhui Liu, Zhiwu Lu, Mingyu Ding
cs.AI
Abstract
Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the action chunk length used during training, termed the horizon. Our empirical study reveals an inherent trade-off: longer horizons provide stronger global foresight but degrade fine-grained accuracy, while shorter ones sharpen local control yet struggle on long-term tasks, implying that any fixed choice of a single horizon is suboptimal. To mitigate this trade-off, we propose a mixture of horizons (MoH) strategy. MoH rearranges the action chunk into several segments with different horizons, processes them in parallel with a shared action transformer, and fuses the outputs with a light linear gate. This design has three appealing benefits. 1) MoH exploits long-term foresight and short-term precision jointly within a single model, improving both performance and generalization to complex tasks. 2) MoH is plug-and-play for full-attention action modules, with minimal training or inference overhead. 3) MoH enables dynamic inference with adaptive horizons, which selects stable actions through cross-horizon consensus, achieving 2.5× higher throughput than baselines while preserving superior performance. Extensive experiments on the flow-based policies π₀ and π₀.₅ and the one-step regression policy π_reg demonstrate that MoH yields consistent and significant gains on both simulated and real-world tasks. Notably, under the mixed-task setting, π₀.₅ with MoH reaches a new state of the art on LIBERO, a 99% average success rate, after only 30k training iterations. Project page: https://github.com/Timsty1/MixtureOfHorizons
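
To make the fusion mechanism concrete, below is a minimal PyTorch sketch of the idea the abstract describes: one action chunk is run through a shared transformer at several horizons in parallel, and the per-horizon outputs are combined by a lightweight linear gate. The class name, the `(tokens, context)` transformer signature, the default horizons, and the per-step gate renormalization are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of MoH fusion, assuming an `action_transformer`
# with a (tokens, context) signature and H_max >= max(horizons).
import torch
import torch.nn as nn

class MixtureOfHorizonsSketch(nn.Module):
    """Run one action chunk through a shared transformer at several
    horizons in parallel, then fuse the outputs with a linear gate."""

    def __init__(self, action_transformer: nn.Module,
                 horizons=(16, 8, 4), d_model=512):
        super().__init__()
        self.transformer = action_transformer  # shared across all horizons
        self.horizons = horizons
        # lightweight gate: one logit per horizon, from pooled VLM context
        self.gate = nn.Linear(d_model, len(horizons))

    def forward(self, action_tokens, context):
        # action_tokens: (B, H_max, d_model) query tokens for the chunk
        # context:       (B, d_model) pooled vision-language features
        B, H_max, d = action_tokens.shape
        outs = []
        for h in self.horizons:
            out = self.transformer(action_tokens[:, :h], context)  # (B, h, d)
            # zero-pad so every horizon's output spans H_max steps
            outs.append(torch.cat([out, out.new_zeros(B, H_max - h, d)], dim=1))
        stacked = torch.stack(outs, dim=-1)              # (B, H_max, d, K)
        gates = torch.softmax(self.gate(context), -1)    # (B, K)
        # a horizon only votes on the steps it actually covers:
        # mask padded steps and renormalize the gate weights per step
        steps = torch.arange(H_max, device=gates.device)
        mask = torch.stack([steps < h for h in self.horizons], -1).float()
        w = gates[:, None, :] * mask[None]               # (B, H_max, K)
        w = w / w.sum(-1, keepdim=True).clamp_min(1e-8)
        return (stacked * w[:, :, None, :]).sum(-1)      # (B, H_max, d)
```

Because all horizons share one transformer and the gate is a single linear layer, the extra parameter count and compute stay small, which matches the plug-and-play, low-overhead claim above.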
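The dynamic-inference claim can be sketched similarly: execute only the prefix of steps on which the decoded per-horizon actions agree, and re-plan once they diverge. The L2-spread agreement measure and the `tol` threshold below are assumptions for illustration; the paper's actual consensus rule may differ.

```python
@torch.no_grad()
def stable_prefix(per_horizon_actions: torch.Tensor, tol: float = 0.05) -> int:
    """Hypothetical cross-horizon consensus: return how many leading steps
    all K horizons agree on, so the policy can execute that prefix before
    calling the model again (fewer calls -> higher throughput)."""
    # per_horizon_actions: (K, T, action_dim), T = min(horizons) shared steps
    spread = (per_horizon_actions.max(dim=0).values
              - per_horizon_actions.min(dim=0).values).norm(dim=-1)  # (T,)
    agree = (spread < tol).long()
    # length of the leading run of agreeing steps; always execute >= 1
    return max(int(agree.cumprod(dim=0).sum().item()), 1)
```

Executing longer agreed prefixes between model invocations is the mechanism that would account for the throughput gain reported in the abstract.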