

Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition

October 1, 2025
作者: Jiahang Cao, Yize Huang, Hanzhong Guo, Rui Zhang, Mu Nan, Weijian Mai, Jiaxu Wang, Hao Cheng, Jingkai Sun, Gang Han, Wen Zhao, Qiang Zhang, Yijie Guo, Qihao Zheng, Chunfeng Song, Xiao Li, Ping Luo, Andrew F. Luo
cs.AI

Abstract

Diffusion-based models for robotic control, including vision-language-action (VLA) and vision-action (VA) policies, have demonstrated significant capabilities. Yet their advancement is constrained by the high cost of acquiring large-scale interaction datasets. This work introduces an alternative paradigm for enhancing policy performance without additional model training. Perhaps surprisingly, we demonstrate that the composed policies can exceed the performance of either parent policy. Our contribution is threefold. First, we establish a theoretical foundation showing that the convex composition of distributional scores from multiple diffusion models can yield a superior one-step functional objective compared to any individual score. A Grönwall-type bound is then used to show that this single-step improvement propagates through entire generation trajectories, leading to systemic performance gains. Second, motivated by these results, we propose General Policy Composition (GPC), a training-free method that enhances performance by combining the distributional scores of multiple pre-trained policies via a convex combination and test-time search. GPC is versatile, allowing for the plug-and-play composition of heterogeneous policies, including VA and VLA models, as well as those based on diffusion or flow-matching, irrespective of their input visual modalities. Third, we provide extensive empirical validation. Experiments on Robomimic, PushT, and RoboTwin benchmarks, alongside real-world robotic evaluations, confirm that GPC consistently improves performance and adaptability across a diverse set of tasks. Further analysis of alternative composition operators and weighting strategies offers insights into the mechanisms underlying the success of GPC. These results establish GPC as a simple yet effective method for improving control performance by leveraging existing policies.
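
As a rough formalization of the composition the abstract describes (the notation here is ours, not taken from the paper), the composed score is a convex combination of the per-policy distributional scores, conditioned on the observation o:

```latex
\[
  s_{\mathrm{GPC}}(x_t, t \mid o) \;=\; \sum_{i=1}^{K} w_i \, s_{\theta_i}(x_t, t \mid o),
  \qquad w_i \ge 0, \quad \sum_{i=1}^{K} w_i = 1.
\]
```

Per the abstract's argument, if this composed score achieves a lower one-step functional objective than any individual s_{\theta_i}, a Grönwall-type bound implies the improvement compounds over the entire reverse-time generation trajectory rather than dissipating.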
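Below is a minimal sketch of what such training-free, test-time composition might look like, assuming a flow-matching formulation where each pre-trained policy predicts a velocity field over actions. The `velocity` method, `action_dim` attribute, `verifier` callback, and the weight grid are illustrative placeholders, not the paper's actual API:

```python
import numpy as np

def compose(predictions, weights):
    """Convex combination of per-policy predictions (scores or velocities)."""
    weights = np.asarray(weights, dtype=np.float64)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * p for w, p in zip(weights, predictions))

def sample_action(policies, obs, weights, num_steps=50, seed=0):
    """Euler integration of the composed flow from noise (t=0) to action (t=1).

    Assumes each policy exposes velocity(x, obs, t) and an action_dim
    attribute; both are hypothetical stand-ins for the real interfaces.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(policies[0].action_dim)  # initial noise sample
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = compose([p.velocity(x, obs, t) for p in policies], weights)
        x = x + (t_next - t) * v  # one Euler step along the composed flow
    return x

def test_time_search(policies, obs, verifier, num_candidates=11):
    """Grid-search the convex weight between two parent policies; keep the
    action that a task-specific verifier (hypothetical) scores highest."""
    best_action, best_score = None, -np.inf
    for w in np.linspace(0.0, 1.0, num_candidates):
        action = sample_action(policies, obs, [w, 1.0 - w])
        score = verifier(action)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

Restricting the weights to the simplex keeps the combined prediction inside the convex hull of the parent predictions, which is the setting the abstract's one-step superiority result reasons about.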