ACG: Action Coherence Guidance for Flow-based VLA models
October 25, 2025
Authors: Minho Park, Kinam Kim, Junha Hyung, Hyojin Jang, Hoiyeong Jin, Jooyeol Yun, Hojoon Lee, Jaegul Choo
cs.AI
Abstract
Diffusion and flow matching models have emerged as powerful robot policies,
enabling Vision-Language-Action (VLA) models to generalize across diverse
scenes and instructions. Yet, when trained via imitation learning, their high
generative capacity makes them highly sensitive to noise in human demonstrations:
jerks, pauses, and jitter that reduce action coherence. Reduced action
coherence causes instability and trajectory drift during deployment, failures
that are catastrophic in fine-grained manipulation where precision is crucial.
In this paper, we present Action Coherence Guidance (ACG) for VLA models, a
training-free test-time guidance algorithm that improves action coherence and
thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and
real-world SO-101 tasks, ACG consistently improves action coherence and boosts
success rates across diverse manipulation tasks. Code and project page are
available at https://github.com/DAVIAN-Robotics/ACG and
https://DAVIAN-Robotics.github.io/ACG, respectively.
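
The abstract describes ACG only at a high level (a training-free, test-time guidance algorithm for flow-based policies), so the following is a minimal, purely illustrative sketch of what such guidance can look like: each Euler step of flow-matching sampling is nudged by the gradient of a smoothness score over the action chunk. Everything here (velocity_model, obs, chunk_shape, guidance_scale, and the coherence score itself) is a hypothetical stand-in, not the ACG update rule from the paper.

```python
# Illustrative sketch only: the paper's actual guidance term is not given in
# this abstract. This shows one generic way to add training-free, test-time
# guidance to a flow-matching action policy: bias each Euler integration step
# toward action chunks with smaller frame-to-frame jumps.
import torch

def coherence_score(actions: torch.Tensor) -> torch.Tensor:
    # Penalize large jumps between consecutive actions in a (T, action_dim)
    # chunk; a higher score means a smoother (more coherent) trajectory.
    return -((actions[1:] - actions[:-1]) ** 2).sum()

@torch.no_grad()
def sample_with_guidance(velocity_model, obs, chunk_shape,
                         num_steps: int = 10, guidance_scale: float = 0.1):
    """Euler integration of a flow-matching policy from noise (t=0) to an
    action chunk (t=1), nudged toward smoother chunks at every step."""
    a = torch.randn(chunk_shape)                  # (T, action_dim) noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((1,), i * dt)
        v = velocity_model(a, t, obs)             # learned velocity field
        with torch.enable_grad():                 # guidance needs a graph
            a_req = a.detach().requires_grad_(True)
            grad = torch.autograd.grad(coherence_score(a_req), a_req)[0]
        a = a + dt * (v + guidance_scale * grad)  # guided Euler step
    return a
```

Because the guidance gradient is computed from the current sample alone, this kind of steering requires no retraining of the policy; guidance_scale trades off fidelity to the learned velocity field against trajectory smoothness.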