ACG: Action Coherence Guidance for Flow-based VLA models
October 25, 2025
Authors: Minho Park, Kinam Kim, Junha Hyung, Hyojin Jang, Hoiyeong Jin, Jooyeol Yun, Hojoon Lee, Jaegul Choo
cs.AI
Abstract
Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter that reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free, test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG, respectively.
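
The abstract describes ACG only at a high level, as a training-free, test-time guidance procedure for flow-matching action policies. As a rough illustration of what test-time guidance over such a policy can look like, the sketch below integrates a guided velocity field with a forward Euler solver; the CFG-style blending rule, the guidance_weight value, and all names (guided_flow_sampling, v_base, v_guide) are illustrative assumptions, not the paper's actual ACG update.

```python
# Minimal sketch of test-time guidance for a flow-matching action policy.
# The blending rule here (CFG-style extrapolation between two velocity
# predictions) is an assumption for illustration, NOT the ACG algorithm.
import numpy as np

def guided_flow_sampling(v_base, v_guide, horizon, action_dim,
                         num_steps=10, guidance_weight=1.5, seed=0):
    """Integrate a guided velocity field from Gaussian noise to an action chunk.

    v_base(x, t)  -> array of shape (horizon, action_dim): the policy's velocity.
    v_guide(x, t) -> array of the same shape: an alternative prediction whose
                     direction is emphasized (e.g., one favoring smoother actions).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((horizon, action_dim))  # t = 0: pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        vb = v_base(x, t)
        vg = v_guide(x, t)
        # Extrapolate toward the "coherence-promoting" prediction.
        v = vb + guidance_weight * (vg - vb)
        x = x + dt * v  # forward Euler step of the flow ODE
    return x  # action chunk at t = 1, shape (horizon, action_dim)


if __name__ == "__main__":
    # Dummy velocity functions standing in for the VLA policy's network calls.
    v_base = lambda x, t: -x                      # pulls the chunk toward zero
    v_guide = lambda x, t: -x + 0.1 * np.tanh(x)  # hypothetical smoother variant
    chunk = guided_flow_sampling(v_base, v_guide, horizon=16, action_dim=7)
    print(chunk.shape)  # (16, 7)
```

In practice, v_base and v_guide would both be calls into the VLA policy's velocity network under different conditioning; the skeleton only shows where a guidance term enters the sampling loop.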