

ACG: Action Coherence Guidance for Flow-based VLA models

October 25, 2025
作者: Minho Park, Kinam Kim, Junha Hyung, Hyojin Jang, Hoiyeong Jin, Jooyeol Yun, Hojoon Lee, Jaegul Choo
cs.AI

Abstract

Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter that reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG, respectively.
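
For intuition, the sketch below shows what a training-free, test-time guidance step on a flow-matching action policy can look like. The guidance form, the smoothed reference velocity field (`smooth_velocity_fn`), and all names and parameters are illustrative assumptions; this is not the ACG update rule from the paper, which is available in the authors' released code.

```python
import numpy as np

def flow_matching_rollout_with_guidance(
    velocity_fn,         # policy's learned velocity field: v(a, t, obs)
    smooth_velocity_fn,  # hypothetical coherence-preserving reference field
    obs,
    action_dim,
    num_steps=10,
    guidance_scale=0.5,
    rng=None,
):
    """Euler integration of a flow-matching action policy, with a guidance
    term that pulls each integration step toward a smoother reference
    velocity. Generic test-time guidance sketch, NOT the exact ACG rule."""
    rng = np.random.default_rng() if rng is None else rng
    a = rng.standard_normal(action_dim)  # start from Gaussian noise at t = 0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v_base = velocity_fn(a, t, obs)
        v_smooth = smooth_velocity_fn(a, t, obs)
        # Guided velocity: interpolate from the policy's velocity toward the
        # coherence-preserving direction (hypothetical guidance form).
        v = v_base + guidance_scale * (v_smooth - v_base)
        a = a + dt * v  # Euler step along the guided flow
    return a

# Toy usage with stand-in velocity fields (illustration only).
policy_v = lambda a, t, obs: -a          # pretend policy velocity
smooth_v = lambda a, t, obs: -0.5 * a    # pretend smoothed reference
action = flow_matching_rollout_with_guidance(policy_v, smooth_v, obs=None, action_dim=7)
```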