Learning Native Continuation for Action Chunking Flow Policies

Abstract

Action chunking enables Vision-Language-Action (VLA) models to run in real time, but naive chunked execution often exhibits discontinuities at chunk boundaries. Real-Time Chunking (RTC) alleviates this issue but is external to the policy, which leads to spurious multimodal switching and trajectories that are not intrinsically smooth. We propose Legato, a training-time continuation method for action-chunked, flow-based VLA policies. Specifically, Legato initializes denoising from a schedule-shaped mixture of known actions and noise, exposing the model to partial action information. Legato also reshapes the learned flow dynamics so that the denoising process remains consistent between training and inference under per-step guidance. In addition, Legato uses a randomized schedule condition during training to support varying inference delays and to achieve controllable smoothness. Empirically, Legato produces smoother trajectories and reduces spurious multimodal switching during execution, resulting in less hesitation and shorter task completion times. Extensive real-world experiments show that Legato consistently outperforms RTC across five manipulation tasks, with improvements of approximately 10% in both trajectory smoothness and task completion time.
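To make the idea of a schedule-shaped initialization concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the shape of the schedule, so the linear ramp, the function names, and the `delay` and `overlap` parameters are all assumptions introduced here for illustration. Each step in the new chunk receives a weight in [0, 1]: steps already committed during inference delay keep the known action, later overlapping steps blend the known action with Gaussian noise, and the remaining steps start from pure noise.

```python
import random


def schedule_weights(horizon, delay, overlap):
    """Hypothetical per-step weight on the known action.

    Weight is 1 for the first `delay` steps, ramps linearly toward 0
    over the next `overlap` steps, and is 0 for the rest of the chunk.
    The linear ramp is an assumption, not the paper's schedule.
    """
    weights = []
    for step in range(horizon):
        w = 1.0 - (step - delay + 1) / (overlap + 1)
        weights.append(min(1.0, max(0.0, w)))
    return weights


def init_denoising(known_actions, weights, rng):
    """Blend known actions with Gaussian noise, step by step.

    known_actions: list of per-step action vectors (lists of floats).
    weights: one weight per step, as returned by schedule_weights.
    rng: a random.Random instance, passed in for reproducibility.
    """
    mixed = []
    for action, w in zip(known_actions, weights):
        mixed.append([w * a + (1.0 - w) * rng.gauss(0.0, 1.0) for a in action])
    return mixed


# Example: an 8-step chunk of 2-D actions with 2 committed steps
# and a 3-step blending region (illustrative values only).
weights = schedule_weights(horizon=8, delay=2, overlap=3)
previous_chunk_tail = [[0.1, -0.2] for _ in range(8)]
x_init = init_denoising(previous_chunk_tail, weights, random.Random(0))
```

In this sketch, a randomized schedule condition during training would correspond to sampling `delay` and `overlap` per training example, which is again an interpretation of the abstract rather than a documented detail.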
