Learning Native Continuation for Action Chunking Flow Policies

Abstract

Action chunking enables Vision-Language-Action (VLA) models to run in real time, but naive chunked execution often exhibits discontinuities at chunk boundaries. Real-Time Chunking (RTC) alleviates this issue but is external to the policy, leading to spurious multimodal switching and trajectories that are not intrinsically smooth. We propose Legato, a training-time continuation method for action-chunked, flow-based VLA policies. Specifically, Legato initializes denoising from a schedule-shaped mixture of known actions and noise, exposing the model to partial action information. Moreover, Legato reshapes the learned flow dynamics so that the denoising process remains consistent between training and inference under per-step guidance. Legato further uses randomized schedule conditioning during training to support varying inference delays and achieve controllable smoothness. Empirically, Legato produces smoother trajectories and reduces spurious multimodal switching during execution, leading to less hesitation and shorter task completion time. Extensive real-world experiments show that Legato consistently outperforms RTC across five manipulation tasks, achieving approximately 10% improvements in both trajectory smoothness and task completion time.
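The schedule-shaped initialization can be pictured with a minimal Python sketch. This is not Legato's implementation: the linear ramp, the function names `ramp_schedule` and `init_chunk`, and the `delay` and `overlap` parameters are all assumptions made purely for illustration. The sketch only shows the general idea of starting denoising from a per-step blend of already-known actions and Gaussian noise.

```python
import random


def ramp_schedule(horizon, delay, overlap):
    """Hypothetical schedule: weight on the known action at each chunk step.

    The first `delay` steps get weight 1.0 (assumed already committed while
    inference runs), the next `overlap` steps ramp linearly toward 0.0, and
    the remaining steps get weight 0.0 (pure noise).
    """
    weights = []
    for i in range(horizon):
        if i < delay:
            weights.append(1.0)
        elif i < delay + overlap:
            # Values strictly between 0 and 1 across the ramp.
            weights.append(1.0 - (i - delay + 1) / (overlap + 1))
        else:
            weights.append(0.0)
    return weights


def init_chunk(known, weights, rng):
    """Blend known actions with standard Gaussian noise, step by step.

    `known` is a list of action vectors (lists of floats), one per chunk
    step; steps with weight 0.0 ignore their entry in `known`.
    """
    chunk = []
    for action, w in zip(known, weights):
        noise = [rng.gauss(0.0, 1.0) for _ in action]
        chunk.append([w * a + (1.0 - w) * n for a, n in zip(action, noise)])
    return chunk
```

With `ramp_schedule(8, 2, 3)`, for example, the first two steps keep the known actions unchanged, the next three are partial blends, and the last three start from pure noise. Randomizing `delay` and `overlap` per training sample would be one way to mimic the randomized schedule conditioning the abstract mentions, again as an assumption rather than the authors' procedure.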
