Learning High-Frequency Continuous Action Chunks in Latent Space

Abstract

Modern robotic policies increasingly rely on action chunking to execute complex tasks in the physical world. While action chunking improves temporal consistency at moderate action frequencies, it becomes insufficient when the action frequency is increased further (e.g., to 60 Hz). At such high frequencies, policies often fail to generate actions that are both temporally smooth and spatially consistent. We address this challenge by shifting high-frequency action learning from the action space to a latent space using a variational autoencoder (VAE). This formulation significantly improves both the temporal and spatial consistency of high-frequency control. To enable smooth real-time execution, we further introduce Reuse-then-Refine, a chunk-level refinement strategy that improves continuity between adjacent action chunks under asynchronous inference. As a result, robots controlled by our policy can execute complex contact-rich tasks continuously, with fewer pauses and jerky motions. Experiments on three real-world contact-rich robotic tasks show that our approach consistently completes tasks with smooth motions. Our code and data are available at https://github.com/tars-robotics/RTR.
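
To make the asynchronous-inference setting concrete, the following minimal sketch shows why continuity between adjacent chunks becomes a problem at 60 Hz. It is an illustration only, not the authors' implementation, and it does not reproduce Reuse-then-Refine. The function names, the 0.125 s inference latency, and the toy one-dimensional chunks are all assumptions chosen for the example. The sketch assumes the new chunk is predicted from the observation taken when inference starts, so the actions that elapse during inference are skipped when the new chunk takes over.

```python
import math


def steps_elapsed_during_inference(control_hz: float, latency_s: float) -> int:
    """Control steps the robot executes while the next chunk is being computed."""
    return math.ceil(control_hz * latency_s)


def boundary_gap(prev_chunk, next_chunk, start_idx: int, delay_steps: int) -> float:
    """Largest per-dimension jump when execution switches from prev_chunk to next_chunk.

    Assumption: inference for next_chunk starts at index start_idx of prev_chunk
    and finishes delay_steps control steps later, so the first delay_steps
    actions of next_chunk are already stale and are skipped.
    """
    last_executed = prev_chunk[start_idx + delay_steps - 1]
    first_new = next_chunk[delay_steps]
    return max(abs(a - b) for a, b in zip(last_executed, first_new))


# Hypothetical numbers: 60 Hz control, 0.125 s inference latency.
delay = steps_elapsed_during_inference(60.0, 0.125)

# Toy 1-D chunks of 16 actions; the new chunk is offset by 0.5 from the old one.
prev_chunk = [[float(i)] for i in range(16)]
next_chunk = [[float(i) + 0.5] for i in range(16)]

gap = boundary_gap(prev_chunk, next_chunk, start_idx=0, delay_steps=delay)
print(delay, gap)
```

In this toy case the within-chunk step is 1.0 while the jump at the chunk boundary is larger, which is the kind of discontinuity a chunk-level refinement strategy is meant to reduce.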