Elastic Decision Transformer

Abstract

This paper introduces the Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests that it struggles with trajectory stitching, that is, producing an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. EDT facilitates trajectory stitching during action inference at test time by adjusting the length of the history that DT maintains. Specifically, EDT retains a longer history when the preceding trajectory is optimal and a shorter one when it is sub-optimal, which allows it to "stitch" onto a better trajectory. Extensive experiments demonstrate that EDT bridges the performance gap between DT-based and Q-learning-based approaches. In particular, EDT outperforms Q-learning-based methods in a multi-task regime on the D4RL locomotion benchmark and on Atari games. Videos are available at: https://kristery.github.io/edt/
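
The history-length adjustment described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how a history is scored, so `estimate_return` is a hypothetical placeholder, the function name `select_history_length` is invented for illustration, and preferring the longer window on ties is an assumption made here.

```python
from typing import Callable, Sequence, Tuple


def select_history_length(
    history: Sequence[float],
    estimate_return: Callable[[Sequence[float]], float],
    candidate_lengths: Sequence[int],
) -> Tuple[int, float]:
    """Pick the history length whose most recent window scores highest.

    `estimate_return` is a hypothetical scorer (an assumption, not from the
    paper's abstract) that maps a window of recent steps to a value estimate.
    On ties the longer window wins, which is also an assumption.
    """
    best_length, best_value = 0, float("-inf")
    for length in sorted(candidate_lengths):
        if length < 1 or length > len(history):
            continue  # skip lengths that the available history cannot supply
        window = history[-length:]  # keep only the most recent `length` steps
        value = estimate_return(window)
        if value >= best_value:  # ">=" so that longer windows win ties
            best_length, best_value = length, value
    if best_length == 0:
        raise ValueError("no candidate length fits the available history")
    return best_length, best_value


def mean_quality(window: Sequence[float]) -> float:
    """Toy scorer: average per-step quality of the window (illustrative only)."""
    return sum(window) / len(window)


# A uniformly good history keeps the longest window.
print(select_history_length([1.0, 1.0, 1.0, 1.0], mean_quality, [1, 2, 4]))  # (4, 1.0)
# A poor start followed by good steps drops the poor prefix.
print(select_history_length([0.0, 0.0, 1.0, 1.0], mean_quality, [1, 2, 4]))  # (2, 1.0)
```

In the toy usage, a shorter window is selected exactly when the older part of the history lowers the score, which mirrors the idea of discarding a sub-optimal prefix so the policy can continue along a better trajectory.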