Elastic Decision Transformer

Abstract

This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests that it struggles with trajectory stitching, that is, producing an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. EDT differs from DT by facilitating trajectory stitching during action inference at test time, which it achieves by adjusting the history length maintained in DT. Specifically, EDT retains a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" onto a more optimal trajectory. Extensive experiments demonstrate that EDT bridges the performance gap between DT-based and Q-learning-based approaches. In particular, EDT outperforms Q-learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/
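
The history-length adjustment described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how optimality of the previous trajectory is judged, so the sketch assumes a caller-supplied scoring function. The names `select_history_length`, `estimate_return`, `candidate_lengths`, and `mean_reward` are all hypothetical, and the mean-reward scorer is only a toy stand-in for whatever estimator a real system would use.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def select_history_length(
    history: Sequence[T],
    estimate_return: Callable[[Sequence[T]], float],
    candidate_lengths: Sequence[int],
) -> int:
    """Pick how many of the most recent steps to keep as context.

    Hypothetical helper: each candidate length is scored by applying
    `estimate_return` (an assumed estimator, not specified in the abstract)
    to the truncated history. Ties are broken in favour of the longer history.
    """
    # Ignore candidates that are non-positive or longer than the history.
    valid = [n for n in candidate_lengths if 1 <= n <= len(history)]
    if not valid:
        raise ValueError("no candidate length fits the available history")
    # Sort key: (estimated value, length), so equal values prefer longer history.
    return max(valid, key=lambda n: (estimate_return(history[-n:]), n))


def mean_reward(window: Sequence[float]) -> float:
    """Toy stand-in estimator: average reward over the window."""
    return sum(window) / len(window)


# Rewards per step; the early part of the first history is poor.
poor_start = [0.0, 0.0, 1.0, 1.0]
good_throughout = [1.0, 1.0, 1.0, 1.0]

print(select_history_length(poor_start, mean_reward, [2, 4]))       # 2
print(select_history_length(good_throughout, mean_reward, [2, 4]))  # 4
```

With the toy estimator, a history whose early steps were poor is truncated to the recent window, while a uniformly good history is kept in full, which mirrors the longer-when-optimal, shorter-when-sub-optimal behavior stated in the abstract.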