Elastic Decision Transformer

Abstract

This paper introduces the Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching: generating an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. EDT differentiates itself by facilitating trajectory stitching during action inference at test time, which it achieves by adjusting the history length maintained in DT. Specifically, EDT retains a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experiments demonstrate EDT's ability to bridge the performance gap between DT-based and Q-learning-based approaches. In particular, EDT outperforms Q-learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/
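The history-length adjustment described above can be illustrated with a minimal sketch. It assumes a hypothetical scoring function, `estimate_max_return`, that predicts the best return achievable when conditioning on a given suffix of the history; the candidate lengths and the toy mean-reward scorer are likewise illustrative assumptions. This is not the paper's implementation. It only shows the selection idea: keep a long history when the whole history looks good, and truncate it when an earlier, sub-optimal part would drag the estimate down.

```python
from typing import Callable, Optional, Sequence, TypeVar

Step = TypeVar("Step")


def select_history_length(
    history: Sequence[Step],
    candidate_lengths: Sequence[int],
    estimate_max_return: Callable[[Sequence[Step]], float],
) -> int:
    """Pick the history length whose most recent suffix has the highest
    estimated achievable return. Ties go to the longer history.

    `estimate_max_return` is a hypothetical scorer supplied by the caller,
    standing in for whatever return estimator a real system would use.
    """
    best_len: Optional[int] = None
    best_score = float("-inf")
    # Visit longer histories first so a strict ">" keeps the longest on ties.
    for length in sorted(candidate_lengths, reverse=True):
        if length <= 0 or length > len(history):
            continue  # skip lengths that do not fit the available history
        score = estimate_max_return(history[-length:])
        if score > best_score:
            best_len, best_score = length, score
    if best_len is None:
        raise ValueError("no candidate length fits the history")
    return best_len


def mean_reward(suffix: Sequence[float]) -> float:
    """Toy stand-in scorer (assumption): average per-step reward of the suffix."""
    return sum(suffix) / len(suffix)


if __name__ == "__main__":
    # Poor early steps: a shorter history scores higher, so the history is cut.
    print(select_history_length([0.1, 0.2, 1.0, 1.0], [1, 2, 3, 4], mean_reward))  # 2
    # Uniformly good steps: all lengths tie, so the full history is kept.
    print(select_history_length([1.0, 1.0, 1.0, 1.0], [1, 2, 3, 4], mean_reward))  # 4
```

Preferring the longest history on ties mirrors the stated behavior of retaining more context when the preceding trajectory is already good.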