Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

Abstract

World models provide a powerful framework for simulating environment dynamics conditioned on actions or instructions, enabling downstream tasks such as action planning and policy learning. Recent approaches use world models as learned simulators, but applying them to decision-time planning remains computationally prohibitive for real-time control. A key bottleneck lies in the latent representation: conventional tokenizers encode each observation into hundreds of tokens, which makes planning both slow and resource-intensive. To address this, we propose CompACT, a discrete tokenizer that compresses each observation into as few as 8 tokens, drastically reducing computational cost while preserving the information essential for planning. An action-conditioned world model that employs the CompACT tokenizer achieves competitive planning performance with orders-of-magnitude faster planning, offering a practical step toward real-world deployment of world models.
