Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model
March 5, 2026
Authors: Dongwon Kim, Gawon Seo, Jinsung Lee, Minsu Cho, Suha Kwak
cs.AI
Abstract
World models provide a powerful framework for simulating environment dynamics conditioned on actions or instructions, enabling downstream tasks such as action planning or policy learning. Recent approaches leverage world models as learned simulators, but their application to decision-time planning remains computationally prohibitive for real-time control. A key bottleneck lies in latent representations: conventional tokenizers encode each observation into hundreds of tokens, making planning both slow and resource-intensive. To address this, we propose CompACT, a discrete tokenizer that compresses each observation into as few as 8 tokens, drastically reducing computational cost while preserving the information essential for planning. An action-conditioned world model equipped with the CompACT tokenizer achieves competitive planning performance with orders-of-magnitude faster planning, offering a practical step toward real-world deployment of world models.
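To make the core idea concrete, the following is a minimal sketch of how a compact discrete tokenizer of this kind could be structured: a convolutional encoder maps an image observation to a small, fixed budget of slots, and each slot is snapped to its nearest entry in a learned codebook, yielding 8 discrete token indices per observation. The abstract does not specify CompACT's architecture, so the encoder layout, pooling-to-8-slots design, codebook size, and all dimensions below are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of an "8-token" discrete tokenizer (not the CompACT
# architecture from the paper). All module shapes and the codebook size are
# illustrative assumptions.
import torch
import torch.nn as nn


class CompactTokenizerSketch(nn.Module):
    def __init__(self, num_tokens=8, codebook_size=512, dim=128):
        super().__init__()
        # Convolutional encoder: a 64x64 RGB observation -> an 8x8 feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, dim, 4, stride=2, padding=1),
        )
        # Pool the spatial grid down to a fixed budget of `num_tokens` slots.
        self.pool = nn.AdaptiveAvgPool2d((1, num_tokens))
        # Learned codebook; each slot is mapped to its nearest code index.
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, obs):
        feats = self.encoder(obs)                             # (B, dim, 8, 8)
        slots = self.pool(feats).flatten(2).transpose(1, 2)   # (B, num_tokens, dim)
        # Nearest-neighbour lookup in the codebook -> one index per slot.
        codes = self.codebook.weight.unsqueeze(0).expand(slots.size(0), -1, -1)
        dists = torch.cdist(slots, codes)                     # (B, num_tokens, K)
        tokens = dists.argmin(dim=-1)                         # (B, num_tokens)
        return tokens


if __name__ == "__main__":
    obs = torch.randn(2, 3, 64, 64)          # batch of dummy observations
    tokens = CompactTokenizerSketch()(obs)
    print(tokens.shape)                       # torch.Size([2, 8])
```

Under this kind of design, a downstream action-conditioned world model predicts the next 8-token latent state from the current tokens and an action, so each planning rollout step operates on a sequence of 8 tokens rather than hundreds, which is where the claimed speedup in decision-time planning would come from.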