AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning

Abstract

This paper presents AlphaSpace, a method for improving the spatial reasoning of large language models (LLMs) in 3D Cartesian space navigation. AlphaSpace uses a semantics-based tokenization strategy that encodes height information through specialized semantic tokens, and it integrates primarily symbolic synthetic reasoning data. This approach enables LLMs to manipulate objects accurately by positioning them at specific [x, y, z] coordinates. In experiments, AlphaSpace significantly outperforms existing models on manipulation subtasks, achieving a total accuracy of 66.67%, compared with 37.5% for GPT-4o and 29.17% for Claude 3.5 Sonnet.
