AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning

Abstract

This paper presents AlphaSpace, a novel methodology designed to improve the spatial reasoning of large language models (LLMs) for navigation in 3D Cartesian space. AlphaSpace uses a semantics-based tokenization strategy that encodes height information through specialized semantic tokens, and it integrates primarily symbolic synthetic reasoning data. This approach enables LLMs to manipulate objects accurately by positioning them at specific [x, y, z] coordinates. In experiments, AlphaSpace significantly outperforms existing models on manipulation subtasks, achieving a total accuracy of 66.67%, compared with 37.5% for GPT-4o and 29.17% for Claude 3.5 Sonnet.
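
To make the idea of encoding height through a dedicated semantic token concrete, the following is a minimal sketch of how an [x, y, z] position might be written as a short token string and read back. The token templates (`<|x-N|>`, `<|y-N|>`, `<|height-N|>`) and the function names are assumptions made for illustration only; the abstract does not specify AlphaSpace's actual token vocabulary.

```python
import re
from typing import Tuple

# Hypothetical token layout: one token per axis, with the z value carried
# by a dedicated "height" token. This is an illustrative assumption, not
# the vocabulary used by AlphaSpace.
_POSITION_PATTERN = re.compile(r"<\|x-(\d+)\|><\|y-(\d+)\|><\|height-(\d+)\|>")


def encode_position(x: int, y: int, z: int) -> str:
    """Render integer grid coordinates as a string of semantic tokens."""
    if min(x, y, z) < 0:
        raise ValueError("coordinates must be non-negative integers")
    return f"<|x-{x}|><|y-{y}|><|height-{z}|>"


def decode_position(tokens: str) -> Tuple[int, int, int]:
    """Parse a token string produced by encode_position back into (x, y, z)."""
    match = _POSITION_PATTERN.fullmatch(tokens)
    if match is None:
        raise ValueError(f"not a valid position token string: {tokens!r}")
    x, y, z = (int(group) for group in match.groups())
    return (x, y, z)


if __name__ == "__main__":
    encoded = encode_position(3, 5, 2)
    print(encoded)
    print(decode_position(encoded))
```

Under this assumed layout, a model's target placement can be read off its output with `decode_position`, and a malformed output raises an error rather than yielding a silently wrong coordinate.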
