
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

December 12, 2025
Authors: Chao Xu, Suyu Zhang, Yang Liu, Baigui Sun, Weihong Chen, Bo Xu, Qi Liu, Juncheng Wang, Shujun Wang, Shan Luo, Jan Peters, Athanasios V. Vasilakos, Stefanos Zafeiriou, Jiankang Deng
cs.AI

Abstract

Vision-Language-Action (VLA) models are driving a revolution in robotics, enabling machines to understand instructions and interact with the physical world. The field is expanding rapidly with new models and datasets, making it both exciting and difficult to keep pace with. This survey offers a clear, structured guide to the VLA landscape, designed to follow the natural learning path of a researcher: we start with the basic Modules of any VLA model, trace the field's history through key Milestones, and then dive deep into the core Challenges that define the current research frontier. Our main contribution is a detailed breakdown of the field's five biggest challenges: (1) Representation, (2) Execution, (3) Generalization, (4) Safety, and (5) Dataset and Evaluation. This structure mirrors the developmental roadmap of a generalist agent: establishing the fundamental perception-action loop, scaling capabilities across diverse embodiments and environments, and finally ensuring trustworthy deployment, all supported by the essential data infrastructure. For each challenge, we review existing approaches and highlight future opportunities. We position this paper as both a foundational guide for newcomers and a strategic roadmap for experienced researchers, with the dual aim of accelerating learning and inspiring new ideas in embodied intelligence. A live version of this survey, with continuous updates, is maintained on our project page: https://suyuz1.github.io/Survery/