An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

December 12, 2025
Authors: Chao Xu, Suyu Zhang, Yang Liu, Baigui Sun, Weihong Chen, Bo Xu, Qi Liu, Juncheng Wang, Shujun Wang, Shan Luo, Jan Peters, Athanasios V. Vasilakos, Stefanos Zafeiriou, Jiankang Deng
cs.AI

Abstract

Vision-Language-Action (VLA) models are driving a revolution in robotics, enabling machines to understand instructions and interact with the physical world. The field is expanding rapidly with new models and datasets, making it both exciting and difficult to keep pace with. This survey offers a clear, structured guide to the VLA landscape. We design it to follow the natural learning path of a researcher: we start with the basic Modules of any VLA model, trace the history through key Milestones, and then dive deep into the core Challenges that define the current research frontier. Our main contribution is a detailed breakdown of the five biggest challenges: (1) Representation, (2) Execution, (3) Generalization, (4) Safety, and (5) Datasets and Evaluation. This structure mirrors the developmental roadmap of a generalist agent: establishing the fundamental perception-action loop, scaling capabilities across diverse embodiments and environments, and finally ensuring trustworthy deployment, all supported by the essential data infrastructure. For each challenge, we review existing approaches and highlight future opportunities. We position this paper as both a foundational guide for newcomers and a strategic roadmap for experienced researchers, with the dual aim of accelerating learning and inspiring new ideas in embodied intelligence. A live version of this survey, with continuous updates, is maintained on our project page: https://suyuz1.github.io/Survery/