Vision-Language-Action Safety: Threats, Challenges, Evaluation, and Mechanisms

Abstract

Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges that stem from the embodied nature of VLA systems: irreversible physical consequences, a multimodal attack surface spanning vision, language, and state, real-time latency constraints on defense, error propagation over long-horizon trajectories, and vulnerabilities in the data supply chain. Yet the literature remains fragmented across robotic learning, adversarial machine learning, AI alignment, and autonomous systems safety. This survey provides a unified and up-to-date overview of safety in Vision-Language-Action models. We organize the field along two parallel timing axes, attack timing (training-time vs. inference-time) and defense timing (training-time vs. inference-time), linking each class of threat to the stage at which it can be mitigated. We first define the scope of VLA safety, distinguishing it from text-only LLM safety and classical robotic safety, and review the foundations of VLA models, including architectures, training paradigms, and inference mechanisms. We then examine the literature through four lenses: Attacks, Defenses, Evaluation, and Deployment. We survey training-time threats such as data poisoning and backdoors, as well as inference-time attacks including adversarial patches, cross-modal perturbations, semantic jailbreaks, and freezing attacks. We review training-time and runtime defenses, analyze existing benchmarks and metrics, and discuss safety challenges across six deployment domains. Finally, we highlight five key open problems: certified robustness for embodied trajectories, physically realizable defenses, safety-aware training, unified runtime safety architectures, and standardized evaluation.