Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems
May 23, 2026
Author: Barak Or
cs.AI
Abstract
Physical AI systems increasingly map multimodal observations, language instructions, and learned world representations into physically consequential actions. Robotics foundation models, vision-language-action models, and world-model-based autonomous systems can condition decisions that move vehicles, robots, drones, and industrial machines. This transition exposes a safety problem that is not fully captured by conventional AI content moderation or by classical robot safety alone: a black-box model may issue a physically consequential action while appearing confident, plausible, and semantically aligned. The resulting failure can be silent, arising from sensor drift, occlusion, state-estimation error, distribution shift, hallucinated affordances, or invalid physical assumptions before downstream hardware controllers detect a violation.
Across embodied foundation models, world models, robotics simulation, embodied safety benchmarks, safe control, runtime assurance, uncertainty estimation, verification, and guardrail evaluation, model capability and safety mechanisms have advanced along largely separate technical tracks. The recurring gap identified in this synthesis is that no single research stream surveyed here supplies a complete runtime authorization boundary between black-box Physical AI models and physical execution. The resulting analysis develops a bounded problem formulation, a definition of silent physical-action failure, a taxonomy of runtime guardrail functions, and evaluation requirements for comparing guardrails as Physical AI assurance mechanisms.