Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems
May 23, 2026
Author: Barak Or
cs.AI
Abstract
Physical AI systems increasingly map multimodal observations, language instructions, and learned world representations into physically consequential actions. Robotics foundation models, vision-language-action models, and world-model-based autonomous systems can condition decisions that move vehicles, robots, drones, and industrial machines. This transition exposes a safety problem that is not fully captured by conventional AI content moderation or by classical robot safety alone: a black-box model may issue a physically consequential action while appearing confident, plausible, and semantically aligned. The resulting failure can be silent, arising from sensor drift, occlusion, state-estimation error, distribution shift, hallucinated affordances, or invalid physical assumptions before downstream hardware controllers detect a violation.
Across embodied foundation models, world models, robotics simulation, embodied safety benchmarks, safe control, runtime assurance, uncertainty estimation, verification, and guardrail evaluation, model capability and safety mechanisms have advanced along largely separate technical tracks. A recurring gap synthesized here is that no single stream surveyed in this review supplies a complete runtime authorization boundary between black-box Physical AI models and physical execution. The resulting analysis develops a bounded problem formulation, a definition of silent physical-action failure, a taxonomy of runtime guardrail functions, and evaluation requirements for comparing guardrails as Physical AI assurance mechanisms.