Silent Failures in Physical AI: A Literature Review on Runtime Action Authorization for Autonomous Systems

Abstract

Physical AI systems increasingly map multimodal observations, language instructions, and learned world representations into physically consequential actions. Robotics foundation models, vision-language-action models, and world-model-based autonomous systems can condition decisions that move vehicles, robots, drones, and industrial machines. This transition exposes a safety problem that neither conventional AI content moderation nor classical robot safety fully captures on its own: a black-box model may issue a physically consequential action while appearing confident, plausible, and semantically aligned. The resulting failure can be silent: it can arise from sensor drift, occlusion, state-estimation error, distribution shift, hallucinated affordances, or invalid physical assumptions, and it can take effect before downstream hardware controllers detect a violation. Across embodied foundation models, world models, robotics simulation, embodied safety benchmarks, safe control, runtime assurance, uncertainty estimation, verification, and guardrail evaluation, model capability and safety mechanisms have advanced along largely separate technical tracks. The recurring gap identified in this review is that no single research stream surveyed supplies a complete runtime authorization boundary between black-box Physical AI models and physical execution. The resulting analysis develops a bounded problem formulation, a definition of silent physical-action failure, a taxonomy of runtime guardrail functions, and evaluation requirements for comparing guardrails as Physical AI assurance mechanisms.