Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems

Abstract

Physical AI systems increasingly map multimodal observations, language instructions, and learned world representations into physically consequential actions. Robotics foundation models, vision-language-action (VLA) models, and world-model-based autonomous systems can shape decisions that move vehicles, robots, drones, and industrial machines. This transition exposes a safety problem that neither conventional AI content moderation nor classical robot safety fully captures on its own: a black-box model may issue a physically consequential action while appearing confident, plausible, and semantically aligned. The resulting failure can be silent. It may arise from sensor drift, occlusion, state-estimation error, distribution shift, hallucinated affordances, or invalid physical assumptions, and it can occur before any downstream hardware controller detects a violation. Across embodied foundation models, world models, robotics simulation, embodied safety benchmarks, safe control, runtime assurance, uncertainty estimation, verification, and guardrail evaluation, model capability and safety mechanisms have advanced along largely separate technical tracks. A recurring gap synthesized in this review is that none of the surveyed research streams, taken alone, supplies a complete runtime authorization boundary between black-box Physical AI models and physical execution. Building on this synthesis, the review develops a bounded problem formulation, a definition of silent physical-action failure, a taxonomy of runtime guardrail functions, and evaluation requirements for comparing guardrails as Physical AI assurance mechanisms.