Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems

Abstract

Physical AI systems increasingly map multimodal observations, language instructions, and learned world representations into physically consequential actions. Robotics foundation models, vision-language-action models, and world-model-based autonomous systems can drive decisions that move vehicles, robots, drones, and industrial machines. This transition exposes a safety problem that neither conventional AI content moderation nor classical robot safety fully captures on its own: a black-box model may issue a physically consequential action while appearing confident, plausible, and semantically aligned. The resulting failure can be silent. It can arise from sensor drift, occlusion, state-estimation error, distribution shift, hallucinated affordances, or invalid physical assumptions, and it can take effect before downstream hardware controllers detect a violation. Across embodied foundation models, world models, robotics simulation, embodied safety benchmarks, safe control, runtime assurance, uncertainty estimation, verification, and guardrail evaluation, model capability and safety mechanisms have advanced along largely separate technical tracks. A recurring gap synthesized here is that no single research stream surveyed in this review supplies a complete runtime authorization boundary between black-box Physical AI models and physical execution. Building on this synthesis, the review develops a bounded problem formulation, a definition of silent physical-action failure, a taxonomy of runtime guardrail functions, and evaluation requirements for comparing guardrails as Physical AI assurance mechanisms.
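To make the idea of a runtime authorization boundary more concrete, the following is a minimal sketch under stated assumptions. It is not the problem formulation or guardrail taxonomy developed in this review. It places a hypothetical `authorize` gate between a black-box policy's proposed action and physical execution. The class names, fields, and every threshold in `Limits` are illustrative assumptions. The property it demonstrates is that the gate decides from evidence independent of the model (freshness and uncertainty of the state estimate, speed and workspace limits) and deliberately ignores the model's self-reported confidence, so a confident, plausible action can still be rejected.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTHORIZE = "authorize"
    REJECT = "reject"


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical action emitted by a black-box policy."""
    target_xyz: tuple[float, float, float]  # metres, robot base frame (assumed)
    speed: float                            # commanded speed in m/s
    model_confidence: float                 # self-reported by the model, in [0, 1]


@dataclass(frozen=True)
class StateEstimate:
    """Hypothetical output of an estimator independent of the policy."""
    age_s: float         # seconds since the last sensor update
    position_std: float  # 1-sigma position uncertainty in metres


@dataclass(frozen=True)
class Limits:
    """Illustrative thresholds only; real values are system-specific."""
    workspace_min: tuple[float, float, float] = (-1.0, -1.0, 0.0)
    workspace_max: tuple[float, float, float] = (1.0, 1.0, 1.5)
    max_speed: float = 0.5
    max_state_age_s: float = 0.1
    max_position_std: float = 0.05


def authorize(action: ProposedAction, state: StateEstimate,
              limits: Limits = Limits()) -> tuple[Decision, list[str]]:
    """Return a decision plus the reasons for rejection (empty if authorized).

    action.model_confidence is intentionally never read: a confident output
    is not treated as evidence that the action is physically safe.
    """
    reasons: list[str] = []
    # Checks on the evidence the action was based on.
    if state.age_s > limits.max_state_age_s:
        reasons.append("stale state estimate")
    if state.position_std > limits.max_position_std:
        reasons.append("state uncertainty too high")
    # Checks on the action itself.
    if action.speed > limits.max_speed:
        reasons.append("speed limit exceeded")
    for value, lo, hi in zip(action.target_xyz,
                             limits.workspace_min, limits.workspace_max):
        if not lo <= value <= hi:
            reasons.append("target outside workspace")
            break  # report this reason once, however many axes violate it
    decision = Decision.REJECT if reasons else Decision.AUTHORIZE
    return decision, reasons


if __name__ == "__main__":
    confident = ProposedAction(target_xyz=(0.4, 0.2, 0.3), speed=0.3,
                               model_confidence=0.98)
    # Same confident action, evaluated against a stale and a fresh estimate.
    print(authorize(confident, StateEstimate(age_s=0.8, position_std=0.01)))
    print(authorize(confident, StateEstimate(age_s=0.02, position_std=0.01)))
```

In this sketch, an action the model reports with 0.98 confidence is rejected when the state estimate is 0.8 s old and authorized when it is fresh. Stale sensing hidden behind a confident output is one instance of the silent failure described above. Simple threshold checks like these do not address occlusion, distribution shift, or hallucinated affordances, which is part of why a complete boundary remains an open problem.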