PAI-Bench: A Comprehensive Benchmark for Physical AI
December 1, 2025
Authors: Fengzhe Zhou, Jiannan Huang, Jialuo Li, Deva Ramanan, Humphrey Shi
cs.AI
Abstract
Physical AI aims to develop models that can perceive and predict real-world dynamics, yet the extent to which current multi-modal large language models and video generative models support these abilities is insufficiently understood. We introduce Physical AI Bench (PAI-Bench), a unified and comprehensive benchmark that evaluates perception and prediction capabilities across three tasks: video generation, conditional video generation, and video understanding. It comprises 2,808 real-world cases with task-aligned metrics designed to capture physical plausibility and domain-specific reasoning. Our study provides a systematic assessment of recent models and shows that video generative models, despite strong visual fidelity, often struggle to maintain physically coherent dynamics, while multi-modal large language models exhibit limited performance in forecasting and causal interpretation. These observations suggest that current systems are still at an early stage in handling the perceptual and predictive demands of Physical AI. In summary, PAI-Bench establishes a realistic foundation for evaluating Physical AI and highlights key gaps that future systems must address.