PhyCritic: Multimodal Critic Models for Physical AI

February 11, 2026
Authors: Tianyi Xiong, Shihao Wang, Guilin Liu, Yi Dong, Ming Li, Heng Huang, Jan Kautz, Zhiding Yu
cs.AI

Abstract

With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment, providing pairwise preferences, numerical scores, and explanatory justifications for assessing model-generated responses. However, existing critics are primarily trained in general visual domains such as captioning or image question answering, leaving physical AI tasks involving perception, causal reasoning, and planning largely underexplored. We introduce PhyCritic, a multimodal critic model optimized for physical AI through a two-stage RLVR pipeline: a physical skill warmup stage that enhances physically oriented perception and reasoning, followed by self-referential critic finetuning, where the critic generates its own prediction as an internal reference before judging candidate responses, improving judgment stability and physical correctness. Across both physical and general-purpose multimodal judge benchmarks, PhyCritic achieves strong performance gains over open-source baselines and, when applied as a policy model, further improves perception and reasoning in physically grounded tasks.
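
The self-referential judging step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `CriticOutput`, `self_referential_judge`, `preference_reward`, and the toy `toy_answer`/`toy_compare` stand-ins are hypothetical, and the binary reward on the pairwise preference is only an assumed example of a verifiable reward, since the abstract does not specify the reward design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CriticOutput:
    self_prediction: str  # the critic's own answer, produced before judging
    preferred: str        # "A" or "B"


def self_referential_judge(
    question: str,
    response_a: str,
    response_b: str,
    answer_fn: Callable[[str], str],
    compare_fn: Callable[[str, str, str, str], str],
) -> CriticOutput:
    """Answer the question first, then judge the candidates against that answer."""
    # Step 1: the critic forms its own prediction as an internal reference.
    own = answer_fn(question)
    # Step 2: the critic compares both candidates using that reference.
    preferred = compare_fn(question, own, response_a, response_b)
    return CriticOutput(self_prediction=own, preferred=preferred)


def preference_reward(output: CriticOutput, gold_preference: str) -> float:
    """Assumed verifiable reward: 1.0 if the preference matches the label, else 0.0."""
    return 1.0 if output.preferred == gold_preference else 0.0


# Toy stand-ins for model calls (hypothetical; a real critic would query a model).
def toy_answer(question: str) -> str:
    return "the ball rolls left"


def toy_compare(question: str, own: str, a: str, b: str) -> str:
    # Prefer the candidate that agrees with the critic's own prediction.
    return "A" if a.strip().lower() == own.strip().lower() else "B"


result = self_referential_judge(
    "Which way does the ball move on the tilted table?",
    "The ball rolls left",
    "The ball stays still",
    toy_answer,
    toy_compare,
)
reward = preference_reward(result, gold_preference="A")
```

In this toy setup the critic's preference can be checked against a labeled pair, which is what makes the reward verifiable; how the actual model conditions on its own prediction is left to the paper.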
February 13, 2026