P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

February 10, 2026
Authors: Yun Luo, Futing Wang, Qianjia Cheng, Fangchen Yu, Haodi Lei, Jianhao Yan, Chenxi Li, Jiacheng Chen, Yufeng Zhao, Haiyuan Wan, Yuchen Zhang, Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Wenxuan Zeng, Li Sheng, Chengxing Xie, Yuxin Zuo, Yizhuo Li, Yulun Wu, Rui Huang, Dongzhan Zhou, Kai Chen, Yu Qiao, Lei Bai, Yu Cheng, Ning Ding, Bowen Zhou, Peng Ye, Ganqu Cui
cs.AI

Abstract

The transition from symbolic manipulation to science-grade reasoning represents a pivotal frontier for Large Language Models (LLMs), with physics serving as the critical testbed for binding abstract logic to physical reality. Physics demands that a model maintain physical consistency with the laws governing the universe, a task that fundamentally requires multimodal perception to ground abstract logic in reality. At the Olympiad level, diagrams are often constitutive rather than merely illustrative, containing essential constraints, such as boundary conditions and spatial symmetries, that are absent from the text. To bridge this visual-logical gap, we introduce P1-VL, a family of open-source vision-language models engineered for advanced scientific reasoning. Our method harmonizes Curriculum Reinforcement Learning, which employs progressive difficulty expansion to stabilize post-training, with Agentic Augmentation, which enables iterative self-verification at inference time. Evaluated on HiPhO, a rigorous benchmark of 13 exams from 2024-2025, our flagship P1-VL-235B-A22B becomes the first open-source Vision-Language Model (VLM) to secure 12 gold medals and achieves state-of-the-art performance among open-source models. Our agent-augmented system ranks second overall globally, trailing only Gemini-3-Pro. Beyond physics, P1-VL demonstrates remarkable scientific reasoning capacity and generalizability, establishing significant leads over its base models on STEM benchmarks. By open-sourcing P1-VL, we take a foundational step toward general-purpose physical intelligence, better aligning visual perception with abstract physical laws for machine scientific discovery.
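The abstract pairs two mechanisms: a training-time curriculum that progressively widens the difficulty of problems seen during RL post-training, and an inference-time agent loop that verifies and revises the model's own answers. The sketch below illustrates both ideas in Python. It is a minimal illustration under stated assumptions, not the paper's implementation: the `model.generate(prompt) -> str` interface, the per-problem `difficulty` scores, the stage counts, and the prompt wording are all hypothetical.

```python
import random

def curriculum_batches(problems, num_stages=4, steps_per_stage=1000):
    """Progressive difficulty expansion (illustrative): early RL stages
    sample only the easiest problems; each stage widens the pool to
    include harder ones, so the policy always sees a solvable mix and
    post-training stays stable. Assumes each problem is a dict with a
    scalar 'difficulty' field."""
    ranked = sorted(problems, key=lambda p: p["difficulty"])
    for stage in range(1, num_stages + 1):
        # Stage k draws from the easiest k/num_stages fraction of problems.
        pool = ranked[: max(1, len(ranked) * stage // num_stages)]
        for _ in range(steps_per_stage):
            yield random.choice(pool)

def agentic_solve(model, problem, max_rounds=3):
    """Inference-time agentic augmentation (illustrative): draft a
    solution, ask the model to check it, and revise until the check
    passes or the round budget runs out. Assumes a hypothetical
    model.generate(prompt) -> str interface."""
    solution = model.generate(f"Solve this physics problem:\n{problem}")
    for _ in range(max_rounds):
        critique = model.generate(
            "Verify this physics solution step by step, including units, "
            "boundary conditions, and spatial symmetries. Reply 'OK' if "
            "correct, otherwise list the errors.\n\n"
            f"Problem:\n{problem}\n\nSolution:\n{solution}"
        )
        if critique.strip().startswith("OK"):
            break
        solution = model.generate(
            f"Revise the solution to fix these issues:\n{critique}\n\n"
            f"Problem:\n{problem}\n\nPrevious solution:\n{solution}"
        )
    return solution
```

In practice, the verification prompt would target exactly the diagram-derived constraints the abstract highlights, such as boundary conditions and spatial symmetries, since those are the failure modes that text-only checking misses.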