P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
February 10, 2026
Authors: Yun Luo, Futing Wang, Qianjia Cheng, Fangchen Yu, Haodi Lei, Jianhao Yan, Chenxi Li, Jiacheng Chen, Yufeng Zhao, Haiyuan Wan, Yuchen Zhang, Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Wenxuan Zeng, Li Sheng, Chengxing Xie, Yuxin Zuo, Yizhuo Li, Yulun Wu, Rui Huang, Dongzhan Zhou, Kai Chen, Yu Qiao, Lei Bai, Yu Cheng, Ning Ding, Bowen Zhou, Peng Ye, Ganqu Cui
cs.AI
Abstract
The transition from symbolic manipulation to science-grade reasoning represents a pivotal frontier for Large Language Models (LLMs), with physics serving as the critical touchstone for binding abstract logic to physical reality. Physics demands that a model maintain consistency with the laws governing the universe, a task that fundamentally requires multimodal perception to ground abstract logic in reality. At the Olympiad level, diagrams are often constitutive rather than illustrative, encoding essential constraints, such as boundary conditions and spatial symmetries, that are absent from the text. To bridge this visual-logical gap, we introduce P1-VL, a family of open-source vision-language models engineered for advanced scientific reasoning. Our method harmonizes Curriculum Reinforcement Learning, which employs progressive difficulty expansion to stabilize post-training, with Agentic Augmentation, which enables iterative self-verification at inference time. Evaluated on HiPhO, a rigorous benchmark of 13 exams from 2024-2025, our flagship P1-VL-235B-A22B becomes the first open-source Vision-Language Model (VLM) to secure 12 gold medals and achieves state-of-the-art performance among open-source models. Our agent-augmented system ranks second overall globally, trailing only Gemini-3-Pro. Beyond physics, P1-VL demonstrates remarkable scientific reasoning capacity and generalizability, establishing significant leads over its base models on STEM benchmarks. By open-sourcing P1-VL, we take a foundational step toward general-purpose physical intelligence, better aligning visual perception with abstract physical laws for machine scientific discovery.
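The "progressive difficulty expansion" described above can be pictured as a training-data schedule: the sampling pool initially contains only the easiest problems and gradually grows to include harder ones as post-training advances, which keeps early reward signals dense and stabilizes optimization. The sketch below is purely illustrative (the function names, the linear schedule, and the 25% starting fraction are our assumptions, not details from the paper):

```python
import random

def curriculum_pool(problems, step, total_steps):
    """Progressive difficulty expansion (illustrative sketch, not the
    paper's actual schedule): the sampling pool starts with the easiest
    problems and expands linearly to cover the full set by the end."""
    ranked = sorted(problems, key=lambda p: p["difficulty"])
    # Fraction of the difficulty-ranked pool available at this step;
    # starts at 25% (assumed) and reaches 100% at total_steps.
    frac = min(1.0, 0.25 + 0.75 * step / total_steps)
    cutoff = max(1, int(len(ranked) * frac))
    return ranked[:cutoff]

def sample_batch(problems, step, total_steps, batch_size, rng=random):
    """Draw an RL rollout batch from the step-dependent curriculum pool."""
    pool = curriculum_pool(problems, step, total_steps)
    return [rng.choice(pool) for _ in range(batch_size)]
```

For example, with 100 problems and 1,000 training steps, the pool holds the 25 easiest problems at step 0 and all 100 by step 1,000, so the policy never sees the hardest Olympiad items before easier reward signals have shaped it.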