Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

August 24, 2025
Authors: Jiaqi Liu, Songning Lai, Pengze Li, Di Yu, Wenjie Zhou, Yiyang Zhou, Peng Xia, Zijun Wang, Xi Chen, Shixiang Tang, Lei Bai, Wanli Ouyang, Mingyu Ding, Huaxiu Yao, Aoran Wang
cs.AI

Abstract

Automated discovery of physical laws from real-world observational data is a grand challenge in AI. Current methods, which rely on symbolic regression or LLMs, are limited to uni-modal data and overlook the rich visual, phenomenological representations of motion that are indispensable to physicists. This "sensory deprivation" severely weakens their ability to interpret the spatio-temporal patterns inherent in dynamic phenomena. To address this gap, we propose VIPER-R1, a multimodal model that performs Visual Induction for Physics-based Equation Reasoning to discover fundamental symbolic formulas. It integrates visual perception, trajectory data, and symbolic reasoning to emulate the scientific discovery process. The model is trained in two stages. First, a Motion Structure Induction (MSI) curriculum uses supervised fine-tuning to teach the model to interpret kinematic phase portraits and to construct hypotheses guided by a Causal Chain of Thought (C-CoT). Second, Reward-Guided Symbolic Calibration (RGSC) refines the formula structure with reinforcement learning. During inference, the trained VIPER-R1 acts as an agent: it first posits a high-confidence symbolic ansatz, then proactively invokes an external symbolic regression tool to perform Symbolic Residual Realignment (SR^2). This final step, analogous to a physicist's perturbation analysis, reconciles the theoretical model with empirical data. To support this research, we introduce PhysSymbol, a new 5,000-instance multimodal corpus. Experiments show that VIPER-R1 consistently outperforms state-of-the-art VLM baselines in accuracy and interpretability, enabling more precise discovery of physical laws. Project page: https://jiaaqiliu.github.io/VIPER-R1/
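
The inference-time idea of proposing an ansatz and then fitting only what it fails to explain can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which symbolic regression tool is used, so ordinary least squares over a small fixed library of candidate terms stands in for it here. The function name `residual_realignment`, the term library, the hard-coded ansatz, and the synthetic oscillator data are all assumptions made for illustration and are not taken from PhysSymbol.

```python
import numpy as np

def residual_realignment(x, v, a_observed, ansatz, library):
    """Fit the part of the observed acceleration the ansatz leaves unexplained.

    ansatz  : callable (x, v) -> acceleration predicted by the proposed formula
    library : dict mapping a term name to a callable (x, v) -> term values
    Returns a dict of least-squares coefficients for the residual terms.
    """
    # Residual between the data and the proposed formula.
    residual = a_observed - ansatz(x, v)
    names = list(library)
    # One column per candidate correction term.
    design = np.column_stack([library[name](x, v) for name in names])
    coeffs, *_ = np.linalg.lstsq(design, residual, rcond=None)
    return dict(zip(names, coeffs))

# Synthetic, noise-free data from a damped, weakly nonlinear oscillator:
#   a = -4.0*x - 0.3*v - 0.5*x**3   (illustrative system only)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
v = rng.uniform(-1.0, 1.0, 500)
a = -4.0 * x - 0.3 * v - 0.5 * x**3

# Hypothetical ansatz standing in for the model's proposal: the linear
# restoring force only.
ansatz = lambda x, v: -4.0 * x

# Hypothetical candidate terms for the correction.
library = {
    "v": lambda x, v: v,
    "x^3": lambda x, v: x**3,
    "x*v": lambda x, v: x * v,
}

correction = residual_realignment(x, v, a, ansatz, library)
for name, coeff in correction.items():
    print(f"{name}: {coeff:+.4f}")
```

Because the ansatz already captures the dominant term, the fit only has to search a small space of corrections. A real symbolic regression tool would search over expression structures rather than a fixed linear library, so the sketch shows the division of labor and not the search itself.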