EMMA: Extracting Multiple physical parameters from Multimodal Data

May 21, 2026
Authors: Farhat Shaikh, Ayan Banerjee, Sandeep Gupta
cs.AI

Abstract

We introduce EMMA, a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Unlike prior video-only approaches that struggle with occluded states, hidden actuation inputs, or assumptions about known initial conditions and coordinate frames, EMMA performs joint inference of explicit parameters, implicit dynamical components, and calibration invariants within a unified continuous-time model. EMMA leverages a Liquid Time-Constant (LTC) network to learn latent dynamics from heterogeneous modalities while a physics-constrained loss enforces consistency with the governing differential equations. A unified feature pipeline enables consistent alignment across video trajectories, acoustic signatures, and chart-derived measurements, allowing EMMA to estimate parameters under forced, implicit, and multivariate dynamics without requiring segmentation masks, differentiable rendering, or specialized sensors. Across 100+ scenarios including five standard dynamical benchmarks (75 Delfys videos), real-world rover and quadrotor systems with hidden inputs, and simulation-chart case studies spanning biological and chaotic systems, EMMA delivers robust multi-parameter recovery and significantly outperforms existing single-modality and equation-discovery baselines. Our results establish EMMA as a general, scalable solution for physics-consistent model extraction from opportunistic multimodal data. Code and data are available at: https://github.com/ImpactLabASU/EMMA-CVPR2026
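
The idea of a physics-constrained loss mentioned in the abstract can be illustrated with a toy example. The sketch below is not EMMA's implementation: it uses no LTC network and no multimodal features, and every name in it (simulate_damped_oscillator, physics_residual_loss, fit_parameters) is hypothetical. It assumes a single observed trajectory x(t) of a damped oscillator governed by x'' + c*x' + k*x = 0, and recovers the two parameters (c, k) by minimizing the mean squared residual of that equation, with derivatives estimated by finite differences.

```python
import numpy as np


def simulate_damped_oscillator(c, k, x0=1.0, v0=0.0, dt=1e-3, steps=5000):
    """Generate a synthetic trajectory of x'' + c*x' + k*x = 0.

    Uses a simple semi-implicit Euler integrator; this stands in for
    positions that a real pipeline would extract from video (assumption).
    """
    x = np.empty(steps)
    pos, vel = x0, v0
    for i in range(steps):
        x[i] = pos
        vel += dt * (-c * vel - k * pos)
        pos += dt * vel
    return x


def _derivatives(x, dt):
    """Finite-difference velocity and acceleration, trimmed at the edges
    where np.gradient falls back to less accurate one-sided differences."""
    v = np.gradient(x, dt)
    a = np.gradient(v, dt)
    return x[2:-2], v[2:-2], a[2:-2]


def physics_residual_loss(params, x, dt):
    """Mean squared residual of the governing ODE for candidate (c, k)."""
    c, k = params
    xs, v, a = _derivatives(x, dt)
    residual = a + c * v + k * xs
    return float(np.mean(residual ** 2))


def fit_parameters(x, dt):
    """Minimize the residual loss in closed form.

    The residual is linear in (c, k), so the minimizer is the ordinary
    least-squares solution of [v, x] @ [c, k] = -a.
    """
    xs, v, a = _derivatives(x, dt)
    design = np.column_stack([v, xs])
    solution, *_ = np.linalg.lstsq(design, -a, rcond=None)
    return float(solution[0]), float(solution[1])


if __name__ == "__main__":
    dt = 1e-3
    trajectory = simulate_damped_oscillator(c=0.5, k=4.0, dt=dt)
    c_hat, k_hat = fit_parameters(trajectory, dt)
    print("estimated damping c:", c_hat)
    print("estimated stiffness k:", k_hat)
    print("residual loss at estimate:",
          physics_residual_loss((c_hat, k_hat), trajectory, dt))
```

Because this toy residual is linear in the unknowns, least squares is enough. A learned latent-dynamics model such as the one the abstract describes would typically need gradient-based training instead, with a residual term of this kind added to the data-fitting loss.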