EMMA: Extracting Multiple Physical Parameters from Multimodal Data

Abstract

We introduce EMMA, a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Prior video-only approaches struggle with occluded states and hidden actuation inputs, and often assume known initial conditions and coordinate frames. EMMA instead jointly infers explicit parameters, implicit dynamical components, and calibration invariants within a unified continuous-time model. It uses a Liquid Time-Constant (LTC) network to learn latent dynamics from heterogeneous modalities, while a physics-constrained loss enforces consistency with the governing differential equations. A unified feature pipeline aligns video trajectories, acoustic signatures, and chart-derived measurements, allowing EMMA to estimate parameters under forced, implicit, and multivariate dynamics without segmentation masks, differentiable rendering, or specialized sensors. Across 100+ scenarios, including five standard dynamical benchmarks (75 Delfys videos), real-world rover and quadrotor systems with hidden inputs, and simulation-chart case studies spanning biological and chaotic systems, EMMA delivers robust multi-parameter recovery and significantly outperforms existing single-modality and equation-discovery baselines. Our results establish EMMA as a general, scalable solution for physics-consistent model extraction from opportunistic multimodal data. Code and data are available at: https://github.com/ImpactLabASU/EMMA-CVPR2026
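
To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not EMMA's implementation. It assumes a single-unit LTC cell in its commonly published form, dx/dt = -(1/tau + f) x + f A with f a sigmoid gate, integrated with a plain explicit Euler step. It also assumes a physics residual loss for a damped oscillator x'' + c x' + k x = 0, chosen here only as an example system. All function names, parameters (`w`, `u`, `b`, `tau`, `A`, `c`, `k`), and the finite-difference scheme are hypothetical choices for illustration.

```python
import math


def ltc_euler_step(x, inp, dt, tau, A, w, u, b):
    """One explicit-Euler step of a single-unit LTC ODE (illustrative only):
    dx/dt = -(1/tau + f) * x + f * A, with f = sigmoid(w*x + u*inp + b)."""
    f = 1.0 / (1.0 + math.exp(-(w * x + u * inp + b)))
    return x + dt * (-(1.0 / tau + f) * x + f * A)


def physics_residual_loss(xs, dt, c, k):
    """Mean squared residual of x'' + c*x' + k*x = 0 on a sampled trajectory.
    Derivatives are approximated with central finite differences."""
    total = 0.0
    for i in range(1, len(xs) - 1):
        vel = (xs[i + 1] - xs[i - 1]) / (2.0 * dt)
        acc = (xs[i + 1] - 2.0 * xs[i] + xs[i - 1]) / dt ** 2
        total += (acc + c * vel + k * xs[i]) ** 2
    return total / (len(xs) - 2)


def damped_oscillator(c, k, dt, steps):
    """Closed-form underdamped solution x(t) = exp(-c*t/2) * cos(omega*t),
    omega = sqrt(k - c^2/4), used as synthetic stand-in data."""
    omega = math.sqrt(k - c * c / 4.0)
    return [math.exp(-0.5 * c * i * dt) * math.cos(omega * i * dt)
            for i in range(steps)]


if __name__ == "__main__":
    dt = 0.001
    xs = damped_oscillator(c=0.4, k=4.0, dt=dt, steps=2000)
    # The residual is near zero at the generating parameters and grows
    # when a parameter is wrong, which is what a physics loss exploits.
    print(physics_residual_loss(xs, dt, c=0.4, k=4.0))
    print(physics_residual_loss(xs, dt, c=0.4, k=5.0))
```

In a setup of this kind, the residual term would typically be added to a data-fitting loss so that the trajectory produced by the latent model and the candidate parameters (here `c` and `k`) are optimized together. How EMMA weights or structures these terms is not stated in the abstract.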