R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
March 7, 2025
Authors: Jiaxing Zhao, Xihan Wei, Liefeng Bo
cs.AI
Abstract
In this work, we present the first application of Reinforcement Learning with
Verifiable Reward (RLVR) to an Omni-multimodal large language model in the
context of emotion recognition, a task where both visual and audio modalities
play crucial roles. We leverage RLVR to optimize the Omni model, significantly
enhancing its performance in three key aspects: reasoning capability, emotion
recognition accuracy, and generalization ability. The introduction of RLVR not
only improves the model's overall performance on in-distribution data but also
demonstrates superior robustness when evaluated on out-of-distribution
datasets. More importantly, the improved reasoning capability enables clear
analysis of the contributions of different modalities, particularly visual and
audio information, in the emotion recognition process. This provides valuable
insights into the optimization of multimodal large language models.
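
To make the idea of a verifiable reward concrete, the sketch below shows one way such a reward could be computed for emotion recognition. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the reward design, so the `<think>`/`<answer>` output template, the split into a format term and an accuracy term, and the function name `verifiable_reward` are all hypothetical.

```python
import re

# Assumed output template (not stated in the abstract): the model writes its
# reasoning inside <think>...</think> and the final label inside <answer>...</answer>.
_TEMPLATE = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$",
    re.DOTALL,
)


def verifiable_reward(response: str, gold_label: str) -> float:
    """Hypothetical rule-based reward: format term (0 or 1) plus accuracy term (0 or 1)."""
    match = _TEMPLATE.match(response.strip())
    if match is None:
        # The response does not follow the template, so nothing can be verified.
        return 0.0

    format_reward = 1.0
    predicted = match.group(2).strip().lower()
    # The accuracy term is checked directly against the ground-truth label,
    # with no learned reward model involved.
    accuracy_reward = 1.0 if predicted == gold_label.strip().lower() else 0.0
    return format_reward + accuracy_reward
```

The point of the sketch is that the reward is computed by a deterministic check against the ground-truth label, which is what makes it verifiable; the reasoning text itself is not scored, only required to be present.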