LLaVAction: evaluating and training multi-modal large language models for action recognition
March 24, 2025
Authors: Shaokai Ye, Haozhe Qi, Alexander Mathis, Mackenzie W. Mathis
cs.AI
Abstract
Understanding human behavior requires measuring behavioral actions. Due to
its complexity, behavior is best mapped onto a rich, semantic structure such as
language. Recently developed multi-modal large language models (MLLMs)
are promising candidates for a wide range of action understanding tasks. In
this work, we focus on evaluating and then improving MLLMs to perform action
recognition. We reformulate EPIC-KITCHENS-100, one of the largest and most
challenging egocentric action datasets, as video multiple-choice question
answering (EPIC-KITCHENS-100-MQA). We show that when we sample difficult
incorrect answers as distractors, leading MLLMs struggle to recognize the
correct actions. We propose a series of methods that greatly improve MLLMs'
ability to perform action recognition, achieving state-of-the-art results on the
EPIC-KITCHENS-100 validation set and outperforming GPT-4o by 21 points
in accuracy on EPIC-KITCHENS-100-MQA. Lastly, we show improvements on other
action-related video benchmarks such as EgoSchema, PerceptionTest,
LongVideoBench, VideoMME and MVBench, suggesting that MLLMs are a promising
path forward for complex action tasks. Code and models are available at:
https://github.com/AdaptiveMotorControlLab/LLaVAction.
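
The abstract does not say how the difficult distractors are sampled, so the following is only an illustrative sketch of the multiple-choice setup, not the authors' implementation. It assumes each candidate action for a clip comes with a plausibility score from some reference model; the names `build_question`, `candidate_scores`, and `num_distractors` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class MultipleChoiceQuestion:
    options: list[str]   # shuffled answer options shown to the model
    answer_index: int    # position of the ground-truth action in options


def build_question(true_action, candidate_scores, num_distractors=4, seed=0):
    """Build one question whose distractors are the top-scoring wrong actions.

    candidate_scores is an assumed input: a dict mapping an action label to a
    score saying how plausible a reference model finds it for this clip.
    """
    wrong = {a: s for a, s in candidate_scores.items() if a != true_action}
    # Hard distractors: wrong actions ranked by descending score.
    hard = sorted(wrong, key=wrong.get, reverse=True)[:num_distractors]
    options = hard + [true_action]
    # Shuffle so the correct answer has no fixed position.
    random.Random(seed).shuffle(options)
    return MultipleChoiceQuestion(options, options.index(true_action))


def accuracy(predicted_indices, questions):
    """Fraction of questions where the predicted option is the true action."""
    correct = sum(p == q.answer_index for p, q in zip(predicted_indices, questions))
    return correct / len(questions)


# Toy example with made-up scores for a single clip.
scores = {
    "cut onion": 0.90,
    "peel onion": 0.80,
    "cut carrot": 0.70,
    "open fridge": 0.10,
    "wash pan": 0.05,
}
question = build_question("cut onion", scores, num_distractors=2)
```

With distractors chosen this way, the wrong options tend to share a verb or noun with the correct action, which is the kind of confusion the abstract reports leading MLLMs struggle with; sampling distractors at random instead would give an easier benchmark.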