VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
May 21, 2025
Authors: Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj, Nabajeet Barman, Cor-Paul Bezemer
cs.AI
Abstract
With video games now generating the highest revenues in the entertainment
industry, optimizing game development workflows has become essential for the
sector's sustained growth. Recent advancements in Vision-Language Models (VLMs)
offer considerable potential to automate and enhance various aspects of game
development, particularly Quality Assurance (QA), which remains one of the
industry's most labor-intensive processes with limited automation options. To
accurately evaluate the performance of VLMs in video game QA tasks and
determine their effectiveness in handling real-world scenarios, there is a
clear need for standardized benchmarks, as existing benchmarks are insufficient
to address the specific requirements of this domain. To bridge this gap, we
introduce VideoGameQA-Bench, a comprehensive benchmark that covers a wide array
of game QA activities, including visual unit testing, visual regression
testing, needle-in-a-haystack tasks, glitch detection, and bug report
generation for both images and videos of various games. Code and data are
available at: https://asgaardlab.github.io/videogameqa-bench/