VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

May 21, 2025
Authors: Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj, Nabajeet Barman, Cor-Paul Bezemer
cs.AI

Abstract

With video games now generating the highest revenues in the entertainment industry, optimizing game development workflows has become essential for the sector's sustained growth. Recent advancements in Vision-Language Models (VLMs) offer considerable potential to automate and enhance various aspects of game development, particularly Quality Assurance (QA), which remains one of the industry's most labor-intensive processes with limited automation options. To accurately evaluate the performance of VLMs in video game QA tasks and determine their effectiveness in handling real-world scenarios, there is a clear need for standardized benchmarks, as existing benchmarks are insufficient to address the specific requirements of this domain. To bridge this gap, we introduce VideoGameQA-Bench, a comprehensive benchmark that covers a wide array of game QA activities, including visual unit testing, visual regression testing, needle-in-a-haystack tasks, glitch detection, and bug report generation for both images and videos of various games. Code and data are available at: https://asgaardlab.github.io/videogameqa-bench/
