VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

Abstract

With video games now generating the highest revenues in the entertainment industry, optimizing game development workflows has become essential for the sector's sustained growth. Recent advancements in Vision-Language Models (VLMs) offer considerable potential to automate and enhance various aspects of game development, particularly Quality Assurance (QA), which remains one of the industry's most labor-intensive processes with limited automation options. To accurately evaluate the performance of VLMs in video game QA tasks and determine their effectiveness in handling real-world scenarios, there is a clear need for standardized benchmarks, as existing benchmarks are insufficient to address the specific requirements of this domain. To bridge this gap, we introduce VideoGameQA-Bench, a comprehensive benchmark that covers a wide array of game QA activities, including visual unit testing, visual regression testing, needle-in-a-haystack tasks, glitch detection, and bug report generation for both images and videos of various games. Code and data are available at https://asgaardlab.github.io/videogameqa-bench/
