ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks
October 21, 2025
Authors: Liyang He, Yuren Zhang, Ziwei Zhu, Zhenghui Li, Shiwei Tong
cs.AI
Abstract
Retrieval-Augmented Generation (RAG) systems are increasingly vital in
dynamic domains like online gaming, yet the lack of a dedicated benchmark has
impeded standardized evaluation in this area. The core difficulty lies in Dual
Dynamics: the constant interplay between game content updates and the shifting
focus of the player community. Furthermore, automating the construction of such
a benchmark demands player-centric authenticity, so that the generated questions
are realistic. To address this integrated challenge,
we introduce ChronoPlay, a novel framework for the automated and continuous
generation of game RAG benchmarks. ChronoPlay utilizes a dual-dynamic update
mechanism to track both forms of change, and a dual-source synthesis engine
that draws from official sources and the player community to ensure both factual
correctness and authentic query patterns. We instantiate our framework on three
distinct games to create the first dynamic RAG benchmark for the gaming domain,
offering new insights into model performance under these complex and realistic
conditions. Code is available at: https://github.com/hly1998/ChronoPlay.