ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks

Abstract

Retrieval Augmented Generation (RAG) systems are increasingly vital in dynamic domains like online gaming, yet the lack of a dedicated benchmark has impeded standardized evaluation in this area. The core difficulty lies in Dual Dynamics: the constant interplay between game content updates and the shifting focus of the player community. Furthermore, automating such a benchmark introduces a critical requirement for player-centric authenticity, so that generated questions are realistic. To address this integrated challenge, we introduce ChronoPlay, a novel framework for the automated and continuous generation of game RAG benchmarks. ChronoPlay uses a dual-dynamic update mechanism to track both forms of change, and a dual-source synthesis engine that draws on official sources and the player community to ensure both factual correctness and authentic query patterns. We instantiate our framework on three distinct games to create the first dynamic RAG benchmark for the gaming domain, offering new insights into model performance under these complex and realistic conditions. Code is available at: https://github.com/hly1998/ChronoPlay.
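As a rough illustration only, the idea of tracking both forms of change could be expressed as a trigger that flags a new benchmark slice when either the game content changes or the player community's focus shifts. This is a hypothetical sketch, not ChronoPlay's implementation: the function names, the use of a patch version string as the content signal, total variation distance over community topic labels as the focus signal, and the 0.3 threshold are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sketch of a dual-dynamic refresh trigger; names and threshold are
# illustrative assumptions, not taken from the ChronoPlay code base.


def topic_distribution(topics):
    """Normalize a list of topic labels (e.g. tags on community posts) into frequencies."""
    counts = Counter(topics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: count / total for topic, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions: 0.5 * sum |p - q|."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def needs_refresh(prev_version, curr_version, prev_topics, curr_topics, threshold=0.3):
    """Return the reasons, if any, to regenerate benchmark questions for a new time slice."""
    reasons = []
    # Content dynamics: a new game patch or version was released.
    if curr_version != prev_version:
        reasons.append("content_update")
    # Community dynamics: the mix of topics players discuss has moved noticeably.
    shift = total_variation(topic_distribution(prev_topics), topic_distribution(curr_topics))
    if shift > threshold:
        reasons.append("focus_shift")
    return reasons


prev = ["boss", "boss", "crafting", "pvp"]
curr = ["new_zone", "new_zone", "boss", "pvp"]
print(needs_refresh("1.0", "1.1", prev, curr))  # ['content_update', 'focus_shift']
```

Keeping the two signals separate reflects the abstract's point that content updates and community attention change independently: a patch can ship without shifting player focus, and player focus can drift between patches.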