MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

January 12, 2026
Authors: Zizhen Li, Chuanhao Li, Yibin Wang, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Yifei Huang, Kaipeng Zhang
cs.AI

Abstract

Recent advancements have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems lack the capacity to offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamental for harmonizing Human-AI collaboration, as it empowers designers to refine their creations via external perspectives while steering models away from biased or unpredictable outcomes. Automating critique for board games presents two challenges: inferring the latent dynamics connecting rules to gameplay without an explicit engine, and modeling the subjective heterogeneity of diverse player groups. To address these, we curate a dataset of 1,727 structurally corrected rulebooks and 150K reviews selected via quality scoring and facet-aware sampling. We augment this data with Mechanics-Dynamics-Aesthetics (MDA) reasoning to explicitly bridge the causal gap between written rules and player experience. We further distill player personas and introduce MeepleLM, a specialized model that internalizes persona-specific reasoning patterns to accurately simulate the subjective feedback of diverse player archetypes. Experiments demonstrate that MeepleLM significantly outperforms the latest commercial models (e.g., GPT-5.1, Gemini3-Pro) in community alignment and critique quality, achieving a 70% preference rate in user studies assessing utility. MeepleLM serves as a reliable virtual playtester for general interactive systems, marking a pivotal step towards audience-aligned, experience-aware Human-AI collaboration.