Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games

Abstract

Card games are widely used to study sequential decision-making under uncertainty, with real-world analogues in negotiation, finance, and cybersecurity. Based on the flow of control, these games typically fall into three categories: strictly sequential (players alternate single actions), deterministic response (some actions trigger a fixed outcome), and unbounded reciprocal response (alternating counterplays are permitted). A less-explored but strategically rich structure is the bounded one-sided response, in which a player's action briefly transfers control to the opponent, who must satisfy a fixed condition through one or more moves before the turn resolves. We term games featuring this mechanism Bounded One-Sided Response Games (BORGs). We introduce a modified version of Monopoly Deal as a benchmark environment that isolates this dynamic: a Rent action forces the opponent to choose which assets to pay with. The gold-standard algorithm, Counterfactual Regret Minimization (CFR), converges on effective strategies without novel algorithmic extensions. A lightweight full-stack research platform unifies the environment, a parallelized CFR runtime, and a human-playable web interface. The trained CFR agent and source code are available at https://monopolydeal.ai.
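To make the bounded one-sided response concrete, the following is a minimal Python sketch of how a Rent action might hand control to the opponent until a fixed condition is met. It is an illustration under stated assumptions, not the platform's actual implementation: the names `resolve_rent` and `pay_smallest_first`, the representation of assets as integer card values, and the rule that overpayment yields no change are all hypothetical choices made for this example.

```python
from typing import Callable, List, Tuple


def resolve_rent(
    owed: int,
    assets: List[int],
    choose: Callable[[int, List[int]], int],
) -> Tuple[List[int], List[int]]:
    """Run the responder's sub-turn and return (paid, remaining).

    Hypothetical model: each asset is an integer card value, and the
    responder keeps control until the debt is covered or no assets remain.
    """
    remaining = list(assets)
    paid: List[int] = []
    # The response is bounded: each iteration removes one asset, so the
    # loop runs at most len(assets) times before control returns.
    while sum(paid) < owed and remaining:
        idx = choose(owed - sum(paid), remaining)
        paid.append(remaining.pop(idx))
    return paid, remaining


def pay_smallest_first(outstanding: int, remaining: List[int]) -> int:
    """Example responder policy (an assumption): give up the cheapest card."""
    return min(range(len(remaining)), key=lambda i: remaining[i])


paid, remaining = resolve_rent(4, [1, 5, 2, 3], pay_smallest_first)
print(paid, remaining)
```

The response is one-sided because only the responder acts inside the loop, and bounded because the number of moves cannot exceed the number of assets held. In a learning setting, the `choose` callback is where a trained policy would select the payment.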