
GameTalk: Training LLMs for Strategic Conversation

January 22, 2026
Authors: Victor Conchello Vendrell, Max Ruiz Luyten, Mihaela van der Schaar
cs.AI

Abstract

Strategic decision-making in multi-agent settings is a key challenge for large language models (LLMs), particularly when coordination and negotiation must unfold over extended conversations. While recent work has explored the use of LLMs in isolated decision tasks, little attention has been given to optimizing long-term objectives through dialogue. We introduce GameTalk, a framework for training LLMs to make strategic decisions via multi-turn interactions. Unlike prior work that focuses on single-turn objectives or static action prediction, we train LLMs to optimize a global objective across full conversations. We achieve this by adapting fine-tuning methods like GRPO, DPO, and STaR to incorporate reward signals that depend on the entire interaction. We evaluate this approach on a suite of increasingly complex games, designed to stress different aspects of reasoning, coordination, and opponent modeling. Our results show that GameTalk significantly outperforms untrained models, especially under reward shaping, with DPO consistently yielding the strongest gains. These findings position conversational fine-tuning as a promising path for LLMs to reason, negotiate, and act in interactive environments.
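The key training idea described above, scoring an agent only once the whole conversation has played out and then fine-tuning on preferences between complete dialogues, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the rollout loop, the placeholder `conversation_reward`, and helper names such as `rollout`, `opponent_reply`, and `make_dpo_pair` are assumptions introduced here to show how a reward that depends on the entire interaction could feed a DPO-style preference pair.

```python
import random
from dataclasses import dataclass

@dataclass
class Dialogue:
    """One full multi-turn conversation: a list of (speaker, utterance) pairs."""
    turns: list
    reward: float = 0.0  # assigned only after the game ends

def rollout(policy, game_prompt: str, n_turns: int = 4) -> Dialogue:
    """Play a full game, querying `policy` for the agent's utterance each round."""
    turns = []
    for _ in range(n_turns):
        context = game_prompt + "".join(f"\n{s}: {u}" for s, u in turns)
        turns.append(("agent", policy(context)))
        turns.append(("opponent", opponent_reply(context)))
    return Dialogue(turns=turns)

def conversation_reward(d: Dialogue) -> float:
    """Global objective computed from the *entire* interaction (dummy placeholder)."""
    return sum(len(u) for s, u in d.turns if s == "agent") / 100.0

def render(d: Dialogue) -> str:
    return "\n".join(f"{s}: {u}" for s, u in d.turns)

def make_dpo_pair(policy, game_prompt: str) -> dict:
    """Sample two full dialogues; the higher-scoring one becomes 'chosen'."""
    a, b = rollout(policy, game_prompt), rollout(policy, game_prompt)
    a.reward, b.reward = conversation_reward(a), conversation_reward(b)
    chosen, rejected = (a, b) if a.reward >= b.reward else (b, a)
    return {"prompt": game_prompt, "chosen": render(chosen), "rejected": render(rejected)}

# Toy stand-ins so the sketch runs end to end.
def opponent_reply(context: str) -> str:
    return random.choice(["I propose we split 50/50.", "No deal."])

toy_policy = lambda ctx: random.choice(["Let's cooperate.", "I will defect."])
print(make_dpo_pair(toy_policy, "Negotiate a resource split over 4 turns."))
```

The same conversation-level reward could in principle be plugged into GRPO- or STaR-style updates instead of preference pairs; the common element is that credit is assigned to whole dialogues rather than to individual turns.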