GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
October 10, 2025
Authors: Siqi Zhu, David Zhang, Pedro Cisneros-Velarde, Jiaxuan You
cs.AI
Abstract
Large Language Models (LLMs) have achieved remarkable progress in reasoning,
yet sometimes produce responses that are suboptimal for users in tasks such as
writing, information seeking, or providing practical guidance. Conventional
alignment practices typically assume that maximizing model reward also
maximizes user welfare, but this assumption frequently fails in practice:
models may over-clarify or generate overly verbose reasoning when users prefer
concise answers. Such behaviors resemble the prisoner's dilemma, where
individually rational choices lead to socially suboptimal outcomes. The
fundamental challenge is the lack of a principled decision-making mechanism
that mutually benefits both the LLM and the user. We propose Game-Theoretic
Alignment (GTAlign), an alignment framework that integrates game-theoretic
decision making into both reasoning and training. During reasoning, the model
explicitly treats user-LLM interaction as a strategic game: it constructs
payoff matrices within its reasoning chain to estimate welfare for both itself
and the user, and then selects actions that are mutually beneficial. During
training, we introduce a mutual welfare reward that reinforces cooperative
responses, aligning model behavior with socially efficient outcomes. In
addition, we introduce an inference technique that leverages game-theoretic
reasoning to dynamically adapt the LLM's responses when the pricing policy of the
LLM service changes. Extensive experiments demonstrate that GTAlign substantially
improves reasoning efficiency, answer quality, and mutual welfare compared to
baselines across diverse tasks. The code is available at
https://github.com/ulab-uiuc/GTAlign.
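
To make the payoff-matrix idea in the abstract concrete, below is a minimal, hypothetical Python sketch of how a model could score candidate response styles against an inferred user need and pick the one that maximizes a mutual-welfare objective. The action names, payoff numbers, and the product-style welfare aggregation are illustrative assumptions, not the paper's actual reward design or implementation.

```python
# Illustrative sketch only (not the GTAlign implementation): a 2x2 payoff
# matrix over hypothetical LLM actions and user needs, plus a toy
# "mutual welfare" score that favors outcomes good for BOTH parties.

# Hypothetical payoffs: (llm_welfare, user_welfare) per (llm_action, user_need).
PAYOFFS = {
    ("concise_answer", "wants_quick_fact"):    (0.8, 0.9),
    ("concise_answer", "wants_deep_dive"):     (0.7, 0.4),
    ("verbose_reasoning", "wants_quick_fact"): (0.9, 0.3),  # high model payoff, poor mutual outcome
    ("verbose_reasoning", "wants_deep_dive"):  (0.6, 0.8),
}


def mutual_welfare(llm_w: float, user_w: float) -> float:
    """Toy mutual-welfare score: the product rewards outcomes good for both sides."""
    return llm_w * user_w


def best_action(user_need: str) -> str:
    """Pick the LLM action maximizing mutual welfare for an inferred user need."""
    actions = {action for action, _ in PAYOFFS}
    return max(actions, key=lambda a: mutual_welfare(*PAYOFFS[(a, user_need)]))


if __name__ == "__main__":
    for need in ("wants_quick_fact", "wants_deep_dive"):
        print(need, "->", best_action(need))
```

In this toy matrix, the (verbose_reasoning, wants_quick_fact) cell mirrors the prisoner's-dilemma point made in the abstract: the action with the highest model-side payoff is the worst one for mutual welfare, so an objective that aggregates both parties' welfare steers the model away from it.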