
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare

October 10, 2025
Authors: Siqi Zhu, David Zhang, Pedro Cisneros-Velarde, Jiaxuan You
cs.AI

Abstract

Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet they sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision-making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision making into both reasoning and training. During reasoning, the model explicitly treats the user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a mutual welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt the LLM's responses when the pricing policy of the LLM service changes. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and mutual welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign.
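
For intuition, below is a minimal Python sketch of the payoff-matrix decision step the abstract describes: the model scores candidate response styles for both itself and the user, then picks the action that maximizes joint welfare rather than its own payoff alone. The action names, numeric payoffs, and the simple sum used as the joint objective are illustrative assumptions, not values or code from GTAlign.

```python
# Hypothetical payoff matrix for one user-LLM interaction.
# payoffs[action] = (llm_welfare, user_welfare); numbers are made up for illustration.
payoffs = {
    "direct_answer":     (0.6, 0.9),  # short, on-point reply the user prefers
    "clarify_first":     (0.7, 0.4),  # clarifying question the user may not need
    "verbose_reasoning": (0.8, 0.3),  # long reasoning the user did not ask for
}

def mutual_welfare(llm_w: float, user_w: float) -> float:
    """Joint objective; a plain sum stands in for the paper's mutual welfare reward."""
    return llm_w + user_w

# Selfish choice: maximize the LLM's own payoff
# (the prisoner's-dilemma-like failure mode the abstract points to).
selfish = max(payoffs, key=lambda a: payoffs[a][0])

# Cooperative choice: maximize mutual welfare for both parties.
cooperative = max(payoffs, key=lambda a: mutual_welfare(*payoffs[a]))

print(f"selfish action:     {selfish}")      # verbose_reasoning
print(f"cooperative action: {cooperative}")  # direct_answer
```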