Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game

June 3, 2026
Authors: Chensong Huang, Changyu Chen, Chenwei Lin, Hanjia Lyu, Xian Xu, Jiebo Luo
cs.AI

Abstract

LLMs can appear cautious in risk decision-making tasks, yet cautious-looking outputs do not necessarily indicate alignment with human decision-making mechanisms. We investigate this distinction using the St. Petersburg game as a controlled testbed: a classical paradox in which the expected payoff is infinite, yet humans typically report a low, finite willingness to pay. We evaluate 28 LLMs with a structured prompt suite that includes the original game; controlled decision variants that perturb truncation, repeated play, numeric endowment, and occupational identity; a human-perspective prompt that asks models to reason as human decision makers; and paired comparisons between base models and their instruction-tuned counterparts. In the original game, most models generate finite bids, creating the appearance of human-like risk behavior. However, this outcome-level resemblance masks substantial mechanism-level differences. The controlled variants reveal that, rather than maintaining the human-like behavior seen in the original game, models often shift to conditionally and computationally rational behavior. Human-cue prompting and instruction tuning often lower bids and reduce some visible pathologies, but most mechanism-level response patterns remain largely unchanged. These findings show that behavioral alignment in risk decision-making can be surface-level: LLMs may produce human-like risk decisions without exhibiting human-consistent mechanisms. High-stakes evaluations of LLM decision-making should therefore move beyond outcome similarity and examine whether that alignment is supported by mechanism-level consistency.
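
To make the paradox and the truncation variant concrete, the following is a minimal sketch of the expected-payoff arithmetic. It is not taken from the paper. It assumes the common textbook convention in which a fair coin is tossed until the first heads and the payoff is 2^k when heads first appears on toss k. The function name `truncated_expected_value` and the rule that a truncated game pays nothing if no heads appears within the cap are illustrative assumptions; the abstract does not specify the exact truncation rule used in the prompts.

```python
from fractions import Fraction


def truncated_expected_value(max_tosses: int) -> Fraction:
    """Expected payoff of a St. Petersburg game capped at max_tosses tosses.

    Assumed convention (not from the paper): the payoff is 2**k if the
    first heads lands on toss k, which happens with probability 2**-k.
    If no heads appears within max_tosses tosses, the game pays nothing.
    """
    total = Fraction(0)
    for k in range(1, max_tosses + 1):
        probability = Fraction(1, 2**k)  # first heads on toss k
        payoff = 2**k
        total += probability * payoff    # each term contributes exactly 1
    return total


if __name__ == "__main__":
    # The expected value grows linearly with the cap, so it is unbounded
    # when the game is not truncated.
    for cap in (1, 10, 20, 40):
        print(cap, truncated_expected_value(cap))
```

Under these assumptions every term of the sum contributes exactly 1, so the capped expected value equals the cap and diverges as the cap is removed. A purely expected-value bidder should therefore respond strongly to truncation, which is the kind of sensitivity the controlled variants are designed to probe.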