Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs

Abstract

Directly training Large Language Models (LLMs) for Multi-Agent Systems (MAS) remains challenging due to intricate reward modeling, dynamic agent interactions, and demanding generalization requirements. This paper explores whether post-training techniques, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR), can effectively generalize to multi-agent scenarios. We use economic reasoning as a testbed, leveraging its strong foundations in mathematics and game theory, its demand for structured analytical reasoning, and its relevance to real-world applications such as market design, resource allocation, and policy analysis. We introduce Recon (Reasoning like an ECONomist), a 7B-parameter open-source LLM post-trained on a hand-curated dataset of 2,100 high-quality economic reasoning problems. Comprehensive evaluation on economic reasoning benchmarks and multi-agent games reveals clear improvements in structured reasoning and economic rationality. These results underscore the promise of domain-aligned post-training for enhancing reasoning and agent alignment, and shed light on the roles of SFT and RL in shaping model behavior. Code is available at https://github.com/MasterZhou1/Recon.
