Discovering Cooperation Pipelines: Autoresearch in Sequential Social Dilemmas

Abstract

We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential Social Dilemmas (SSDs). Following the autoresearch paradigm, a researcher agent R (run as a coding agent) reads the inner-loop source code, edits system prompts, feedback functions, helper libraries, and iteration logic, runs evaluations, and decides which changes to keep. Across two games (Cleanup and Gathering), two policy-synthesizer LLMs, and two welfare objectives (utilitarian efficiency and Rawlsian maximin), the researcher reliably exceeds hand-designed baselines, sharply reduces run-to-run variance, and outperforms prompt-only optimization. The discovered pipelines are objective-dependent: only under maximin does the researcher inject an explicit fairness mechanism into the synthesizer pipeline, a class of mechanism that is absent both from the researcher's own objective-agnostic system prompt and from every efficiency-optimized pipeline. This supports an information-design reading in which the researcher chooses what to reveal to the boundedly rational synthesizer as a function of the welfare objective. Code is available at https://github.com/vicgalle/autoresearch-social-dilemmas.
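To make the difference between the two welfare objectives concrete, the following minimal sketch assumes that utilitarian efficiency is the sum of per-agent returns and that Rawlsian maximin is the return of the worst-off agent. The function names and the per-agent returns are hypothetical illustrations, not taken from the released code or from reported results.

```python
from typing import Sequence


def utilitarian_efficiency(returns: Sequence[float]) -> float:
    """Assumed definition: total return summed over all agents."""
    return float(sum(returns))


def rawlsian_maximin(returns: Sequence[float]) -> float:
    """Assumed definition: return of the worst-off agent."""
    return float(min(returns))


# Hypothetical per-agent episode returns under two candidate pipelines.
pipeline_a = [30.0, 28.0, 2.0]   # high total, one agent left with almost nothing
pipeline_b = [18.0, 17.0, 16.0]  # lower total, shared almost evenly

candidates = [pipeline_a, pipeline_b]

# Each objective ranks the same two candidates differently.
best_by_efficiency = max(candidates, key=utilitarian_efficiency)
best_by_maximin = max(candidates, key=rawlsian_maximin)

print("efficiency prefers:", best_by_efficiency)
print("maximin prefers:   ", best_by_maximin)
```

Under these assumed definitions the two objectives select different candidates, which is consistent with the observation that the discovered pipelines depend on the welfare objective being optimized.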