Discovering Cooperative Pipelines: Autoresearch in Sequential Social Dilemmas

Abstract

We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential Social Dilemmas (SSDs). A researcher agent R (run as a coding agent) reads the inner-loop source code, edits system prompts, feedback functions, helper libraries, and iteration logic, runs evaluations, and decides what to keep, following the autoresearch paradigm. Across two games (Cleanup and Gathering), two policy-synthesizer LLMs, and two welfare objectives (utilitarian efficiency and Rawlsian maximin), the researcher reliably exceeds hand-designed baselines, sharply reduces run-to-run variance, and outperforms prompt-only optimization. The discovered pipelines are objective-dependent: only under maximin does the researcher inject an explicit fairness mechanism into synthesizer pipelines, a class of mechanism that is absent from its own objective-agnostic system prompt and from every efficiency-optimized pipeline. This supports an information-design reading in which the researcher chooses what to reveal to the boundedly rational synthesizer as a function of the welfare objective. Code is available at https://github.com/vicgalle/autoresearch-social-dilemmas.
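
To make the two welfare objectives concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. It takes utilitarian efficiency to be the sum of per-agent returns and Rawlsian maximin to be the return of the worst-off agent; the function names and the example return vectors are hypothetical, and the paper may normalize these quantities differently.

```python
from typing import Sequence


def utilitarian_welfare(returns: Sequence[float]) -> float:
    """Assumed form of utilitarian efficiency: total return across agents."""
    return float(sum(returns))


def maximin_welfare(returns: Sequence[float]) -> float:
    """Assumed form of Rawlsian maximin: return of the worst-off agent."""
    return float(min(returns))


# Hypothetical per-agent returns from two candidate joint policies.
equal = [10.0, 10.0, 10.0]   # everyone earns the same
skewed = [30.0, 5.0, 1.0]    # higher total, very unequal

# The two objectives rank these outcomes in opposite order.
prefers_skewed_under_utilitarian = (
    utilitarian_welfare(skewed) > utilitarian_welfare(equal)
)
prefers_equal_under_maximin = (
    maximin_welfare(equal) > maximin_welfare(skewed)
)
```

Under these assumed definitions the two objectives can disagree about which outcome is better, which is consistent with the abstract's observation that a fairness mechanism appears only when the researcher optimizes for maximin.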