Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

Abstract

We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential Social Dilemmas (SSDs). A researcher agent R (run as a coding agent) follows the autoresearch paradigm: it reads the inner-loop source code, edits system prompts, feedback functions, helper libraries, and iteration logic, runs evaluations, and decides which changes to keep. Across two games (Cleanup and Gathering), two policy-synthesizer LLMs, and two welfare objectives (utilitarian efficiency and Rawlsian maximin), the researcher reliably exceeds hand-designed baselines, sharply tightens run-to-run variance, and outperforms prompt-only optimization. The discovered pipelines are objective-dependent: only under maximin does the researcher inject an explicit fairness mechanism into synthesizer pipelines, a class of mechanism that is absent from its own objective-agnostic system prompt and from every efficiency-optimized pipeline. This supports an information-design reading in which the researcher chooses what to reveal to the boundedly rational synthesizer as a function of the welfare objective. Code is available at https://github.com/vicgalle/autoresearch-social-dilemmas.
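To make the two welfare objectives and the keep-or-discard step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code from the repository: the function names (`utilitarian`, `maximin`, `select_pipeline`), the candidate pipeline names, and the toy per-agent returns are all hypothetical, and utilitarian efficiency is assumed to be the sum of per-agent returns while maximin is taken as the return of the worst-off agent.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Returns = Sequence[float]  # one episode: per-agent returns


def utilitarian(returns: Returns) -> float:
    """Utilitarian efficiency, assumed here to be the sum over agents."""
    return sum(returns)


def maximin(returns: Returns) -> float:
    """Rawlsian maximin: the return of the worst-off agent."""
    return min(returns)


def select_pipeline(
    candidates: Sequence[str],
    evaluate: Callable[[str], List[Returns]],
    welfare: Callable[[Returns], float],
) -> Tuple[str, float]:
    """Greedy keep/discard: keep the candidate with the best mean welfare.

    `evaluate` is a hypothetical stand-in for running the inner loop and
    returning per-agent returns for several evaluation runs.
    """
    best_name, best_score = "", float("-inf")
    for name in candidates:
        score = mean(welfare(run) for run in evaluate(name))
        if score > best_score:  # keep only strict improvements
            best_name, best_score = name, score
    return best_name, best_score


# Toy, made-up evaluation results (two runs, three agents each).
TOY_RUNS: Dict[str, List[Returns]] = {
    "baseline": [[10, 10, 10], [12, 8, 10]],
    "harvest_more": [[30, 2, 4], [28, 3, 5]],
    "share_fairly": [[11, 11, 11], [12, 11, 10]],
}

best_eff = select_pipeline(list(TOY_RUNS), TOY_RUNS.__getitem__, utilitarian)
best_fair = select_pipeline(list(TOY_RUNS), TOY_RUNS.__getitem__, maximin)
print(best_eff)   # ('harvest_more', 36)
print(best_fair)  # ('share_fairly', 10.5)
```

On these toy numbers the same selection rule keeps a different candidate under each objective, which is the sense in which a discovered pipeline can be objective-dependent.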