Synthetic Computers at Scale for Long-Horizon Productivity Simulation

Abstract

Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at Scale, a scalable methodology for creating such environments with realistic folder hierarchies and content-rich artifacts (e.g., documents, spreadsheets, and presentations). Conditioned on each synthetic computer, we run long-horizon simulations: one agent creates productivity objectives that are specific to the computer's user and require multiple professional deliverables and about a month of human work; another agent then acts as that user and keeps working across the computer (for example, navigating the filesystem for grounding, coordinating with simulated collaborators, and producing professional artifacts) until these objectives are completed. In preliminary experiments, we create 1,000 synthetic computers and run long-horizon simulations on them; each run requires over 8 hours of agent runtime and spans more than 2,000 turns on average. These simulations produce rich experiential learning signals, whose effectiveness is validated by significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations. Given that personas are abundant at billion scale, this methodology can in principle scale to millions or even billions of synthetic user worlds with sufficient compute, enabling broader coverage of diverse professions, roles, contexts, environments, and productivity needs. We argue that scalable synthetic computer creation, together with at-scale simulations, is highly promising as a foundational substrate for agent self-improvement and agentic reinforcement learning in long-horizon productivity scenarios.
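
The two-agent loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names SyntheticComputer, create_objectives, and simulate_user are hypothetical, and both agents are replaced by trivial placeholder logic where a real system would presumably call a language model. The sketch only shows the control flow: objectives are derived from one user's computer, and a second routine keeps working against the filesystem, turn by turn, until every objective is completed.

```python
from dataclasses import dataclass, field


@dataclass
class SyntheticComputer:
    """Hypothetical stand-in for one synthetic user environment."""
    persona: str
    # Maps a file path to a short description of the artifact stored there.
    files: dict[str, str] = field(default_factory=dict)


def create_objectives(computer: SyntheticComputer) -> list[str]:
    """Placeholder for the objective-creating agent."""
    # Assumption for illustration: one deliverable per top-level folder.
    folders = sorted({path.split("/")[0] for path in computer.files})
    return [f"Deliver a report for '{folder}'" for folder in folders]


def simulate_user(computer: SyntheticComputer, objectives: list[str],
                  max_turns: int = 10_000) -> list[dict]:
    """Placeholder for the user agent: works until all objectives are done."""
    trajectory = []
    pending = list(objectives)
    turn = 0
    while pending and turn < max_turns:
        objective = pending.pop(0)
        folder = objective.split("'")[1]
        # Ground the work in existing files before producing anything new.
        context = [p for p in computer.files if p.startswith(folder + "/")]
        # Write the new artifact back into the same synthetic computer.
        output_path = f"{folder}/deliverable_{turn}.docx"
        computer.files[output_path] = f"Artifact for: {objective}"
        trajectory.append({"turn": turn, "objective": objective,
                           "read": context, "wrote": output_path})
        turn += 1
    return trajectory
```

In this toy version each objective takes a single turn; the recorded trajectory (what was read, what was written, in which turn) stands in for the experiential learning signal that the full simulations collect over thousands of turns.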

