Synthetic Computers at Scale for Long-Horizon Productivity Simulation

Abstract

Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at Scale, a scalable methodology for creating such environments with realistic folder hierarchies and content-rich artifacts (e.g., documents, spreadsheets, and presentations). Conditioned on each synthetic computer, we run long-horizon simulations: one agent creates productivity objectives that are specific to the computer's user and require multiple professional deliverables and about a month of human work; another agent then acts as that user and keeps working across the computer -- for example, navigating the filesystem for grounding, coordinating with simulated collaborators, and producing professional artifacts -- until these objectives are completed. In preliminary experiments, we create 1,000 synthetic computers and run long-horizon simulations on them; each run requires over 8 hours of agent runtime and spans more than 2,000 turns on average. These simulations produce rich experiential learning signals, whose effectiveness is validated by significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations. Given that personas are abundant at billion scale, this methodology can in principle scale to millions or even billions of synthetic user worlds with sufficient compute, enabling broader coverage of diverse professions, roles, contexts, environments, and productivity needs. We argue that scalable synthetic computer creation, together with at-scale simulations, is highly promising as a foundational substrate for agent self-improvement and agentic reinforcement learning in long-horizon productivity scenarios.
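The two-agent simulation described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names SyntheticComputer, create_objectives, act_as_user, and run_simulation are hypothetical, the file path is made up for the example, and both agents are replaced by trivial placeholder functions so that the control flow is runnable.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyntheticComputer:
    """Hypothetical stand-in for one synthetic user environment."""
    user_persona: str
    files: Dict[str, str] = field(default_factory=dict)  # path -> content


def create_objectives(computer: SyntheticComputer) -> List[str]:
    """Placeholder for the agent that writes user-specific objectives."""
    return [
        f"Deliverable {i} for {computer.user_persona}"
        for i in range(1, 3)
    ]


def act_as_user(computer: SyntheticComputer, objective: str, turn: int) -> bool:
    """Placeholder for the agent acting as the user.

    A real agent would browse the filesystem, coordinate with simulated
    collaborators, and take many turns per objective. Here each call
    writes one artifact and reports the objective as completed.
    """
    path = f"/home/user/deliverables/turn_{turn}.md"  # assumed layout
    computer.files[path] = f"Draft for: {objective}"
    return True


def run_simulation(computer: SyntheticComputer, max_turns: int = 10) -> int:
    """Work through the objectives until all are done or turns run out."""
    pending = create_objectives(computer)
    turns = 0
    while pending and turns < max_turns:
        turns += 1
        if act_as_user(computer, pending[0], turns):
            pending.pop(0)
    return turns


computer = SyntheticComputer(user_persona="financial analyst")
turns_used = run_simulation(computer)
```

In this toy version each objective finishes in a single turn. The runs reported in the abstract average more than 2,000 turns, so a realistic act_as_user would return False on most calls while the work is still in progress.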
