Synthetic Computers at Scale for Long-Horizon Productivity Simulation

Abstract

Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at Scale, a scalable methodology for creating such environments with realistic folder hierarchies and content-rich artifacts (e.g., documents, spreadsheets, and presentations). Conditioned on each synthetic computer, we run long-horizon simulations: one agent creates productivity objectives that are specific to the computer's user and that require multiple professional deliverables and about a month of human work; another agent then acts as that user and keeps working across the computer (for example, navigating the filesystem for grounding, coordinating with simulated collaborators, and producing professional artifacts) until these objectives are completed. In preliminary experiments, we create 1,000 synthetic computers and run long-horizon simulations on them; on average, each run requires over 8 hours of agent runtime and spans more than 2,000 turns. These simulations produce rich experiential learning signals, whose effectiveness is validated by significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations. Given that personas are abundant at billion scale, this methodology can in principle scale to millions or even billions of synthetic user worlds with sufficient compute, enabling broader coverage of diverse professions, roles, contexts, environments, and productivity needs. We argue that scalable synthetic computer creation, together with at-scale simulations, is highly promising as a foundational substrate for agent self-improvement and agentic reinforcement learning in long-horizon productivity scenarios.