Remote Labor Index: Measuring AI Automation of Remote Work
October 30, 2025
Authors: Mantas Mazeika, Alice Gatti, Cristina Menghini, Udari Madhushani Sehwag, Shivam Singhal, Yury Orlovskiy, Steven Basart, Manasi Sharma, Denis Peskoff, Elaine Lau, Jaehyuk Lim, Lachlan Carroll, Alice Blair, Vinaya Sivakumar, Sumana Basu, Brad Kenstler, Yuntao Ma, Julian Michael, Xiaoke Li, Oliver Ingebretsen, Aditya Mehta, Jean Mottola, John Teichmann, Kevin Yu, Zaina Shaik, Adam Khoja, Richard Ren, Jason Hausenloy, Long Phan, Ye Htet, Ankit Aich, Tahseen Rabbani, Vivswan Shah, Andriy Novykov, Felix Binder, Kirill Chugunov, Luis Ramirez, Matias Geralnik, Hernán Mesura, Dean Lee, Ed-Yeremai Hernandez Cardona, Annette Diamond, Summer Yue, Alexandr Wang, Bing Liu, Ernesto Hernandez, Dan Hendrycks
cs.AI
Abstract
AIs have made rapid progress on research-oriented benchmarks of knowledge and
reasoning, but it remains unclear how these gains translate into economic value
and automation. To measure this, we introduce the Remote Labor Index (RLI), a
broad, multi-sector benchmark comprising real-world, economically valuable
projects designed to evaluate end-to-end agent performance in practical
settings. AI agents perform near the floor on RLI, with the highest-performing
agent achieving an automation rate of 2.5%. These results help ground
discussions of AI automation in empirical evidence, setting a common basis for
tracking AI impacts and enabling stakeholders to proactively navigate AI-driven
labor automation.
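The headline result is a simple proportion over projects. The following is a minimal sketch, assuming the automation rate is the percentage of benchmark projects for which an agent's deliverable is judged acceptable. The abstract does not spell out the evaluation procedure, and `ProjectResult`, `automation_rate`, and the toy data below are hypothetical illustrations, not the authors' code or results.

```python
from dataclasses import dataclass


@dataclass
class ProjectResult:
    project_id: str
    # Assumed success criterion (hypothetical): True if evaluators judged
    # the agent's deliverable acceptable for this project.
    accepted: bool


def automation_rate(results: list[ProjectResult]) -> float:
    """Return the percentage of projects whose deliverable was accepted."""
    if not results:
        raise ValueError("need at least one project result")
    accepted = sum(r.accepted for r in results)
    return 100.0 * accepted / len(results)


# Hypothetical toy data: 1 accepted deliverable out of 40 projects.
results = [ProjectResult(f"p{i}", accepted=(i == 0)) for i in range(40)]
print(f"{automation_rate(results):.1f}%")  # prints "2.5%"
```

Because every project counts equally, a rate of 2.5% means that only one project in forty, under this assumed criterion, was completed to an acceptable standard.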