Safe and Scalable Web Agent Learning via Recreated Websites

Abstract

Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites into fully executable, verifiable synthetic environments. By exposing controlled internal access via a Python SDK, VeriEnv enables agents to self-generate tasks with deterministic, programmatically verifiable rewards, eliminating reliance on heuristic or LLM-based judges. This design decouples agent learning from unsafe real-world interaction while enabling scalable self-evolution through environment expansion. Through experiments on web agent benchmarks, we show that agents trained with VeriEnv generalize to unseen websites, achieve site-specific mastery through self-evolving training, and benefit from scaling the number of training environments. Code and resources will be released at https://github.com/kyle8581/VeriEnv upon acceptance.
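To make the idea of a deterministic, programmatically verifiable reward concrete, the following is a minimal sketch in Python. It is not the VeriEnv SDK, whose interface is not described here; `ToyShopEnv`, `Task`, and `reward` are hypothetical names invented for illustration. The sketch assumes that a cloned site exposes its internal state to a checker, so that a self-generated task can carry its own success predicate instead of relying on a heuristic or LLM-based judge.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyShopEnv:
    """Hypothetical stand-in for the backend state of a cloned shopping site."""
    cart: Dict[str, int] = field(default_factory=dict)

    def reset(self) -> None:
        # A synthetic environment can be reset cheaply and exactly.
        self.cart.clear()

    def add_to_cart(self, item: str, qty: int = 1) -> None:
        # Stands in for the effect of an agent's browser actions.
        self.cart[item] = self.cart.get(item, 0) + qty


@dataclass
class Task:
    """A self-generated task paired with a checker over internal state."""
    instruction: str
    verify: Callable[[ToyShopEnv], bool]


def reward(env: ToyShopEnv, task: Task) -> float:
    # Deterministic binary reward: 1.0 only if the state predicate holds.
    return 1.0 if task.verify(env) else 0.0


# Hypothetical task: the checker inspects state rather than judging a transcript.
pen_task = Task(
    instruction="Add two units of 'pen' to the cart",
    verify=lambda env: env.cart.get("pen") == 2,
)
```

Because the checker reads the environment's state directly, running the same episode twice yields the same reward, which is the property the abstract contrasts with heuristic or LLM-based evaluation.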
