Safe and Scalable Web Agent Learning via Recreated Websites

Abstract

Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites into fully executable, verifiable synthetic environments. By exposing controlled internal access via a Python SDK, VeriEnv enables agents to self-generate tasks with deterministic, programmatically verifiable rewards, eliminating reliance on heuristic or LLM-based judges. This design decouples agent learning from unsafe real-world interaction while enabling scalable self-evolution through environment expansion. Through experiments on web agent benchmarks, we show that agents trained with VeriEnv generalize to unseen websites, achieve site-specific mastery through self-evolving training, and benefit from scaling the number of training environments. Code and resources will be released at https://github.com/kyle8581/VeriEnv upon acceptance.
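To make the idea of a deterministic, programmatically verifiable reward concrete, the following is a minimal sketch and not the VeriEnv SDK itself. The class `ToyShopEnv`, the `Task` container, and the helpers `make_add_item_task` and `reward` are hypothetical names introduced only for illustration. The sketch assumes that a recreated website exposes its internal state to a checker, so that task success can be decided by inspecting that state rather than by a heuristic or LLM-based judge.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyShopEnv:
    """Hypothetical stand-in for a recreated website with inspectable state."""
    cart: Dict[str, int] = field(default_factory=dict)

    def reset(self) -> None:
        # A synthetic environment can be reset cheaply, unlike a live site.
        self.cart.clear()

    def add_to_cart(self, item: str, qty: int = 1) -> None:
        # Stands in for the effect of an agent's UI actions on backend state.
        self.cart[item] = self.cart.get(item, 0) + qty


@dataclass
class Task:
    """A task pairs a natural-language instruction with a state checker."""
    instruction: str
    verify: Callable[[ToyShopEnv], bool]


def make_add_item_task(item: str, qty: int) -> Task:
    # The checker reads internal state directly, so the outcome is deterministic.
    return Task(
        instruction=f"Add {qty} x {item} to the cart.",
        verify=lambda env: env.cart.get(item, 0) == qty,
    )


def reward(env: ToyShopEnv, task: Task) -> float:
    # Binary reward computed by the checker, with no learned judge involved.
    return 1.0 if task.verify(env) else 0.0


env = ToyShopEnv()
task = make_add_item_task("notebook", 2)

# Episode 1: the simulated agent completes the task.
env.reset()
env.add_to_cart("notebook", 2)
success_reward = reward(env, task)

# Episode 2: the simulated agent adds the wrong quantity.
env.reset()
env.add_to_cart("notebook", 1)
failure_reward = reward(env, task)
```

In this toy setup the same final state always yields the same reward, which is the property the abstract attributes to programmatic verification. How VeriEnv actually generates tasks and exposes state is not specified in the abstract.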
