WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

Abstract

While large language models (LLMs) excel at function-level code generation, project-level tasks such as generating functional, visually appealing multi-page websites remain highly challenging. Existing work is largely limited to single-page static websites, while agentic frameworks typically rely on multi-turn execution with proprietary models, which incurs substantial token costs and high latency and yields brittle integration. Training a small LLM end-to-end with reinforcement learning (RL) is a promising alternative, but it faces a critical bottleneck: designing rewards for website generation that are both reliable and computationally feasible. Unlike single-file coding tasks, which can be verified by unit tests, website generation requires evaluating functional correctness, cross-page interactions, and inherently subjective aesthetics. To address this, we propose WebGen-R1, an end-to-end RL framework tailored to project-level website generation. We first introduce a scaffold-driven structured generation paradigm that constrains the large, open-ended action space and preserves architectural integrity. We then design a novel cascaded multimodal reward that couples structural guarantees with execution-grounded functional feedback and vision-based aesthetic supervision. Extensive experiments show that WebGen-R1 transforms a 7B base model from generating nearly nonfunctional websites into producing deployable, aesthetically aligned multi-page websites. Notably, WebGen-R1 not only consistently outperforms much larger open-source models (up to 72B) but also rivals the state-of-the-art DeepSeek-R1 (671B) in functional success, while substantially exceeding it in valid rendering and aesthetic alignment. These results position WebGen-R1 as a viable path for scaling small open models from function-level code generation to project-level web application generation.
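The abstract does not spell out how the three reward signals (structure, function, aesthetics) are combined. One plausible reading of "cascaded" is a gated pipeline in which each cheaper check must pass before the next, more expensive one runs. The sketch below illustrates only that reading and is not WebGen-R1's actual reward: the project representation (a path-to-contents map), the scaffold file list, the weights, the penalty value, and the two scoring callables are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical representation of a generated project: file path -> file contents.
Project = Dict[str, str]


def matches_scaffold(project: Project, required_paths: Set[str]) -> bool:
    """Stage 1 (structural): every scaffold file exists and is non-empty."""
    return all(project.get(path, "").strip() for path in required_paths)


@dataclass
class CascadedReward:
    # Files the (hypothetical) scaffold requires the model to fill in.
    required_paths: Set[str]
    # Stage 2: execution-grounded functional score in [0, 1], e.g. the fraction
    # of pages that build and render without errors (placeholder evaluator).
    functional_score: Callable[[Project], float]
    # Stage 3: vision-based aesthetic score in [0, 1], e.g. a vision-language
    # model judging page screenshots (placeholder evaluator).
    aesthetic_score: Callable[[Project], float]
    w_func: float = 0.5
    w_aes: float = 0.5
    structure_penalty: float = -1.0

    def __call__(self, project: Project) -> float:
        # Gate 1: structurally invalid projects never reach the costly stages.
        if not matches_scaffold(project, self.required_paths):
            return self.structure_penalty
        func = self.functional_score(project)
        # Gate 2: a site that does not work at all earns no aesthetic credit.
        if func <= 0.0:
            return 0.0
        # Stage 3 runs only on structurally valid, at least partly functional sites.
        aes = self.aesthetic_score(project)
        return self.w_func * func + self.w_aes * aes


if __name__ == "__main__":
    reward = CascadedReward(
        required_paths={"index.html", "about.html"},
        functional_score=lambda p: 1.0,  # stand-in for a real build/render check
        aesthetic_score=lambda p: 0.5,   # stand-in for a real vision judge
    )
    print(reward({"index.html": "<h1>Home</h1>"}))  # -1.0: about.html missing
    print(reward({"index.html": "<h1>Home</h1>", "about.html": "<p>About</p>"}))  # 0.75
```

Ordering the checks this way keeps the expensive steps (building, rendering, querying a vision model) off outputs that cannot earn credit anyway, which is one way a reward could stay computationally feasible across many RL rollouts.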
