WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

Abstract

While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic multi-page websites remain highly challenging. Existing works are often limited to single-page static websites, while agentic frameworks typically rely on multi-turn execution with proprietary models, leading to substantial token costs, high latency, and brittle integration. Training a small LLM end-to-end with reinforcement learning (RL) is a promising alternative, yet it faces a critical bottleneck in designing reliable and computationally feasible rewards for website generation. Unlike single-file coding tasks that can be verified by unit tests, website generation requires evaluating inherently subjective aesthetics, cross-page interactions, and functional correctness. To this end, we propose WebGen-R1, an end-to-end RL framework tailored for project-level website generation. We first introduce a scaffold-driven structured generation paradigm that constrains the large open-ended action space and preserves architectural integrity. We then design a novel cascaded multimodal reward that seamlessly couples structural guarantees with execution-grounded functional feedback and vision-based aesthetic supervision. Extensive experiments demonstrate that our WebGen-R1 substantially transforms a 7B base model from generating nearly nonfunctional websites into producing deployable, aesthetically aligned multi-page websites. Remarkably, our WebGen-R1 not only consistently outperforms heavily scaled open-source models (up to 72B), but also rivals the state-of-the-art DeepSeek-R1 (671B) in functional success, while substantially exceeding it in valid rendering and aesthetic alignment. These results position WebGen-R1 as a viable path for scaling small open models from function-level code generation to project-level web application generation.
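The abstract does not say how the cascaded reward combines its stages. What follows is a minimal Python sketch built on one assumed reading of "cascaded": the stages are scored in order, and each later stage runs only if the previous one reaches its threshold. Under that reading, cheap structural checks would gate the costlier execution-based and vision-based checks. Every name here is a hypothetical placeholder, not WebGen-R1's actual reward. That includes `Site`, `Stage`, `cascaded_reward`, the required file paths, the thresholds and weights, and the three toy scorers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Site = Dict[str, str]  # hypothetical: maps file path -> file contents


@dataclass
class Stage:
    name: str
    score: Callable[[Site], float]  # expected to return a value in [0, 1]
    threshold: float                # minimum score required to run later stages
    weight: float                   # contribution of this stage to the total


def cascaded_reward(site: Site, stages: Sequence[Stage]) -> float:
    """Sum weighted stage scores, stopping after the first stage below its threshold."""
    total = 0.0
    for stage in stages:
        s = min(max(stage.score(site), 0.0), 1.0)  # clamp to [0, 1]
        total += stage.weight * s
        if s < stage.threshold:
            break  # skip later, more expensive checks (e.g. rendering, vision scoring)
    return total


# Toy placeholder scorers (hypothetical, for illustration only).
REQUIRED = ("package.json", "src/App.tsx")


def structure_score(site: Site) -> float:
    # Stand-in for a structural guarantee: all scaffold files must be present.
    return 1.0 if all(path in site for path in REQUIRED) else 0.0


def functional_score(site: Site) -> float:
    # Stand-in for execution-grounded checks: fraction of pages with non-empty content.
    pages = [p for p in site if p.startswith("src/pages/")]
    return sum(1 for p in pages if site[p].strip()) / len(pages) if pages else 0.0


def aesthetic_score(site: Site) -> float:
    return 0.75  # stand-in for a vision-model score of rendered screenshots


STAGES = [
    Stage("structure", structure_score, threshold=1.0, weight=1.0),
    Stage("functional", functional_score, threshold=0.5, weight=1.0),
    Stage("aesthetic", aesthetic_score, threshold=0.0, weight=1.0),
]

broken = {"src/App.tsx": "x"}  # missing package.json: stops at the structure stage
ok = {
    "package.json": "{}",
    "src/App.tsx": "x",
    "src/pages/Home.tsx": "x",
    "src/pages/About.tsx": "",
}
print(cascaded_reward(broken, STAGES))  # 0.0
print(cascaded_reward(ok, STAGES))      # 1.0 + 0.5 + 0.75 = 2.25
```

Ordering the stages from cheapest to most expensive is the point of the design. A site that fails the structural check never reaches rendering or vision scoring, so no compute is spent on it during training. The actual gating rule, score scales, and weights used by WebGen-R1 would need to come from the paper itself.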
