
PokeRL: Reinforcement Learning for Pokemon Red

April 12, 2026
Authors: Dheeraj Mudireddy, Sai Patibandla
cs.AI

Abstract

Pokemon Red is a long-horizon JRPG with sparse rewards, partial observability, and quirky control mechanics that make it a challenging benchmark for reinforcement learning. While recent work has shown that PPO agents can clear the first two gyms using heavy reward shaping and engineered observations, training remains brittle in practice, with agents often degenerating into action loops, menu spam, or unproductive wandering. In this paper, we present PokeRL, a modular system that trains deep reinforcement learning agents to complete early game tasks in Pokemon Red, including exiting the player's house, exploring Pallet Town to reach tall grass, and winning the first rival battle. Our main contributions are a loop-aware environment wrapper around the PyBoy emulator with map masking, a multi-layer anti-loop and anti-spam mechanism, and a dense hierarchical reward design. We argue that practical systems like PokeRL, which explicitly model failure modes such as loops and spam, are a necessary intermediate step between toy benchmarks and full Pokemon League champion agents. Code is available at https://github.com/reddheeraj/PokemonRL
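The abstract's anti-loop mechanism can be illustrated with a minimal, stdlib-only sketch. This is a hypothetical reconstruction, not the authors' implementation: it assumes the wrapper can hash the current game state (e.g., player map ID and coordinates read from emulator memory), keeps a sliding window of recent state hashes, and docks the shaped reward whenever the agent revisits a recently seen state, the signature of action loops and menu spam.

```python
from collections import deque


class LoopPenalty:
    """Sliding-window repeat-state penalty (illustrative sketch only).

    Assumes the caller supplies a hashable summary of the emulator state,
    e.g. (map_id, x, y, menu_open). Window size and penalty magnitude are
    hypothetical parameters, not values from the paper.
    """

    def __init__(self, window: int = 32, penalty: float = 0.1):
        self.recent = deque(maxlen=window)  # last `window` state hashes
        self.penalty = penalty

    def shape(self, state_hash, reward: float) -> float:
        # A state seen again within the window suggests the agent is
        # looping or spamming menus, so subtract a small penalty.
        shaped = reward - self.penalty if state_hash in self.recent else reward
        self.recent.append(state_hash)
        return shaped
```

In a full system this would be one layer of the multi-layer defense the abstract describes, applied inside the environment wrapper's `step()` alongside the dense hierarchical rewards; a deeper layer might also truncate episodes after too many consecutive repeats.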