AlphaTransit: Learning to Design City-Scale Transit Routes

Abstract

Designing a transit network requires many sequential route extension decisions, but their quality is often visible only after the full network is assembled. This delayed-feedback challenge lies at the heart of the Transit Route Network Design Problem (TRNDP), where route interactions can be deceptive: an extension that appears useful locally can create transfer bottlenecks, produce redundant overlap, or reduce overall throughput. To guide route construction under delayed simulator feedback, we introduce AlphaTransit, a search-based planning framework for city-scale bus network design. AlphaTransit couples Monte Carlo Tree Search (MCTS) with a neural policy-value network: the policy proposes route extensions, the value estimates downstream design quality, and search uses these predictions to refine each decision. This provides decision-time lookahead during route construction without running simulator rollouts inside the search tree. We evaluate AlphaTransit on a new Bloomington TRNDP benchmark with realistic road topology and census-derived demand, under mixed and full transit demand settings. On this benchmark, AlphaTransit attains the highest service rate in both demand settings, reaching 54.6% and 82.1%, respectively. Relative to reinforcement learning without search, these correspond to 9.9% and 11.4% service rate gains; relative to MCTS without learned guidance, they correspond to 2.5% and 11.2% gains. These results suggest that coupling learned guidance with MCTS is more effective than using either approach alone for transit network design. Our code and data are publicly available at https://github.com/poudel-bibek/AlphaTransit.
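To make the idea of policy-value-guided search concrete, the following is a minimal sketch of one way such a loop could look. It is not the AlphaTransit implementation. It assumes an AlphaZero-style PUCT selection rule, and every name in it (`Node`, `puct_select`, `search`, the toy `GRAPH` and `DEMAND`, and `stub_policy_value`, which stands in for the neural network) is hypothetical and chosen only for illustration. The point it shows is that leaves are scored by a value estimate rather than by a simulator rollout.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                      # policy probability of the action leading here
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        """Mean value of this node, 0.0 if never visited."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(node: Node, c_puct: float = 1.5):
    """Pick the child maximising Q + U (assumed AlphaZero-style PUCT rule)."""
    total = sum(child.visits for child in node.children.values())

    def score(item):
        _, child = item
        bonus = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + bonus

    return max(node.children.items(), key=score)


def search(root_state, legal_actions, step, policy_value, n_sims: int = 50):
    """Run n_sims simulations and return the most-visited root action."""
    root = Node(prior=1.0)
    for _ in range(n_sims):
        node, state, path = root, root_state, [root]
        # Selection: descend until reaching a node with no children.
        while node.children:
            action, node = puct_select(node)
            state = step(state, action)
            path.append(node)
        # Expansion and evaluation: the value estimate replaces a rollout.
        priors, value = policy_value(state, legal_actions(state))
        for action, prior in priors.items():
            node.children[action] = Node(prior=prior)
        # Backup along the visited path (single agent, so no sign flip).
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


# Hypothetical toy problem: extend one route over a four-stop graph.
GRAPH = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
DEMAND = {"A": 1, "B": 5, "C": 1, "D": 3}
MAX_LEN = 3


def legal_actions(route):
    """Unvisited neighbours of the route's last stop, or none at MAX_LEN."""
    if len(route) >= MAX_LEN:
        return []
    return [stop for stop in GRAPH[route[-1]] if stop not in route]


def step(route, stop):
    return route + (stop,)


def stub_policy_value(route, actions):
    """Stand-in for a trained network: uniform priors, demand-coverage value."""
    priors = {a: 1.0 / len(actions) for a in actions}
    value = sum(DEMAND[s] for s in route) / sum(DEMAND.values())
    return priors, value


best_extension = search(("A",), legal_actions, step, stub_policy_value)
print(best_extension)
```

In this sketch the final choice is the most-visited root action, which is a common convention in this family of methods and is assumed here rather than taken from the paper. Swapping `stub_policy_value` for a trained network would leave the search loop unchanged.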