AlphaTransit: Learning to Design City-Scale Transit Routes

Abstract

Designing a transit network requires many sequential route extension decisions, but their quality is often visible only after the full network is assembled. This delayed-feedback challenge lies at the heart of the Transit Route Network Design Problem (TRNDP), where route interactions can be deceptive: an extension that appears useful locally can create transfer bottlenecks, produce redundant overlap, or reduce overall throughput. To guide route construction under delayed simulator feedback, we introduce AlphaTransit, a search-based planning framework for city-scale bus network design. AlphaTransit couples Monte Carlo Tree Search (MCTS) with a neural policy-value network: the policy proposes route extensions, the value estimates downstream design quality, and search uses these predictions to refine each decision. This provides decision-time lookahead during route construction without running simulator rollouts inside the search tree. We evaluate AlphaTransit on a new Bloomington TRNDP benchmark with realistic road topology and census-derived demand, under mixed and full transit demand settings. In the Bloomington network, AlphaTransit attains the highest service rate in both demand settings, reaching 54.6% and 82.1%, respectively. Relative to reinforcement learning without search, these correspond to 9.9% and 11.4% service rate gains; relative to MCTS without learned guidance, they correspond to 2.5% and 11.2% gains. These results suggest that coupling learned guidance with MCTS is more effective than using either approach alone for transit network design. Our code and data are publicly available at https://github.com/poudel-bibek/AlphaTransit.
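To make the search loop described above concrete, the following is a minimal sketch of MCTS guided by a policy-value function, in which leaf nodes are scored by the value estimate instead of by simulator rollouts. It is not the authors' implementation. The toy road graph `ROADS`, the cap `MAX_STOPS`, the stand-in `policy_value` function, and the AlphaZero-style PUCT selection rule with constant `c_puct` are all assumptions made for illustration; the abstract does not specify the selection rule or the network.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy road graph: node -> neighbours a route may extend to.
ROADS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
MAX_STOPS = 4  # hypothetical cap on route length


def legal_extensions(route):
    """Neighbours of the last stop that are not already on the route."""
    if len(route) >= MAX_STOPS:
        return []
    return [n for n in ROADS[route[-1]] if n not in route]


def policy_value(route):
    """Stand-in for a neural policy-value network (assumption, not the paper's model).

    Returns uniform priors over legal extensions and a value in [0, 1]
    that simply rewards longer routes.
    """
    moves = legal_extensions(route)
    priors = {m: 1.0 / len(moves) for m in moves}
    value = len(route) / MAX_STOPS
    return priors, value


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """PUCT rule (assumed): favour high mean value, high prior, low visit count."""
    def score(child):
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))


def search(route, simulations=200):
    """Pick the next extension for `route`; leaves use the value estimate, not rollouts."""
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, path, current = root, [root], list(route)
        # Selection: descend through already-expanded nodes.
        while node.children:
            move, node = select_child(node)
            current.append(move)
            path.append(node)
        # Expansion and evaluation using the network stand-in.
        priors, value = policy_value(current)
        for move, p in priors.items():
            node.children[move] = Node(prior=p)
        # Backup: single-agent setting, so the value is added without a sign flip.
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    if not root.children:
        return None  # no legal extension from this route
    # Return the most-visited extension.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


if __name__ == "__main__":
    print(search([0]))
```

In this sketch, `search([0, 1])` can only return stop 3, since it is the single legal extension on the toy graph. In a real system, the stand-in would be replaced by a trained network, and the completed network would be scored by the transit simulator after construction.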