AlphaTransit: Learning to Design City-Scale Transit Routes

Abstract

Designing a transit network requires many sequential route extension decisions, but their quality is often visible only after the full network is assembled. This delayed-feedback challenge lies at the heart of the Transit Route Network Design Problem (TRNDP), where route interactions can be deceptive: an extension that appears useful locally can create transfer bottlenecks, produce redundant overlap, or reduce overall throughput. To guide route construction under delayed simulator feedback, we introduce AlphaTransit, a search-based planning framework for city-scale bus network design. AlphaTransit couples Monte Carlo Tree Search (MCTS) with a neural policy-value network: the policy proposes route extensions, the value estimates downstream design quality, and search uses these predictions to refine each decision. This provides decision-time lookahead during route construction without running simulator rollouts inside the search tree. We evaluate AlphaTransit on a new Bloomington TRNDP benchmark with realistic road topology and census-derived demand, under mixed and full transit demand settings. In the Bloomington network, AlphaTransit attains the highest service rate in both demand settings, reaching 54.6% and 82.1%, respectively. Relative to reinforcement learning without search, these correspond to 9.9% and 11.4% service rate gains; relative to MCTS without learned guidance, they correspond to 2.5% and 11.2% gains. These results suggest that coupling learned guidance with MCTS is more effective than using either approach alone for transit network design. Our code and data are publicly available at https://github.com/poudel-bibek/AlphaTransit.
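To make the search idea concrete, the following is a minimal sketch of policy-value guided MCTS for sequential route extension, where leaf states are scored by a value estimate instead of a simulator rollout. It is not the AlphaTransit implementation: the PUCT selection rule (in the style of AlphaZero), the toy graph `GRAPH`, the `DEMAND` table, the route length cap `MAX_LEN`, and the hand-written `policy_value` stand-in (uniform priors, covered-demand fraction as value) are all assumptions made for illustration. In the real system a trained network would presumably take the place of `policy_value`.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy instance: stops, allowed extensions, and per-stop demand.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
DEMAND = {"A": 1, "B": 1, "C": 5, "D": 1, "E": 5}
MAX_LEN = 3  # assumed cap on the number of stops in a route


@dataclass
class Node:
    prior: float                 # policy probability assigned to this extension
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def legal_actions(route):
    """Stops that can extend the route from its last stop."""
    if len(route) >= MAX_LEN:
        return []
    return [s for s in GRAPH[route[-1]] if s not in route]


def step(route, stop):
    return route + (stop,)


def policy_value(route, actions):
    """Stand-in for a learned network: uniform priors, covered-demand value."""
    priors = [1.0 / len(actions)] * len(actions) if actions else []
    value = sum(DEMAND[s] for s in route) / sum(DEMAND.values())
    return priors, value


def puct_select(node, c_puct):
    """Pick the child maximizing Q + U (PUCT rule, assumed here)."""
    sqrt_n = math.sqrt(max(1, node.visits))
    return max(
        node.children.items(),
        key=lambda kv: kv[1].q() + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].visits),
    )


def search(root_state, n_sims=200, c_puct=1.5):
    root = Node(prior=1.0)
    for _ in range(n_sims):
        node, state, path = root, root_state, [root]
        # Selection: descend until reaching a node with no expanded children.
        while node.children:
            action, node = puct_select(node, c_puct)
            state = step(state, action)
            path.append(node)
        # Expansion and evaluation: no rollout, only the value estimate.
        actions = legal_actions(state)
        priors, value = policy_value(state, actions)
        for a, p in zip(actions, priors):
            node.children[a] = Node(prior=p)
        # Backup: single-agent setting, so the value is added without sign flips.
        for n in path:
            n.visits += 1
            n.value_sum += value
    # Act on visit counts, the usual decision rule for this style of search.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


best = search(("A",))
print("Chosen first extension from A:", best)
```

In this toy instance the branch through the high-demand stops accumulates most of the visits, so the search commits to it even though both first extensions start with equal priors. That is the decision-time lookahead the abstract describes, in miniature.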