AlphaTransit: Learning to Design City-Scale Transit Routes

Abstract

Designing a transit network requires many sequential route extension decisions, but their quality is often visible only after the full network is assembled. This delayed-feedback challenge lies at the heart of the Transit Route Network Design Problem (TRNDP), where route interactions can be deceptive: an extension that appears useful locally can create transfer bottlenecks, produce redundant overlap, or reduce overall throughput. To guide route construction under delayed simulator feedback, we introduce AlphaTransit, a search-based planning framework for city-scale bus network design. AlphaTransit couples Monte Carlo Tree Search (MCTS) with a neural policy-value network: the policy proposes route extensions, the value estimates downstream design quality, and search uses these predictions to refine each decision. This provides decision-time lookahead during route construction without running simulator rollouts inside the search tree. We evaluate AlphaTransit on a new Bloomington TRNDP benchmark with realistic road topology and census-derived demand, under mixed and full transit demand settings. In the Bloomington network, AlphaTransit attains the highest service rate in both demand settings, reaching 54.6% and 82.1%, respectively. Relative to reinforcement learning without search, these correspond to 9.9% and 11.4% service rate gains; relative to MCTS without learned guidance, they correspond to 2.5% and 11.2% gains. These results suggest that coupling learned guidance with MCTS is more effective than using either approach alone for transit network design. Our code and data are publicly available at https://github.com/poudel-bibek/AlphaTransit.
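
To make the coupling of MCTS with a policy-value network concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes an AlphaZero-style PUCT selection rule, which the abstract does not specify. The toy road graph `GRAPH`, the per-node `DEMAND`, the route length limit `MAX_STOPS`, and the hand-written `policy_value` function (a stand-in for a trained network) are all hypothetical. The point it shows is that each leaf is scored by the value estimate rather than by a simulator rollout.

```python
import math

# Hypothetical toy road graph (node -> neighbours) and per-node demand.
GRAPH = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
DEMAND = {0: 1.0, 1: 2.0, 2: 5.0, 3: 1.0, 4: 4.0}
MAX_STOPS = 4  # assumption: a route is complete once it has this many stops


def legal_extensions(route):
    """Neighbours of the route's last stop that are not already on the route."""
    return [n for n in GRAPH[route[-1]] if n not in route]


def policy_value(route):
    """Stand-in for a trained policy-value network (assumption).

    Priors are uniform over legal extensions; the value is the fraction of
    total demand covered by the stops on the route.
    """
    actions = legal_extensions(route)
    priors = {a: 1.0 / len(actions) for a in actions}
    value = sum(DEMAND[n] for n in route) / sum(DEMAND.values())
    return priors, value


class Node:
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # action (next stop) -> Node

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """PUCT: mean value plus a prior-weighted exploration bonus."""
    def score(item):
        child = item[1]
        bonus = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.mean_value() + bonus
    return max(node.children.items(), key=score)


def search(route, num_simulations=50):
    """Return the extension with the most visits after a guided tree search."""
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, path, current = root, [root], list(route)
        # Selection: descend until reaching a node with no children.
        while node.children:
            action, node = select_child(node)
            current.append(action)
            path.append(node)
        # Expansion and evaluation: the value estimate replaces a rollout.
        priors, value = policy_value(current)
        if len(current) < MAX_STOPS:
            for action, prior in priors.items():
                node.children[action] = Node(prior)
        # Backup: propagate the value estimate along the visited path.
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


def build_route(start):
    """Extend a route one stop at a time, running a fresh search per decision."""
    route = [start]
    while len(route) < MAX_STOPS and legal_extensions(route):
        route.append(search(route))
    return route
```

Calling `build_route(0)` grows a single route from node 0, re-running the search at every extension decision. In a setting like the one the abstract describes, `policy_value` would be a learned network and the state would cover the whole partially built network rather than a single route.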