MapAgent: An Industrial-Grade Agentic Framework for City-Scale Lane-Level Map Generation

Abstract

Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet building and maintaining standardized lane networks across hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized mapping methods can predict lane geometry and topology directly from sensor data, but they typically treat mapping specifications and traffic regulations as implicit, dataset-dependent supervision. Moreover, in complex scenes (e.g., worn or missing markings, occlusions), the correct lane configuration is often under-determined by visual evidence alone, which makes specification violations a major cause of human post-editing. We propose MapAgent, an industrial-grade agentic architecture that augments a vectorization backbone for specification-compliant lane-map production. Rather than merely adding an agent loop to map prediction, MapAgent couples backbone perception with explicit specification verification, constraint-aware reasoning, and deterministic map editing under a bounded, verification-driven Judge-Planner-Worker loop. A vision-language Judge diagnoses errors by jointly inspecting visual evidence and draft vectors, while a tool-calling Planner generates minimal corrective edits that are re-validated after they are applied. To remain scalable for city-scale production, MapAgent is triggered selectively, only on tiles with low backbone confidence, which adds modest overhead while preserving throughput. Experiments on real-world datasets show consistent gains over strong production baselines, especially in complex and long-tail scenarios. MapAgent has also been integrated into Baidu Maps, where it supports lane-level map generation for over 360 cities nationwide and raises the overall production automation rate to over 95%, demonstrating its practicality and effectiveness for large-scale lane-level map generation.
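
The control flow described in the abstract (selective triggering on low-confidence tiles, followed by a bounded diagnose-edit-revalidate loop) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Tile` structure, the `threshold` and `max_rounds` values, and the toy judge and planner are all hypothetical stand-ins. In particular, the toy judge checks only for dangling successor references, whereas the Judge described above is a vision-language model and the Planner a tool-calling model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Tile:
    """Hypothetical map tile: a backbone confidence plus draft lane vectors."""
    tile_id: str
    confidence: float                      # assumed scalar in [0, 1]
    lanes: List[dict] = field(default_factory=list)


Violation = Tuple[str, str]                # (lane id, dangling successor id)
Edit = Callable[[Tile], None]              # deterministic in-place edit


def refine_tile(tile, judge, planner, threshold=0.8, max_rounds=3):
    """Run a bounded Judge-Planner-Worker loop on one tile.

    Returns (rounds_used, remaining_violations). Tiles whose confidence
    is at or above the (assumed) threshold are skipped entirely.
    """
    if tile.confidence >= threshold:
        return 0, []
    violations = judge(tile)               # Judge: diagnose the draft
    rounds = 0
    while violations and rounds < max_rounds:
        for edit in planner(tile, violations):   # Planner: propose edits
            edit(tile)                     # Worker: apply each edit
        rounds += 1
        violations = judge(tile)           # re-validate after editing
    return rounds, violations


def toy_judge(tile: Tile) -> List[Violation]:
    """Toy rule: every successor must refer to a lane present in the tile."""
    ids = {lane["id"] for lane in tile.lanes}
    return [(lane["id"], succ)
            for lane in tile.lanes
            for succ in lane["successors"]
            if succ not in ids]


def toy_planner(tile: Tile, violations: List[Violation]) -> List[Edit]:
    """Toy repair: drop each dangling successor reference."""
    def make_edit(lane_id: str, succ: str) -> Edit:
        def edit(t: Tile) -> None:
            for lane in t.lanes:
                if lane["id"] == lane_id and succ in lane["successors"]:
                    lane["successors"].remove(succ)
        return edit
    return [make_edit(lane_id, succ) for lane_id, succ in violations]
```

Capping the loop at `max_rounds` and returning any remaining violations keeps the per-tile cost bounded and leaves unresolved tiles identifiable, for example so that they can be routed to human review. That routing is an assumption of this sketch, not a detail stated in the abstract.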