Expanding Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries

Abstract

Transportation safety analysis requires integrating crash records, roadway attributes, and geospatial data through GIS-based workflows, but access remains uneven across agencies and community stakeholders. Technical prerequisites create a gap between the analytical tools central to safety planning and the practitioners able to use them. Local agencies, school committees, and residents may have safety concerns but limited capacity to retrieve, filter, map, and analyze relevant data. Generative AI offers a way to narrow this divide, but its public-sector use raises questions about reliability, reproducibility, and governance. This paper presents a schema-grounded natural language interface for transportation safety analysis that uses a large language model (LLM) to interpret user intent while preserving deterministic, reviewable execution against an authoritative database. User queries are translated into structured semantic frames, validated by a rule-based layer, compiled into a typed directed acyclic graph of spatial operations, and executed against a PostGIS database. This bounded design separates language interpretation from deterministic execution, keeping results reproducible and schema-grounded while removing access barriers. The framework is evaluated on a statewide Massachusetts transportation safety database that integrates crash records, roadway attributes, and geospatial layers including schools, bus stops, crosswalks, and municipal boundaries. All queries executed successfully, and the validation layer corrected errors in 29% of evaluation queries, reflecting the gap between flexible natural language and strict schema-grounded requirements. The results suggest that combining natural language accessibility with deterministic execution is a practical direction for broadening access to transportation safety data, with implications for trustworthy AI in public-sector planning.
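
To make the frame, validate, compile, execute pipeline concrete, the following is a minimal Python sketch under stated assumptions, not the paper's implementation. The schema, the alias table, the `SemanticFrame` fields, the operation names, and the `geom` column are all hypothetical. The LLM step is replaced by a hand-written frame, and no database connection is opened: the last step only renders a parameterized query using the PostGIS function `ST_DWithin`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical schema and alias table; the real database will differ.
SCHEMA = {"crashes": {"severity", "year", "geom"}, "schools": {"name", "geom"}}
ALIASES = {"accidents": "crashes", "crash": "crashes", "school": "schools"}


@dataclass
class SemanticFrame:
    """Structured query, standing in for what the LLM would emit."""
    target: str          # layer whose rows are returned
    near: str            # reference layer for the proximity test
    distance_m: float    # search radius in meters
    filters: dict = field(default_factory=dict)


def validate(frame):
    """Rule-based layer: canonicalize layer names, reject anything off-schema."""
    corrections, layers = [], {}
    for role in ("target", "near"):
        raw = getattr(frame, role)
        name = ALIASES.get(raw, raw)
        if name not in SCHEMA:
            raise ValueError(f"unknown layer: {raw!r}")
        if name != raw:
            corrections.append(f"{role}: {raw} -> {name}")
        layers[role] = name
    unknown = set(frame.filters) - SCHEMA[layers["target"]]
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if frame.distance_m <= 0:
        raise ValueError("distance_m must be positive")
    fixed = SemanticFrame(layers["target"], layers["near"],
                          frame.distance_m, dict(frame.filters))
    return fixed, corrections


@dataclass(frozen=True)
class Op:
    kind: str            # "load", "filter", or "within_distance"
    out_type: str        # declared output type of the node
    inputs: tuple = ()   # names of upstream nodes


def compile_plan(frame):
    """Build a small typed DAG and return it with a valid execution order."""
    ops = {"load_target": Op("load", "FeatureSet"),
           "load_near": Op("load", "FeatureSet")}
    source = "load_target"
    if frame.filters:
        ops["filter_target"] = Op("filter", "FeatureSet", (source,))
        source = "filter_target"
    ops["proximity"] = Op("within_distance", "FeatureSet", (source, "load_near"))
    # Type check: every operation here consumes FeatureSet inputs.
    for name, op in ops.items():
        for dep in op.inputs:
            if ops[dep].out_type != "FeatureSet":
                raise TypeError(f"{name} cannot consume {ops[dep].out_type}")
    graph = {name: set(op.inputs) for name, op in ops.items()}
    return ops, list(TopologicalSorter(graph).static_order())


def to_sql(frame):
    """Render a parameterized query for a validated frame.
    Identifiers are interpolated only after validate() has checked them
    against SCHEMA; values are always passed as parameters."""
    cols = sorted(frame.filters)
    sql = (f"SELECT t.* FROM {frame.target} t WHERE EXISTS ("
           f"SELECT 1 FROM {frame.near} n "
           "WHERE ST_DWithin(t.geom::geography, n.geom::geography, %s))")
    sql += "".join(f" AND t.{c} = %s" for c in cols)
    return sql, [frame.distance_m] + [frame.filters[c] for c in cols]
```

A frame such as `SemanticFrame("accidents", "school", 250, {"severity": "fatal"})` would pass through `validate`, which rewrites both layer names to their schema forms and reports the two corrections, then through `compile_plan` and `to_sql`. Because nothing after the frame depends on the language model, the same validated frame always yields the same query text and parameters, which is the reproducibility property the abstract describes.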