Broadening Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries

Abstract

Transportation safety analysis requires integrating crash records, roadway attributes, and geospatial data through GIS-based workflows, but access to these workflows remains uneven across agencies and community stakeholders. Technical prerequisites create a gap between the analytical tools central to safety planning and the practitioners able to use them. Local agencies, school committees, and residents may have safety concerns but limited capacity to retrieve, filter, map, and analyze the relevant data. Generative AI offers a way to narrow this divide, but its use in the public sector raises questions about reliability, reproducibility, and governance. This paper presents a schema-grounded natural language interface for transportation safety analysis that uses a large language model (LLM) to interpret user intent while preserving deterministic, reviewable execution against an authoritative database. User queries are translated into structured semantic frames, validated by a rule-based layer, compiled into a typed directed acyclic graph of spatial operations, and executed against a PostGIS database. This bounded design separates language interpretation from deterministic execution, keeping results reproducible and schema-grounded while removing access barriers. The framework is evaluated on a statewide Massachusetts transportation safety database that integrates crash records, roadway attributes, and geospatial layers including schools, bus stops, crosswalks, and municipal boundaries. All evaluation queries executed successfully, and the validation layer corrected errors in 29% of them, reflecting the gap between flexible natural language and strict schema requirements. The results suggest that combining natural language accessibility with deterministic execution is a practical direction for broadening access to transportation safety data, with implications for trustworthy AI in public-sector planning.
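The frame, validation, and compilation stages described above can be illustrated with a minimal sketch. This is not the paper's implementation: the schema, the alias table, the `Frame` and `Op` types, and the `validate` and `compile_frame` functions are all hypothetical names chosen for illustration, and execution against PostGIS is omitted.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter

# Hypothetical schema (layer name -> geometry type); not the paper's actual schema.
SCHEMA = {"crashes": "point", "schools": "point", "bus_stops": "point",
          "municipalities": "polygon"}
# Hypothetical synonyms that the rule-based layer maps onto schema names.
ALIASES = {"accidents": "crashes", "school": "schools", "bus stops": "bus_stops"}


@dataclass(frozen=True)
class Frame:
    """Semantic frame an LLM might emit for 'crashes within 250 m of schools'."""
    target: str        # layer whose features are selected
    near: str          # reference layer
    distance_m: float  # search radius in meters


@dataclass(frozen=True)
class Op:
    """One typed node of the operation graph."""
    kind: str           # "load", "buffer", or "intersects"
    out_type: str       # geometry type this node produces
    inputs: tuple = ()  # names of upstream nodes


def validate(frame):
    """Normalize layer names to the schema; return the fixed frame and a correction log."""
    notes, fixed = [], {}
    for slot in ("target", "near"):
        raw = getattr(frame, slot)
        name = raw.strip().lower()
        name = ALIASES.get(name, name)
        if name not in SCHEMA:
            raise ValueError(f"unknown layer for {slot}: {raw!r}")
        if name != raw:
            notes.append(f"{slot}: {raw!r} -> {name!r}")
        fixed[slot] = name
    if frame.distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return Frame(fixed["target"], fixed["near"], frame.distance_m), notes


def compile_frame(frame):
    """Compile a validated frame into a small DAG plus a valid execution order."""
    ops = {
        "load_target": Op("load", SCHEMA[frame.target]),
        "load_near": Op("load", SCHEMA[frame.near]),
        "buffer_near": Op("buffer", "polygon", ("load_near",)),
        "select": Op("intersects", SCHEMA[frame.target], ("load_target", "buffer_near")),
    }
    # Type check: the second input of every intersects node must be a polygon.
    for name, op in ops.items():
        if op.kind == "intersects" and ops[op.inputs[1]].out_type != "polygon":
            raise TypeError(f"{name}: second input must be a polygon")
    # TopologicalSorter expects a mapping of node -> iterable of predecessors.
    order = list(TopologicalSorter({n: op.inputs for n, op in ops.items()}).static_order())
    return ops, order


# The LLM output uses loose names; the validator corrects them before compilation.
frame, notes = validate(Frame("Accidents", "school", 250))
ops, order = compile_frame(frame)
print(notes)
print(order)
```

In this sketch the language model's only job would be to produce the `Frame`. Everything after that point is ordinary code, so the same frame always yields the same graph, and the correction log records what the validation layer changed.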