Broadening Access to Transportation Safety Data with Generative AI: A Schema-Driven Framework for Spatial Natural Language Queries

Abstract

Transportation safety analysis requires integrating crash records, roadway attributes, and geospatial data through GIS-based workflows, but access to these workflows remains uneven across agencies and community stakeholders. Technical prerequisites separate the analytical tools central to safety planning from many of the practitioners who need them. Local agencies, school committees, and residents may have safety concerns but limited capacity to retrieve, filter, map, and analyze the relevant data. Generative AI offers a way to narrow this divide, but its use in the public sector raises questions about reliability, reproducibility, and governance. This paper presents a schema-grounded natural language interface for transportation safety analysis. The interface uses a large language model (LLM) to interpret user intent while preserving deterministic, reviewable execution against an authoritative database. User queries are translated into structured semantic frames, validated by a rule-based layer, compiled into a typed directed acyclic graph (DAG) of spatial operations, and executed against a PostGIS database. This bounded design separates language interpretation from deterministic execution, which keeps results reproducible and schema-grounded while lowering barriers to access. The framework is evaluated on a statewide Massachusetts transportation safety database. This database integrates crash records, roadway attributes, and geospatial layers including schools, bus stops, crosswalks, and municipal boundaries. All evaluation queries executed successfully, and the validation layer corrected errors in 29% of them. This rate reflects the gap between flexible natural language and the strict requirements of a schema-grounded system. The results suggest that combining natural language accessibility with deterministic execution is a practical way to broaden access to transportation safety data, with implications for trustworthy AI in public-sector planning.
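The pipeline described above (semantic frame, rule-based validation, typed DAG, PostGIS execution) can be illustrated with a minimal sketch. The abstract does not specify the frame fields, validation rules, or operator set, so everything below is hypothetical:

- the layer names, alias table, and filterable columns;
- the `geom` column;
- the `SemanticFrame`, `validate`, `compile_dag`, and `to_sql` names;
- the restriction to a single "target features within a distance of anchor features" query.

The sketch assumes geometries are stored in a metric projected CRS, so the `ST_DWithin` distance is in metres. It only builds parameterised SQL and does not connect to a database.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Hypothetical schema: layer -> filterable columns. Every table is assumed to
# have a "geom" column in a metric projected CRS.
SCHEMA = {
    "crashes": {"severity", "year"},
    "schools": set(),
    "bus_stops": set(),
    "crosswalks": set(),
}
# Hypothetical aliases the rule-based validator uses to repair LLM output.
ALIASES = {"school": "schools", "bus stop": "bus_stops", "crash": "crashes"}


@dataclass
class SemanticFrame:
    target: str                # layer whose features are returned
    anchor: str                # layer the proximity test is measured against
    distance_m: float          # search radius in metres
    filters: dict = field(default_factory=dict)      # equality filters on target
    corrections: list = field(default_factory=list)  # repairs made by validate()


def validate(frame: SemanticFrame) -> SemanticFrame:
    """Snap layer names onto the schema; reject anything that still does not fit."""
    for attr in ("target", "anchor"):
        raw = getattr(frame, attr)
        key = raw.strip().lower()
        fixed = ALIASES.get(key, key)
        if fixed not in SCHEMA:
            raise ValueError(f"unknown layer {raw!r}")
        if fixed != raw:
            frame.corrections.append(f"{attr}: {raw!r} -> {fixed!r}")
        setattr(frame, attr, fixed)
    unknown = set(frame.filters) - SCHEMA[frame.target]
    if unknown:
        raise ValueError(f"unknown columns {sorted(unknown)} on {frame.target}")
    if frame.distance_m <= 0:
        raise ValueError("distance must be positive")
    return frame


def compile_dag(frame: SemanticFrame) -> dict:
    """Return node -> ((operation, argument), dependencies) for a proximity query."""
    return {
        "load_target": (("scan", frame.target), ()),
        "load_anchor": (("scan", frame.anchor), ()),
        "filter": (("where", frame.filters), ("load_target",)),
        "near": (("dwithin", frame.distance_m), ("filter", "load_anchor")),
    }


def to_sql(frame: SemanticFrame, dag: dict):
    """Walk the DAG in dependency order and emit one parameterised PostGIS query."""
    graph = {node: deps for node, (_, deps) in dag.items()}
    order = list(TopologicalSorter(graph).static_order())
    where, params = [], {}
    for node in order:
        (op, arg), _ = dag[node]
        if op == "where":
            for col, val in arg.items():
                # Column names are safe to interpolate: validate() checked them.
                where.append(f"t.{col} = %({col})s")
                params[col] = val
        elif op == "dwithin":
            where.append(
                f"EXISTS (SELECT 1 FROM {frame.anchor} AS a "
                f"WHERE ST_DWithin(t.geom, a.geom, %(dist)s))"
            )
            params["dist"] = arg
    # Table names are interpolated only after validate() matched them to SCHEMA.
    sql = f"SELECT t.* FROM {frame.target} AS t"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return order, sql, params
```

**Example.** Consider a frame parsed as `SemanticFrame("Crash", "school", 150, {"severity": "Fatal"})`, where the filter column and value are hypothetical.

- **Validation:** the frame is snapped to the `crashes` and `schools` layers, and two corrections are recorded.
- **Output:** it yields a query of the form `SELECT t.* FROM crashes AS t WHERE t.severity = %(severity)s AND EXISTS (...)`.
- **Parameters:** values are passed separately as `%(name)s` placeholders, the style psycopg accepts.

**Design choice.** The LLM's output is confined to the frame. The SQL is therefore produced by deterministic code that a reviewer can read.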