Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States
June 17, 2026
Authors: Denis Peskoff, Joe Barrow, Christopher Vu, Diag Davenport
cs.AI
Abstract
Progress in legal AI increasingly depends on access to authoritative legal text at scale. Yet one of the most consequential layers of American law, local ordinances, remains largely absent from existing machine-readable corpora. Local codes govern zoning, housing, business licensing, public health, noise, animal control, and many other domains of everyday regulation, but they are fragmented across vendor platforms designed for human browsing rather than bulk research access. We introduce the Local Ordinance Corpus for the United States (LOCUS), a comprehensive corpus and county-harmonized access layer for U.S. municipal and county ordinance codes. The raw corpus, which can be released to researchers, covers nearly all publicly available municipal and county ordinance codes, spanning 9,239 cities and counties. A smaller county-harmonized access layer covers the 2,309 most populous of the 3,144 U.S. counties, which together account for a majority of the U.S. population. We use OCR to handle the myriad document formats that have kept the law from being a public resource. We release the corpus with coverage metadata to support reproducibility, downstream legal AI research, and the incremental expansion of machine-readable access to local law. We also train a collection of ModernBERT-based classifiers and scorers that support analysis of U.S. local law along several dimensions, such as opacity and paternalism, that have not previously been studied at this scale. LOCUS-v1 and its derivative models are available at https://huggingface.co/datasets/LocalLaws/LOCUS-v1
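One simple way to picture a county-harmonized access layer is to group each city or county ordinance code under the county that contains it, then keep the most populous counties and report what share of the population they cover. The following is a minimal sketch under stated assumptions, not the LOCUS pipeline: the `Jurisdiction` record, the use of 5-digit county FIPS codes as the join key, and the helper names `harmonize_by_county` and `top_k_population_share` are hypothetical, and the jurisdictions and populations in the usage example are made up.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Jurisdiction:
    # Hypothetical record schema for illustration; not LOCUS-v1's actual fields.
    name: str
    county_fips: str  # assumed join key: 5-digit state+county FIPS code
    ordinance_text: str


def harmonize_by_county(jurisdictions: list[Jurisdiction]) -> dict[str, list[Jurisdiction]]:
    """Group city and county ordinance codes under the county key they carry."""
    by_county: dict[str, list[Jurisdiction]] = defaultdict(list)
    for j in jurisdictions:
        by_county[j.county_fips].append(j)
    return dict(by_county)


def top_k_population_share(county_population: dict[str, int], k: int) -> tuple[list[str], float]:
    """Keep the k most populous counties and return them with the population share they cover."""
    if k < 0:
        raise ValueError("k must be non-negative")
    total = sum(county_population.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    # Rank counties by population, largest first; sorted() stays stable with reverse=True.
    ranked = sorted(county_population, key=county_population.__getitem__, reverse=True)
    selected = ranked[:k]
    covered = sum(county_population[fips] for fips in selected)
    return selected, covered / total


if __name__ == "__main__":
    # Made-up jurisdictions and populations, purely for illustration.
    codes = [
        Jurisdiction("City A", "00001", "placeholder text"),
        Jurisdiction("City B", "00001", "placeholder text"),
        Jurisdiction("County C", "00002", "placeholder text"),
    ]
    groups = harmonize_by_county(codes)
    print({fips: [j.name for j in js] for fips, js in groups.items()})
    # {'00001': ['City A', 'City B'], '00002': ['County C']}
    selected, share = top_k_population_share({"00001": 600, "00002": 300, "00003": 100}, k=2)
    print(selected, share)
    # ['00001', '00002'] 0.9
```

Ranking by population and reporting the covered share mirrors how the abstract describes the access layer's coverage (the largest 2,309 of 3,144 counties, covering a majority of the population). How LOCUS actually assigns cities that straddle county lines is not specified here; this sketch simply trusts a single county key per jurisdiction.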