GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface
July 24, 2025
Authors: Urchade Zaratiana, Gil Pasternak, Oliver Boyd, George Hurn-Maloney, Ash Lewis
cs.AI
Abstract
Information extraction (IE) is fundamental to numerous NLP applications, yet
existing solutions often require specialized models for different tasks or rely
on computationally expensive large language models. We present GLiNER2, a
unified framework that enhances the original GLiNER architecture to support
named entity recognition, text classification, and hierarchical structured data
extraction within a single efficient model. Built on a pretrained transformer
encoder architecture, GLiNER2 maintains CPU efficiency and compact size while
introducing multi-task composition through an intuitive schema-based interface.
Our experiments demonstrate competitive performance across extraction and
classification tasks with substantial improvements in deployment accessibility
compared to LLM-based alternatives. We release GLiNER2 as an open-source
pip-installable library with pre-trained models and documentation at
https://github.com/fastino-ai/GLiNER2.
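
To make the idea of multi-task composition through a schema more concrete, the following is a minimal sketch of what a declarative schema object could look like. It is not the GLiNER2 API: the `Schema` class and its `with_entities`, `with_classification`, `with_structure`, and `task_names` methods are hypothetical names invented for illustration, and no model is loaded or run. The sketch only shows how entity labels, a classification task, and a structured record could be declared together in a single request.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical container composing several IE tasks into one request.

    Illustration only; this is NOT the GLiNER2 library interface.
    """

    entities: list[str] = field(default_factory=list)
    classifications: dict[str, list[str]] = field(default_factory=dict)
    structures: dict[str, list[str]] = field(default_factory=dict)

    def with_entities(self, *labels: str) -> Schema:
        # Register entity types to extract as spans.
        self.entities.extend(labels)
        return self

    def with_classification(self, name: str, labels: list[str]) -> Schema:
        # Register a text classification task with its candidate labels.
        self.classifications[name] = list(labels)
        return self

    def with_structure(self, name: str, fields: list[str]) -> Schema:
        # Register a structured record type with its named fields.
        self.structures[name] = list(fields)
        return self

    def task_names(self) -> list[str]:
        # List every task the schema declares, in declaration-group order.
        names: list[str] = []
        if self.entities:
            names.append("entities")
        names.extend(f"classification:{n}" for n in self.classifications)
        names.extend(f"structure:{n}" for n in self.structures)
        return names


# Compose three task types into one schema via method chaining.
schema = (
    Schema()
    .with_entities("person", "company")
    .with_classification("sentiment", ["positive", "negative", "neutral"])
    .with_structure("product", ["name", "price"])
)
print(schema.task_names())
```

In a design like this, one schema describes all tasks up front, so a single model pass could in principle serve all of them. For the actual interface, consult the documentation in the repository linked above.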