GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface

Abstract

Information extraction (IE) is fundamental to numerous NLP applications, yet existing solutions often require specialized models for different tasks or rely on computationally expensive large language models. We present GLiNER2, a unified framework that enhances the original GLiNER architecture to support named entity recognition, text classification, and hierarchical structured data extraction within a single efficient model. Built on a pretrained transformer encoder architecture, GLiNER2 maintains CPU efficiency and compact size while introducing multi-task composition through an intuitive schema-based interface. Our experiments demonstrate competitive performance across extraction and classification tasks, with substantial improvements in deployment accessibility compared to LLM-based alternatives. We release GLiNER2 as an open-source, pip-installable library with pretrained models and documentation at https://github.com/fastino-ai/GLiNER2.
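To make the idea of multi-task composition through a schema more concrete, the following is a minimal sketch of how a single schema object might bundle entity types, classification labels, and structured fields into one request. It is not the GLiNER2 API: the `Schema` class, its methods, and the example labels are hypothetical names chosen for illustration only, and no model is loaded or run.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical container describing one multi-task extraction request."""

    entity_types: list[str] = field(default_factory=list)
    classifications: dict[str, list[str]] = field(default_factory=dict)
    structures: dict[str, list[str]] = field(default_factory=dict)

    def entities(self, *types: str) -> Schema:
        # Register entity types for named entity recognition.
        self.entity_types.extend(types)
        return self

    def classification(self, name: str, labels: list[str]) -> Schema:
        # Register a text classification task with its candidate labels.
        self.classifications[name] = list(labels)
        return self

    def structure(self, name: str, fields: list[str]) -> Schema:
        # Register a structured record and the fields to fill in.
        self.structures[name] = list(fields)
        return self

    def tasks(self) -> list[str]:
        # Report which task kinds this schema combines.
        kinds = []
        if self.entity_types:
            kinds.append("entities")
        if self.classifications:
            kinds.append("classification")
        if self.structures:
            kinds.append("structure")
        return kinds


# Compose three tasks into one schema (all labels are illustrative).
schema = (
    Schema()
    .entities("company", "product")
    .classification("sentiment", ["positive", "negative", "neutral"])
    .structure("product_info", ["name", "price"])
)
print(schema.tasks())
```

The chained builder style is one plausible design for such an interface; the library's actual calls and model names are documented in the repository linked above.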
