GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface

Abstract

Information extraction (IE) is fundamental to numerous NLP applications, yet existing solutions often require specialized models for different tasks or rely on computationally expensive large language models. We present GLiNER2, a unified framework that enhances the original GLiNER architecture to support named entity recognition, text classification, and hierarchical structured data extraction within a single efficient model. Built on a pretrained transformer encoder architecture, GLiNER2 maintains CPU efficiency and compact size while introducing multi-task composition through an intuitive schema-based interface. Our experiments demonstrate competitive performance across extraction and classification tasks with substantial improvements in deployment accessibility compared to LLM-based alternatives. We release GLiNER2 as an open-source pip-installable library with pretrained models and documentation at https://github.com/fastino-ai/GLiNER2.
