SLIMER-IT: Zero-Shot NER on Italian Language

Abstract

Traditional approaches to Named Entity Recognition (NER) frame the task as a BIO sequence labeling problem. Although these systems often excel at the downstream task at hand, they require extensive annotated data and struggle to generalize to out-of-distribution input domains and unseen entity types. By contrast, Large Language Models (LLMs) have demonstrated strong zero-shot capabilities. While several works address Zero-Shot NER in English, little has been done in other languages. In this paper, we define an evaluation framework for Zero-Shot NER and apply it to Italian. Furthermore, we introduce SLIMER-IT, the Italian version of SLIMER, an instruction-tuning approach for zero-shot NER that leverages prompts enriched with definitions and guidelines. Comparisons with other state-of-the-art models demonstrate the superiority of SLIMER-IT on never-before-seen entity tags.
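To make the contrast above concrete, here is a minimal Python sketch; it is not the SLIMER-IT implementation. `spans_to_bio` shows the traditional framing, in which every token receives a B-, I-, or O tag. `build_prompt` is a hypothetical template illustrating the zero-shot alternative: one entity type per query, with a definition and annotation guidelines attached. The function names, the Italian example sentence, the tag names, and the prompt wording are all illustrative assumptions; the actual SLIMER-IT prompt is not reproduced here.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Encode entity spans as one BIO tag per token (traditional NER framing)."""
    tags = ["O"] * len(tokens)  # tokens outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens inside the entity
    return tags


def build_prompt(text: str, tag: str, definition: str, guidelines: str) -> str:
    """HYPOTHETICAL zero-shot prompt for a single entity type, enriched with a
    definition and guidelines. Illustrative template only, not SLIMER-IT's."""
    return (
        f"Text: {text}\n"
        f"Extract all entities of type '{tag}'.\n"
        f"Definition: {definition}\n"
        f"Guidelines: {guidelines}\n"
        "Answer with a JSON list of the extracted text spans."
    )


tokens = ["Mario", "Rossi", "vive", "a", "Roma", "."]
spans = [(0, 2, "PER"), (4, 5, "LOC")]
print(spans_to_bio(tokens, spans))
# ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O']

# Hypothetical definition and guideline text, written for illustration.
print(build_prompt(
    " ".join(tokens),
    "LOC",
    "A geographical place such as a city, region, or country.",
    "Do not tag adjectives derived from place names (e.g. 'romano').",
))
```

The BIO encoding fixes the label set at training time, so an unseen type has no tag to predict. A definition-and-guidelines prompt, by contrast, describes a new type in natural language at inference time, which is what makes zero-shot evaluation on never-before-seen tags possible.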
