Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings

Abstract

We present Knesset-DictaBERT, a large Hebrew language model fine-tuned on the Knesset Corpus, which comprises Israeli parliamentary proceedings. The model is based on the DictaBERT architecture and shows significant improvements in modeling parliamentary language, as measured on the masked language modeling (MLM) task. We evaluate the model in detail and report improvements in both perplexity and accuracy over the baseline DictaBERT model.
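
The abstract does not say how perplexity and accuracy are computed. A common convention for MLM evaluation, assumed here rather than taken from the paper, is to score only the masked positions: perplexity is the exponential of the mean negative log-likelihood of the original tokens, and accuracy is the fraction of masked positions where the top-scoring token matches the original. The sketch below illustrates that convention on toy data; the function name `mlm_metrics` and the input layout are hypothetical.

```python
import math


def mlm_metrics(masked_log_probs, gold_ids):
    """Return (perplexity, top-1 accuracy) over masked positions.

    masked_log_probs: one list of log-probabilities over the vocabulary
        per masked position (assumed layout, for illustration only).
    gold_ids: the original token id at each masked position.
    """
    if not gold_ids or len(masked_log_probs) != len(gold_ids):
        raise ValueError("need one log-probability vector per gold token")

    total_nll = 0.0
    correct = 0
    for log_probs, gold in zip(masked_log_probs, gold_ids):
        # Negative log-likelihood of the true token at this position.
        total_nll -= log_probs[gold]
        # Top-1 prediction is the index with the highest log-probability.
        predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
        correct += int(predicted == gold)

    n = len(gold_ids)
    return math.exp(total_nll / n), correct / n


# Toy example: two masked positions over a three-token vocabulary.
toy_probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
toy_log_probs = [[math.log(p) for p in row] for row in toy_probs]
perplexity, accuracy = mlm_metrics(toy_log_probs, [0, 2])
```

Under this convention, a lower perplexity and a higher accuracy for the fine-tuned model than for the baseline on the same masked positions would correspond to the improvements the abstract reports.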
