Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings
July 30, 2024
Authors: Gili Goldin, Shuly Wintner
cs.AI
Abstract
We present Knesset-DictaBERT, a large Hebrew language model fine-tuned on the
Knesset Corpus, which comprises Israeli parliamentary proceedings. The model is
based on the DictaBERT architecture and demonstrates significant improvements
in understanding parliamentary language, as measured on the masked language
modeling (MLM) task. We provide a
detailed evaluation of the model's performance, showing improvements in
perplexity and accuracy over the baseline DictaBERT model.
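
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: perplexity as the exponential of the mean negative log-likelihood of the true token at each masked position, and accuracy as the share of masked positions where the top-ranked prediction matches the true token. The function name `mlm_metrics`, its input format, and the toy probabilities are hypothetical and for illustration only; the abstract does not specify the authors' evaluation protocol, which may differ.

```python
import math


def mlm_metrics(predictions, gold_ids):
    """Compute (perplexity, top-1 accuracy) over masked positions.

    predictions: one dict per masked position, mapping token id -> probability
                 (hypothetical input format, not tied to any library).
    gold_ids:    the true token id for each masked position.
    """
    if not gold_ids or len(predictions) != len(gold_ids):
        raise ValueError("need one prediction per masked position")

    total_nll = 0.0
    correct = 0
    for probs, gold in zip(predictions, gold_ids):
        # Probability the model assigned to the true token; floor it to avoid log(0).
        p_gold = max(probs.get(gold, 0.0), 1e-12)
        total_nll += -math.log(p_gold)
        # Top-1 prediction is the token id with the highest probability.
        if max(probs, key=probs.get) == gold:
            correct += 1

    n = len(gold_ids)
    return math.exp(total_nll / n), correct / n


# Toy example with made-up probabilities for two masked positions.
toy_predictions = [{1: 0.5, 2: 0.3, 3: 0.2}, {1: 0.25, 2: 0.75}]
toy_gold = [1, 1]
perplexity, accuracy = mlm_metrics(toy_predictions, toy_gold)
print(f"perplexity={perplexity:.3f} accuracy={accuracy:.2f}")
```

Under these definitions, lower perplexity and higher accuracy both indicate that a model predicts masked tokens better, which is the sense in which a fine-tuned model would be compared against a baseline.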