

BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba

August 5, 2024
Authors: Ling Yue, Sixue Xing, Yingzhou Lu, Tianfan Fu
cs.AI

Abstract

The advancement of natural language processing (NLP) in biology hinges on models' ability to interpret intricate biomedical literature. Traditional models often struggle with the complex, domain-specific language of this field. In this paper, we present BioMamba, a pre-trained model specifically designed for biomedical text mining. BioMamba builds upon the Mamba architecture and is pre-trained on an extensive corpus of biomedical literature. Our empirical studies demonstrate that BioMamba significantly outperforms models such as BioBERT and general-domain Mamba across various biomedical tasks. For instance, BioMamba achieves a 100-fold reduction in perplexity and a 4-fold reduction in cross-entropy loss on the BioASQ test set. We provide an overview of the model architecture, pre-training process, and fine-tuning techniques. Additionally, we release the code and trained model to facilitate further research.
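The abstract reports perplexity and cross-entropy loss on the BioASQ test set. As a minimal sketch of how such an evaluation could be run against the released checkpoint, the snippet below loads a causal language model through Hugging Face Transformers and derives perplexity from the model's cross-entropy loss. The model identifier is a placeholder assumption, not the verified name of the authors' release.

```python
# Hypothetical sketch: perplexity evaluation for a causal LM such as BioMamba.
# MODEL_ID is a placeholder; substitute the checkpoint the authors released.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/BioMamba"  # placeholder, not a verified Hugging Face ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

text = "The BRCA1 gene encodes a protein involved in DNA repair."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss
    # over the sequence; perplexity is its exponential.
    outputs = model(**inputs, labels=inputs["input_ids"])

cross_entropy = outputs.loss.item()
print(f"cross-entropy: {cross_entropy:.3f}")
print(f"perplexity:    {math.exp(cross_entropy):.2f}")
```

In practice, the paper's numbers would come from averaging this loss over the full BioASQ test split rather than a single sentence; the single-sentence version above only illustrates the mechanics.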

