AstroLLaMA: Towards Specialized Foundation Models in Astronomy
September 12, 2023
Authors: Tuan Dung Nguyen, Yuan-Sen Ting, Ioana Ciucă, Charlie O'Neill, Ze-Chang Sun, Maja Jabłońska, Sandor Kruk, Ernest Perkowski, Jack Miller, Jason Li, Josh Peek, Kartheik Iyer, Tomasz Różański, Pranav Khetarpal, Sharaf Zaman, David Brodrick, Sergio J. Rodríguez Méndez, Thang Bui, Alyssa Goodman, Alberto Accomazzi, Jill Naiman, Jesse Cranney, Kevin Schawinski, UniverseTBD
cs.AI
Abstract
Large language models excel in many human-language tasks but often falter in
highly specialized domains like scholarly astronomy. To bridge this gap, we
introduce AstroLLaMA, a 7-billion-parameter model fine-tuned from LLaMA-2 using
over 300,000 astronomy abstracts from arXiv. Optimized for traditional causal
language modeling, AstroLLaMA achieves a 30% lower perplexity than Llama-2,
showing marked domain adaptation. Our model generates more insightful and
scientifically relevant text completions and embedding extractions than
state-of-the-art foundation models despite having significantly fewer
parameters. AstroLLaMA serves as a robust, domain-specific model with broad
fine-tuning potential. Its public release aims to spur astronomy-focused
research, including automatic paper summarization and conversational agent
development.
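The abstract's headline comparison is a 30% lower perplexity under causal language modeling. As background, here is a minimal sketch of how perplexity is computed from per-token log-probabilities; the numeric values below are illustrative only, not figures from the paper:

```python
import math

def perplexity(token_logprobs):
    # Perplexity is the exponential of the average negative
    # log-likelihood the model assigns to each token.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Illustrative per-token log-probs for the same text under two models
# (hypothetical numbers, not measurements from AstroLLaMA):
base_ppl = perplexity([-2.2, -1.8, -2.5, -1.9])   # base model
tuned_ppl = perplexity([-1.6, -1.3, -1.9, -1.4])  # fine-tuned model

# Relative reduction, the quantity behind a "30% lower perplexity" claim.
reduction = 1 - tuned_ppl / base_ppl
```

A lower perplexity means the model assigns higher probability to held-out domain text, which is why it serves here as the measure of domain adaptation.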