Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model

February 19, 2025
Authors: Dongki Kim, Wonbin Lee, Sung Ju Hwang
cs.AI

Abstract

Understanding molecules is key to understanding organisms and driving advances in drug discovery, requiring interdisciplinary knowledge across chemistry and biology. Although large molecular language models have achieved notable success in interpreting molecular structures, their instruction datasets are limited to the specific knowledge from task-oriented datasets and do not fully cover the fundamental characteristics of molecules, hindering their abilities as general-purpose molecular assistants. To address this issue, we propose Mol-LLaMA, a large molecular language model that grasps the general knowledge centered on molecules via multi-modal instruction tuning. To this end, we design key data types that encompass the fundamental features of molecules, incorporating essential knowledge from molecular structures. In addition, to improve understanding of molecular features, we introduce a module that integrates complementary information from different molecular encoders, leveraging the distinct advantages of different molecular representations. Our experimental results demonstrate that Mol-LLaMA is capable of comprehending the general features of molecules and generating relevant responses to users' queries with detailed explanations, implying its potential as a general-purpose assistant for molecular analysis.
