Towards 3D Molecule-Text Interpretation in Language Models

Abstract

Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder. This integration is achieved by a 3D molecule-text projector that bridges the 3D molecular encoder's representation space and the LM's input space. Moreover, to enhance 3D-MoLM's cross-modal molecular understanding and instruction following, we curate a 3D molecule-centric instruction tuning dataset, 3D-MoIT. Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM integrates the 3D molecular encoder with the LM. It significantly surpasses existing baselines on downstream tasks, including molecule-text retrieval, molecule captioning, and the more challenging open-text molecular QA tasks, especially those focusing on 3D-dependent properties.
