Towards 3D Molecule-Text Interpretation in Language Models

Abstract

Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder. This integration is achieved through a 3D molecule-text projector that bridges the 3D molecular encoder's representation space and the LM's input space. Moreover, to enhance 3D-MoLM's cross-modal molecular understanding and instruction-following ability, we meticulously curated 3D-MoIT, a 3D molecule-centric instruction tuning dataset. Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM integrates a 3D molecular encoder with an LM. It significantly surpasses existing baselines on downstream tasks, including molecule-text retrieval, molecule captioning, and more challenging open-text molecular QA tasks, particularly those focusing on 3D-dependent properties.
