NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation

February 18, 2025
Authors: Zhiyuan Liu, Yanchen Luo, Han Huang, Enzhi Zhang, Sihang Li, Junfeng Fang, Yaorui Shi, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua
cs.AI

Abstract

3D molecule generation is crucial for drug discovery and material design. Prior work has focused on 3D diffusion models because they are well suited to modeling continuous 3D conformers, but it has overlooked the advantages of 1D SELFIES-based language models (LMs), which generate 100% valid molecules and can leverage billion-scale 1D molecule datasets. To combine these advantages for 3D molecule generation, we propose a foundation model, NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation. NExT-Mol uses an extensively pretrained molecule LM for 1D molecule generation and then predicts the generated molecule's 3D conformers with a 3D diffusion model. We enhance NExT-Mol's performance by scaling up the LM's model size, refining the diffusion neural architecture, and applying 1D-to-3D transfer learning. Notably, our 1D molecule LM significantly outperforms baselines in distributional similarity while ensuring validity, and our 3D diffusion model achieves leading performance in conformer prediction. Given these improvements in 1D and 3D modeling, NExT-Mol achieves a 26% relative improvement in 3D FCD for de novo 3D generation on GEOM-DRUGS, and a 13% average relative gain for conditional 3D generation on QM9-2014. Our code and pretrained checkpoints are available at https://github.com/acharkq/NExT-Mol.
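The two-stage design described in the abstract (a 1D LM produces a SELFIES string, then a 3D model predicts a conformer for that string) can be illustrated with a minimal structural sketch. This is not the NExT-Mol implementation: `Molecule3D`, `generate_3d_molecule`, `placeholder_lm`, and `placeholder_conformer_model` are hypothetical names, and both stages are trivial placeholders that only show how the output of the first stage conditions the second.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Molecule3D:
    selfies: str            # 1D string produced by stage 1
    conformer: List[Coord]  # 3D coordinates produced by stage 2


def generate_3d_molecule(
    sample_1d: Callable[[], str],
    predict_conformer: Callable[[str], List[Coord]],
) -> Molecule3D:
    """Compose the two stages: sample a 1D molecule, then predict its 3D conformer."""
    selfies = sample_1d()                   # stage 1: 1D molecule generation
    conformer = predict_conformer(selfies)  # stage 2: conditioned on the 1D output
    return Molecule3D(selfies, conformer)


def placeholder_lm() -> str:
    """Hypothetical stand-in for the pretrained molecule LM (assumption).

    Always returns the SELFIES string for ethanol instead of sampling.
    """
    return "[C][C][O]"


def placeholder_conformer_model(selfies: str) -> List[Coord]:
    """Hypothetical stand-in for the 3D diffusion model (assumption).

    Places one point per SELFIES token on a line, 1.5 units apart.
    No diffusion is performed; this only fixes the interface.
    """
    n_tokens = selfies.count("[")
    return [(1.5 * i, 0.0, 0.0) for i in range(n_tokens)]


mol = generate_3d_molecule(placeholder_lm, placeholder_conformer_model)
print(mol.selfies, len(mol.conformer))
```

Passing the stages as callables keeps the composition independent of either model, so in principle a real LM sampler and a real conformer predictor could be substituted without changing `generate_3d_molecule`.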

February 20, 2025