MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models
February 19, 2026
Authors: Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui Han, Dae-Woong Jeong
cs.AI
Abstract
Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. Graph diffusion models have been widely adopted because 2D molecular graphs are discrete, but existing models suffer from low chemical validity and struggle to meet desired target properties compared with 1D modeling. In this work, we introduce MolHIT, a molecular graph generation framework that overcomes long-standing performance limitations of existing methods. MolHIT builds on the Hierarchical Discrete Diffusion Model, which generalizes discrete diffusion to additional categories that encode chemical priors, and on decoupled atom encoding, which splits atom types according to their chemical roles. MolHIT achieves new state-of-the-art performance on the MOSES dataset, with near-perfect validity for the first time in graph diffusion, and surpasses strong 1D baselines across multiple metrics. We further demonstrate strong performance in downstream tasks, including multi-property guided generation and scaffold extension.
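To make the idea of splitting atom types by chemical role concrete, the following is a minimal illustrative sketch and not the encoding used in MolHIT. The abstract does not specify which roles are distinguished, so the choice of aromaticity and formal charge here is an assumption, and the names `Atom`, `coarse_token`, and `decoupled_token` are hypothetical.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical atom record; the fields are illustrative assumptions."""
    element: str    # element symbol, e.g. "C" or "N"
    aromatic: bool  # whether the atom belongs to an aromatic ring
    charge: int     # formal charge


def coarse_token(atom: Atom) -> str:
    """Element-only encoding: one category per element."""
    return atom.element


def decoupled_token(atom: Atom) -> str:
    """Assumed decoupled encoding: the element plus role markers.

    Aromatic atoms are lowercased (as in SMILES) and the formal charge is
    appended, so atoms of the same element in different chemical roles
    receive different categories.
    """
    symbol = atom.element.lower() if atom.aromatic else atom.element
    if atom.charge > 0:
        symbol += "+" * atom.charge
    elif atom.charge < 0:
        symbol += "-" * (-atom.charge)
    return symbol


# Three nitrogens in different roles: aromatic, neutral aliphatic, cationic.
atoms = [Atom("N", True, 0), Atom("N", False, 0), Atom("N", False, 1)]
coarse = [coarse_token(a) for a in atoms]
decoupled = [decoupled_token(a) for a in atoms]
```

In this sketch the element-only encoding assigns all three nitrogens to one category, while the decoupled encoding keeps them apart, which is the kind of distinction a role-aware atom vocabulary is meant to expose to the generative model.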