L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks
October 23, 2025
Authors: Jiyu Cui, Fang Wu, Haokai Zhao, Minggao Feng, Xenophon Evangelopoulos, Andrew I. Cooper, Yejin Choi
cs.AI
Abstract
Large language models have demonstrated remarkable reasoning capabilities
across diverse natural language tasks. However, comparable breakthroughs in
scientific discovery are more limited, because understanding complex physical
phenomena demands multifaceted representations far beyond language alone. A
compelling example is the design of functional materials such as metal-organic
frameworks (MOFs), which are critical for a range of impactful applications
like carbon capture and hydrogen storage.
Navigating their vast and intricate design space in language-based
representations interpretable by LLMs is challenging due to the numerous
possible three-dimensional atomic arrangements and strict reticular rules of
coordination geometry and topology. Despite promising early results in
LLM-assisted discovery for simpler materials systems, MOF design remains
heavily reliant on tacit human expertise rarely codified in textual information
alone. To overcome this barrier, we introduce L2M3OF, the first multimodal LLM
for MOFs. L2M3OF integrates crystal representation learning with language
understanding to process structural, textual, and knowledge modalities jointly.
L2M3OF employs a pre-trained crystal encoder with a lightweight projection
layer to compress structural information into a token space, enabling efficient
alignment with language instructions. To facilitate training and evaluation, we
curate a structure-property-knowledge database of crystalline materials and
benchmark L2M3OF against state-of-the-art closed-source LLMs such as GPT-5,
Gemini-2.5-Pro, and DeepSeek-R1. Experiments show that L2M3OF outperforms
leading text-based closed-source LLMs in property prediction and knowledge
generation tasks, despite using far fewer parameters. These results highlight
the importance of multimodal approaches for porous material understanding and
establish L2M3OF as a foundation for next-generation AI systems in materials
discovery.
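
The abstract describes a pre-trained crystal encoder followed by a lightweight projection layer that compresses structural information into a token space. A minimal sketch of that idea in plain Python is given below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the chunk-averaging pooling step, the dimensions (D_ENC, D_LLM, NUM_TOKENS), and the random stand-ins for encoder outputs and text embeddings are all hypothetical.

```python
import random


def mean_pool_into_tokens(node_embeddings, num_tokens):
    """Compress per-atom vectors into `num_tokens` vectors by averaging
    contiguous chunks. The pooling scheme is an assumption for illustration."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    tokens = []
    for t in range(num_tokens):
        start = t * n // num_tokens
        end = max((t + 1) * n // num_tokens, start + 1)  # never an empty chunk
        chunk = node_embeddings[start:end]
        tokens.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return tokens


def project(tokens, weight, bias):
    """Linear projection from encoder width to language-model width.
    `weight` holds one row of encoder-width coefficients per output dimension."""
    return [
        [sum(x * w for x, w in zip(tok, row)) + b for row, b in zip(weight, bias)]
        for tok in tokens
    ]


def build_input_sequence(structure_tokens, text_embeddings):
    """Prepend the projected structure tokens to the text token embeddings."""
    return structure_tokens + text_embeddings


# Hypothetical sizes; a real encoder and language model would be far wider.
D_ENC, D_LLM, NUM_TOKENS = 8, 16, 4
rng = random.Random(0)

# Stand-in for the output of a pre-trained crystal encoder on a 50-atom cell.
node_embeddings = [[rng.gauss(0, 1) for _ in range(D_ENC)] for _ in range(50)]
# Stand-in for the embedded tokens of a language instruction.
text_embeddings = [[rng.gauss(0, 1) for _ in range(D_LLM)] for _ in range(6)]

# Parameters of the projection layer (randomly initialised here).
weight = [[rng.gauss(0, 0.1) for _ in range(D_ENC)] for _ in range(D_LLM)]
bias = [0.0] * D_LLM

structure_tokens = project(
    mean_pool_into_tokens(node_embeddings, NUM_TOKENS), weight, bias
)
sequence = build_input_sequence(structure_tokens, text_embeddings)
print(len(sequence), len(sequence[0]))  # 10 16
```

In this sketch a 50-atom structure becomes four vectors of the same width as the text embeddings, so the language model sees the crystal as a short prefix of extra tokens. How L2M3OF actually pools, projects, and interleaves structure tokens is not specified in the abstract.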