L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks

October 23, 2025
Authors: Jiyu Cui, Fang Wu, Haokai Zhao, Minggao Feng, Xenophon Evangelopoulos, Andrew I. Cooper, Yejin Choi
cs.AI

Abstract

Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse natural language tasks. However, comparable breakthroughs in scientific discovery are more limited, because understanding complex physical phenomena demands multifaceted representations far beyond language alone. A compelling example is the design of functional materials such as metal-organic frameworks (MOFs), which are critical for impactful applications like carbon capture and hydrogen storage. Navigating their vast and intricate design space in language-based representations interpretable by LLMs is challenging due to the numerous possible three-dimensional atomic arrangements and the strict reticular rules of coordination geometry and topology. Despite promising early results in LLM-assisted discovery for simpler materials systems, MOF design remains heavily reliant on tacit human expertise that is rarely codified in textual information alone. To overcome this barrier, we introduce L2M3OF, the first multimodal LLM for MOFs. L2M3OF integrates crystal representation learning with language understanding to process structural, textual, and knowledge modalities jointly. It employs a pre-trained crystal encoder with a lightweight projection layer to compress structural information into a token space, enabling efficient alignment with language instructions. To facilitate training and evaluation, we curate a structure-property-knowledge database of crystalline materials and benchmark L2M3OF against state-of-the-art closed-source LLMs such as GPT-5, Gemini-2.5-Pro and DeepSeek-R1. Experiments show that L2M3OF outperforms leading text-based closed-source LLMs in property prediction and knowledge generation tasks, despite using far fewer parameters. These results highlight the importance of multimodal approaches for porous material understanding and establish L2M3OF as a foundation for next-generation AI systems in materials discovery.
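
The abstract describes a pre-trained crystal encoder followed by a lightweight projection layer that compresses structural information into a token space. The sketch below illustrates that general idea only; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the chunked mean-pooling used for compression, the single linear projection, the dimensions, the random weights standing in for trained ones, and all function names. The abstract does not state how L2M3OF performs any of these steps.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Random weights standing in for a trained projection layer (assumption)."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(vec, weight, bias):
    """Apply y = W x + b to a single vector."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def pool_to_tokens(atom_embeddings, num_tokens):
    """Compress a variable number of per-atom vectors into num_tokens vectors
    by mean-pooling consecutive chunks (hypothetical compression scheme)."""
    n = len(atom_embeddings)
    if n == 0:
        raise ValueError("need at least one atom embedding")
    pooled = []
    for t in range(num_tokens):
        start = t * n // num_tokens
        # Guarantee a non-empty chunk even when there are fewer atoms than tokens.
        end = max((t + 1) * n // num_tokens, start + 1)
        chunk = atom_embeddings[start:end]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def structure_to_tokens(atom_embeddings, weight, bias, num_tokens):
    """Pool encoder outputs, then project each pooled vector to the LLM width."""
    return [linear(vec, weight, bias)
            for vec in pool_to_tokens(atom_embeddings, num_tokens)]


def build_input_sequence(structure_tokens, text_token_embeddings):
    """Prepend structure tokens to the embedded text instruction."""
    return structure_tokens + text_token_embeddings


# Illustrative sizes only; none of these come from the paper.
ENCODER_DIM, LLM_DIM, NUM_STRUCTURE_TOKENS = 8, 16, 4

rng = random.Random(42)
# Stand-in for the output of a pre-trained crystal encoder on a 10-atom cell.
atom_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(ENCODER_DIM)]
                   for _ in range(10)]
# Stand-in for the LLM's embeddings of a 5-token text instruction.
text_token_embeddings = [[0.0] * LLM_DIM for _ in range(5)]

weight, bias = make_linear(ENCODER_DIM, LLM_DIM)
structure_tokens = structure_to_tokens(
    atom_embeddings, weight, bias, NUM_STRUCTURE_TOKENS)
sequence = build_input_sequence(structure_tokens, text_token_embeddings)
print(len(structure_tokens), len(sequence))
```

In this sketch the structure always occupies a fixed, small number of positions in the input sequence regardless of how many atoms the unit cell contains, which is one plausible reading of "compress structural information into a token space".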