The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4

Abstract

In recent years, groundbreaking advances in natural language processing have produced powerful large language models (LLMs) with remarkable capabilities across many domains, including the understanding, generation, and translation of natural language, and even tasks that extend beyond language processing. In this report, we examine the performance of LLMs in the context of scientific discovery, focusing on GPT-4, the state-of-the-art language model. Our investigation spans a diverse range of scientific areas: drug discovery, biology, computational chemistry (density functional theory (DFT) and molecular dynamics (MD)), materials design, and partial differential equations (PDEs). Evaluating GPT-4 on scientific tasks is crucial for uncovering its potential across research domains, validating its domain-specific expertise, accelerating scientific progress, optimizing resource allocation, guiding future model development, and fostering interdisciplinary research. Our methodology consists primarily of expert-driven case assessments, which offer qualitative insight into the model's comprehension of intricate scientific concepts and relationships, and, occasionally, benchmark testing, which quantitatively evaluates the model's capacity to solve well-defined domain-specific problems. Our preliminary exploration indicates that GPT-4 shows promising potential for a variety of scientific applications and an aptitude for complex problem-solving and knowledge integration tasks. Broadly speaking, we evaluate GPT-4's knowledge base, scientific understanding, scientific numerical calculation abilities, and various scientific prediction capabilities.

