What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks

Abstract

Large Language Models (LLMs) with strong abilities in natural language processing tasks have emerged and been rapidly applied in areas such as science, finance, and software engineering. However, the capability of LLMs to advance the field of chemistry remains unclear. In this paper, we establish a comprehensive benchmark containing 8 practical chemistry tasks: 1) name prediction, 2) property prediction, 3) yield prediction, 4) reaction prediction, 5) retrosynthesis (prediction of reactants from products), 6) text-based molecule design, 7) molecule captioning, and 8) reagent selection. Our analysis draws on widely recognized datasets, including BBBP, Tox21, PubChem, USPTO, and ChEBI, enabling a broad exploration of the capabilities of LLMs in the context of practical chemistry. Three GPT models (GPT-4, GPT-3.5, and Davinci-003) are evaluated on each chemistry task in zero-shot and few-shot in-context learning settings, using carefully selected demonstration examples and specially crafted prompts. The key results of our investigation are: 1) GPT-4 outperforms the other two evaluated models; 2) GPT models are less competitive on tasks that demand a precise understanding of the SMILES representation of molecules, such as reaction prediction and retrosynthesis; 3) GPT models demonstrate strong capabilities in text-related explanation tasks such as molecule captioning; and 4) GPT models achieve performance comparable to or better than classical machine learning models on chemical problems that can be transformed into classification or ranking tasks, such as property prediction and yield prediction.
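
To make the zero-shot versus few-shot in-context learning distinction concrete, here is a minimal sketch of how a prompt might be assembled for a binary property-prediction task in the style of BBBP (blood-brain barrier penetration), framed as classification. The prompt wording, the Yes/No answer format, the random demonstration sampling, and the helper names (`Example`, `select_demos`, `build_prompt`, `parse_answer`) are all hypothetical illustrations, not the prompts or demonstration-selection strategy used in the paper. The actual call to a GPT model is omitted.

```python
import random
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """A labeled demonstration: a SMILES string and a Yes/No label (hypothetical format)."""
    smiles: str
    label: str


def select_demos(pool: List[Example], k: int, seed: int = 0) -> List[Example]:
    """Pick k demonstrations at random (a simplifying assumption; the paper
    describes 'carefully selected' examples, which may use another strategy).
    k = 0 yields the zero-shot setting."""
    return random.Random(seed).sample(pool, k)


def build_prompt(query_smiles: str, demos: List[Example]) -> str:
    """Assemble an instruction, optional demonstrations, and the query molecule."""
    parts = [
        "You are an expert chemist. Given the SMILES string of a molecule, "
        "predict whether it can penetrate the blood-brain barrier. Answer Yes or No."
    ]
    # Each demonstration shows the model one input/answer pair in the same format.
    for ex in demos:
        parts.append(f"SMILES: {ex.smiles}\nAnswer: {ex.label}")
    # The query is left open so the model completes the answer.
    parts.append(f"SMILES: {query_smiles}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(completion: str) -> Optional[int]:
    """Map a model completion to a class label: 1 for Yes, 0 for No, None otherwise."""
    match = re.match(r"(yes|no)\b", completion.strip().lower())
    if match is None:
        return None
    return 1 if match.group(1) == "yes" else 0
```

With `k = 0` the same function produces the zero-shot prompt, so the two settings differ only in how many demonstrations precede the query. Parsing the completion into a discrete label is what allows such outputs to be scored like a conventional classifier's predictions.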
