What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks

Abstract

Large Language Models (LLMs) with strong abilities in natural language processing tasks have emerged and have been rapidly applied in a variety of areas such as science, finance, and software engineering. However, the capability of LLMs to advance the field of chemistry remains unclear. In this paper, we establish a comprehensive benchmark containing 8 practical chemistry tasks: 1) name prediction, 2) property prediction, 3) yield prediction, 4) reaction prediction, 5) retrosynthesis (prediction of reactants from products), 6) text-based molecule design, 7) molecule captioning, and 8) reagent selection. Our analysis draws on widely recognized datasets including BBBP, Tox21, PubChem, USPTO, and ChEBI, facilitating a broad exploration of the capacities of LLMs within the context of practical chemistry. Three GPT models (GPT-4, GPT-3.5, and Davinci-003) are evaluated on each chemistry task in zero-shot and few-shot in-context learning settings, with carefully selected demonstration examples and specially crafted prompts. The key results of our investigation are: 1) GPT-4 outperforms the other two models evaluated; 2) GPT models exhibit less competitive performance in tasks demanding precise understanding of molecular SMILES representations, such as reaction prediction and retrosynthesis; 3) GPT models demonstrate strong capabilities in text-related explanation tasks such as molecule captioning; and 4) GPT models achieve performance comparable to or better than that of classical machine learning models on chemical problems that can be transformed into classification or ranking tasks, such as property prediction and yield prediction.
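The zero-shot and few-shot in-context learning setup amounts to three steps. First, assemble a prompt from a task instruction, k labeled demonstrations, and an unanswered query. Next, send that prompt to the model. Finally, map the model's free-text reply back to a class label so the task can be scored like a classifier. The Python sketch below illustrates these steps for a BBBP-style yes/no property-prediction task. It is a minimal illustration under stated assumptions, not the benchmark's actual prompt or pipeline. The instruction wording, the `SMILES:`/`Answer:` template, and the helper names (`Example`, `build_prompt`, `parse_answer`, `accuracy`) are all hypothetical. The sketch calls no model API, and it leaves the choice of demonstrations to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One labeled molecule: a SMILES string and a binary property label."""
    smiles: str
    label: bool


# Hypothetical instruction text; NOT the benchmark's actual prompt.
TASK_INSTRUCTION = (
    "You are an expert chemist. Given the SMILES string of a molecule, "
    "answer Yes or No: can this molecule penetrate the blood-brain barrier?"
)


def format_example(ex: Example, with_answer: bool = True) -> str:
    """Render one molecule as a SMILES/Answer pair (hypothetical template)."""
    text = f"SMILES: {ex.smiles}\nAnswer:"
    if with_answer:
        text += " Yes" if ex.label else " No"
    return text


def build_prompt(query: Example, demos: List[Example]) -> str:
    """Instruction, then k answered demonstrations, then the unanswered query.

    An empty `demos` list gives the zero-shot setting; k demos give k-shot.
    The query's own label is never written into the prompt.
    """
    parts = [TASK_INSTRUCTION]
    parts.extend(format_example(d) for d in demos)
    parts.append(format_example(query, with_answer=False))
    return "\n\n".join(parts)


def parse_answer(completion: str) -> Optional[bool]:
    """Map the model's free-text reply to a label; None if it is neither Yes nor No."""
    words = completion.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


def accuracy(preds: List[Optional[bool]], gold: List[bool]) -> float:
    """Fraction of correct predictions; unparseable replies (None) count as wrong."""
    correct = sum(p is not None and p == g for p, g in zip(preds, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from BBBP.
    demos = [Example("CCO", True), Example("CC(=O)O", False)]
    print(build_prompt(Example("c1ccccc1", True), demos))
```

Passing two demonstrations gives a 2-shot prompt, and passing an empty `demos` list gives the zero-shot variant of the same query. The sketch counts unparseable replies as errors rather than dropping them. This is one conservative choice: it keeps free-text generation failures visible in the score instead of hiding them.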