Can Large Language Models Unlock Novel Scientific Research Ideas?

Abstract

"An idea is nothing more nor less than a new combination of old elements" (Young, J.W.). The widespread adoption of Large Language Models (LLMs) and the public availability of ChatGPT have marked a significant turning point in the integration of Artificial Intelligence (AI) into people's everyday lives. This study explores the capability of LLMs to generate novel research ideas based on information from research papers. We conduct a thorough examination of four LLMs across five domains: Chemistry, Computer Science, Economics, Medicine, and Physics. We found that the future research ideas generated by Claude-2 and GPT-4 are more aligned with the authors' perspective than those generated by GPT-3.5 and Gemini 1.0. We also found that Claude-2 generates more diverse future research ideas than GPT-4, GPT-3.5, and Gemini 1.0. We further performed a human evaluation of the novelty, relevance, and feasibility of the generated future research ideas. This investigation offers insights into the evolving role of LLMs in idea generation, highlighting both their capabilities and their limitations. Our work contributes to the ongoing efforts to evaluate and use language models for generating future research ideas. We make our datasets and code publicly available.