Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

June 15, 2023
Authors: Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori
cs.AI

Abstract

We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education.
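
The abstract mentions embedding questions and using few-shot learning to relate questions and classes. As a rough illustration only, the sketch below picks the most similar already-solved questions by cosine similarity and assembles a few-shot prompt. It is not the authors' method: the function names, the toy three-dimensional vectors, and the prompt layout are all assumptions made for this example, and a real setup would obtain embeddings from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_few_shot(query_vec, solved, k=2):
    """Return the k solved items whose embeddings are closest to query_vec."""
    ranked = sorted(
        solved,
        key=lambda item: cosine_similarity(query_vec, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, examples):
    """Assemble a plain-text few-shot prompt (hypothetical layout)."""
    parts = [
        f"Question: {ex['question']}\nSolution: {ex['solution']}\n"
        for ex in examples
    ]
    parts.append(f"Question: {question}\nSolution:")
    return "\n".join(parts)


# Toy data: the embeddings are made-up vectors, not model outputs.
solved = [
    {"question": "Differentiate x^2.", "solution": "2x",
     "embedding": [1.0, 0.0, 0.1]},
    {"question": "State Ohm's law.", "solution": "V = IR",
     "embedding": [0.0, 1.0, 0.0]},
    {"question": "Integrate 2x.", "solution": "x^2 + C",
     "embedding": [0.9, 0.1, 0.2]},
]

query_vec = [1.0, 0.0, 0.0]  # hypothetical embedding of the new question
examples = select_few_shot(query_vec, solved, k=2)
prompt = build_prompt("Differentiate x^3.", examples)
```

Here the two calculus questions are selected ahead of the circuits question because their toy vectors point in nearly the same direction as the query. Recording which courses supply the helpful examples is one simple way such a setup could hint at prerequisite relationships.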