Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

June 15, 2023
作者: Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori
cs.AI

Abstract

We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education.
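Below is a minimal, illustrative sketch of the analysis step the abstract describes: embedding questions into a vector space and projecting them to a low-dimensional space to explore relationships between questions, topics, and courses. It is not the authors' code; the embedding model, the sample questions, and the choice of PCA as the projection are assumptions for illustration.

```python
# Illustrative sketch only: embed course questions and project them to 2-D
# to inspect how questions from different courses relate. The embedding
# model, sample questions, and PCA projection are assumptions, not the
# authors' actual pipeline.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
import numpy as np

questions = [
    "Compute the eigenvalues of the matrix [[2, 1], [1, 2]].",         # e.g. 18.06 Linear Algebra
    "Prove that the set of regular languages is closed under union.",  # e.g. 18.404 Theory of Computation
    "Design a DFA that accepts binary strings divisible by 3.",        # e.g. 18.404 Theory of Computation
    "Find the Fourier series of a square wave.",                       # e.g. 18.03 Differential Equations
]

# Embed each question into a high-dimensional vector space.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embeddings = np.asarray(model.encode(questions))

# Project into a low-dimensional space; nearby points suggest related
# questions, topics, or courses.
points = PCA(n_components=2).fit_transform(embeddings)

for q, (x, y) in zip(questions, points):
    print(f"({x:7.3f}, {y:7.3f})  {q[:55]}")
```

In the paper, this kind of low-dimensional view is paired with few-shot prompting to test which questions and courses help solve others; the sketch above covers only the embedding-and-projection step.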