Large Language Models for Mathematicians

Abstract

Large language models (LLMs) such as ChatGPT have received immense interest for their general-purpose language understanding and, in particular, their ability to generate high-quality text or computer code. For many professions, LLMs represent an invaluable tool that can speed up and improve the quality of work. In this note, we discuss to what extent they can aid professional mathematicians. We first provide a mathematical description of the transformer model used in all modern language models. Based on recent studies, we then outline best practices and potential issues and report on the mathematical abilities of language models. Finally, we shed light on the potential of LLMs to change how mathematicians work.

