Large Language Models for Mathematicians

Abstract

Large language models (LLMs) such as ChatGPT have received immense interest for their general-purpose language understanding and, in particular, their ability to generate high-quality text or computer code. For many professions, LLMs represent an invaluable tool that can speed up and improve the quality of work. In this note, we discuss to what extent they can aid professional mathematicians. We first provide a mathematical description of the transformer model used in all modern language models. Based on recent studies, we then outline best practices and potential issues and report on the mathematical abilities of language models. Finally, we shed light on the potential of LLMs to change how mathematicians work.
