Typhoon T1: An Open Thai Reasoning Model

Abstract

This paper introduces Typhoon T1, an open effort to develop an open Thai reasoning model. A reasoning model is a relatively new type of generative model built on top of large language models (LLMs). Before producing a final answer, it generates a long chain of thought, an approach that has been found to improve performance on complex tasks. However, few details have been published on how to develop such models, especially ones that generate reasoning traces in a low-resource language. Typhoon T1 documents an open effort to develop a reasoning model more cost-effectively by applying supervised fine-tuning on open datasets instead of reinforcement learning. We describe our synthetic data generation and training procedures, and we release our dataset and model weights. We also share insights from developing a reasoning model that generalizes across domains and can generate reasoning traces in a low-resource language, using Thai as an example. We hope this open effort provides a foundation for further research in this field.