Pensez: Less Data, Better Reasoning -- Rethinking French LLM
March 17, 2025
Author: Huy Hoang Ha
cs.AI
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in
various natural language processing tasks. However, achieving strong
performance in specialized domains like mathematical reasoning and non-English
languages often requires extensive training on massive datasets. This paper
investigates a contrasting approach: strategic fine-tuning on a small,
high-quality, bilingual (English-French) dataset to enhance both the reasoning
capabilities and French language proficiency of a large language model. Rather
than relying on scale, we explore the hypothesis that targeted data curation
and optimized training can achieve competitive, or even superior, performance.
We demonstrate, through targeted supervised fine-tuning (SFT) on only 2,000
carefully selected samples, significant improvements in mathematical reasoning.
Specifically, Pensez 7B improves on the base model's accuracy by up to 20% on
AIME25 and by 12% on a French MATH level 5 benchmark.
These results challenge the prevailing assumption that massive datasets are
a prerequisite for strong reasoning performance in LLMs, highlighting the
potential of strategic data curation and optimized fine-tuning for enhancing
both specialized skills and multilingual capabilities. Our findings have
implications for the efficient development of high-performing, multilingual
LLMs, especially in resource-constrained scenarios.
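The abstract describes supervised fine-tuning (SFT) on roughly 2,000 carefully curated bilingual (English-French) samples. Below is a minimal sketch of what such small-scale SFT could look like using the Hugging Face Trainer; the base model id, dataset id, and hyperparameters are illustrative assumptions, not the paper's released training recipe.

```python
# Minimal SFT sketch for a small curated dataset (~2,000 samples).
# NOTE: model id, dataset id, and hyperparameters below are assumptions for
# illustration only; they are not taken from the Pensez paper.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "Qwen/Qwen2.5-7B-Instruct"  # assumed 7B base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical dataset: ~2,000 curated English/French reasoning traces,
# each row holding the full chat-formatted example in a "text" column.
dataset = load_dataset("your-org/pensez-style-2k", split="train")

def tokenize(batch):
    # Truncate long reasoning traces to a fixed context length.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="pensez-7b-sft",
    num_train_epochs=3,              # a few epochs over a small dataset
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # mlm=False gives standard causal-LM labels (next-token prediction).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point of the sketch is that, when the data is small but carefully selected, the training loop itself stays ordinary; the leverage comes from curation rather than scale.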