
Pensez: Less Data, Better Reasoning -- Rethinking French LLM

March 17, 2025
Author: Huy Hoang Ha
cs.AI

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in various natural language processing tasks. However, achieving strong performance in specialized domains like mathematical reasoning and non-English languages often requires extensive training on massive datasets. This paper investigates a contrasting approach: strategic fine-tuning on a small, high-quality, bilingual (English-French) dataset to enhance both the reasoning capabilities and French language proficiency of a large language model. Rather than relying on scale, we explore the hypothesis that targeted data curation and optimized training can achieve competitive, or even superior, performance. Through targeted supervised fine-tuning (SFT) on only 2,000 carefully selected samples, we demonstrate significant improvements in mathematical reasoning. Specifically, Pensez 7B improves on the base model's accuracy by up to 20% on AIME25 and by 12% on a French MATH level 5 benchmark. These results challenge the prevailing assumption that massive datasets are a prerequisite for strong reasoning performance in LLMs, highlighting the potential of strategic data curation and optimized fine-tuning for enhancing both specialized skills and multilingual capabilities. Our findings have implications for the efficient development of high-performing, multilingual LLMs, especially in resource-constrained scenarios.
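
To make the described approach concrete, the following is a minimal sketch of a targeted SFT run on a small curated dataset, in the spirit of the abstract. The abstract does not specify the training framework, base model, data format, or hyperparameters; the use of TRL's SFTTrainer, the Qwen/Qwen2.5-7B-Instruct base, the file name pensez_sft_2000.jsonl, and all hyperparameter values below are illustrative assumptions, not the paper's actual recipe.

```python
# Hypothetical sketch of small-scale supervised fine-tuning (SFT) on ~2,000
# curated bilingual (English/French) reasoning samples, as described in the abstract.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed dataset file and conversational "messages" format; both are illustrative.
dataset = load_dataset("json", data_files="pensez_sft_2000.jsonl", split="train")

config = SFTConfig(
    output_dir="pensez-7b-sft",
    num_train_epochs=3,                # illustrative hyperparameters, not the paper's
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # assumed 7B base model; the abstract only says "7B"
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

The point of the sketch is that with only a few thousand high-quality samples, a full SFT pass over the data is cheap enough to run on modest hardware, which is what makes the paper's data-curation-over-scale argument practically attractive.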
