compar:IA: The French Government's LLM Arena to Collect French-Language Human Prompts and Preference Data

Abstract

Large Language Models (LLMs) often show reduced performance, cultural alignment, and safety robustness in non-English languages, partly because English dominates both pre-training data and human preference alignment datasets. Training methods such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) require human preference data, which remains scarce and largely non-public for many languages beyond English. To address this gap, we introduce compar:IA, an open-source digital public service developed inside the French government and designed to collect large-scale human preference data from a predominantly French-speaking general audience. The platform uses a blind pairwise comparison interface to capture unconstrained, real-world prompts and user judgments across a diverse set of language models, while keeping participation friction low and applying privacy-preserving automated filtering. As of 2026-02-07, compar:IA has collected over 600,000 free-form prompts and 250,000 preference votes, with approximately 89% of the data in French. We release three complementary datasets (conversations, votes, and reactions) under open licenses, and present initial analyses, including a French-language model leaderboard and user interaction patterns. Beyond the French context, compar:IA is evolving toward an international digital public good, offering reusable infrastructure for multilingual model training, evaluation, and the study of human-AI interaction.
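The abstract mentions a leaderboard built from blind pairwise votes but does not say how the rankings are computed. A common way to turn pairwise preferences into model scores is the Bradley-Terry model. Under that model, model i beats model j with probability p_i / (p_i + p_j).

The sketch below is illustrative only and is not presented as compar:IA's actual method. It makes these assumptions:

- Each vote is reduced to a hypothetical `(winner, loser)` pair.
- Ties are ignored.
- Strengths are fitted with Hunter's minorization-maximization (MM) updates.

```python
from collections import defaultdict


def bradley_terry(votes, iterations=1000, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser) pairs with MM updates.

    Illustrative sketch: ties are assumed to be dropped beforehand, and the
    returned strengths are normalized to sum to the number of models.
    """
    models = sorted({m for pair in votes for m in pair})
    wins = defaultdict(float)   # W_i: total wins of model i
    games = defaultdict(float)  # n_ij: number of comparisons between i and j
    for winner, loser in votes:
        wins[winner] += 1.0
        games[(winner, loser)] += 1.0
        games[(loser, winner)] += 1.0

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        new = {}
        for i in models:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                games[(i, j)] / (strength[i] + strength[j])
                for j in models
                if j != i and games[(i, j)] > 0
            )
            new[i] = wins[i] / denom if denom > 0 else 0.0
        # Strengths are only identified up to scale, so fix the scale.
        total = sum(new.values())
        new = {m: s * len(models) / total for m, s in new.items()}
        delta = max(abs(new[m] - strength[m]) for m in models)
        strength = new
        if delta < tol:
            break
    return strength


if __name__ == "__main__":
    # Hypothetical votes between three placeholder models.
    votes = [("A", "B")] * 3 + [("B", "A")] + [("B", "C")] * 3 + [("C", "B")]
    scores = bradley_terry(votes)
    for model, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(model, round(s, 4))
```

In this toy example, A wins 3 of its 4 comparisons against B, and B wins 3 of its 4 against C. The fitted strength ratios therefore approach 3:1 for each pair. A real leaderboard would typically add confidence intervals (for example via bootstrapping) and an explicit treatment of ties.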