compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data

Abstract

Large Language Models (LLMs) often show reduced performance, cultural alignment, and safety robustness in non-English languages, partly because English dominates both pre-training data and human preference alignment datasets. Training methods such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) require human preference data, which remains scarce and largely non-public for most languages other than English. To address this gap, we introduce compar:IA, an open-source digital public service developed inside the French government and designed to collect large-scale human preference data from a predominantly French-speaking general audience. The platform uses a blind pairwise comparison interface to capture unconstrained, real-world prompts and user judgments across a diverse set of language models, while keeping participation friction low and applying privacy-preserving automated filtering. As of 2026-02-07, compar:IA has collected over 600,000 free-form prompts and 250,000 preference votes, with approximately 89% of the data in French. We release three complementary datasets (conversations, votes, and reactions) under open licenses, and present initial analyses, including a French-language model leaderboard and user interaction patterns. Beyond the French context, compar:IA is evolving toward an international digital public good, offering reusable infrastructure for multilingual model training, evaluation, and the study of human-AI interaction.
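To make the step from blind pairwise votes to a model leaderboard concrete, here is a minimal sketch. The abstract does not say which ranking method compar:IA uses, so the Bradley-Terry model here is an assumption, fitted with a standard minorization-maximization update. The `(winner, loser)` vote format and the function names `bradley_terry` and `leaderboard` are hypothetical and do not reflect the schema of the released datasets.

```python
from collections import defaultdict


def bradley_terry(votes, iterations=200):
    """Estimate relative model strengths from pairwise votes.

    votes: iterable of (winner, loser) model-name pairs (hypothetical format).
    Returns a dict mapping model name to strength, scaled to average 1.0.
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # number of comparisons per unordered pair
    models = set()
    for winner, loser in votes:
        if winner == loser:
            continue  # a model compared with itself carries no information
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))
    if not models:
        return {}

    strengths = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for other in models:
                if other == m:
                    continue
                n = pair_counts.get(frozenset((m, other)), 0.0)
                if n:
                    denom += n / (strengths[m] + strengths[other])
            updated[m] = wins[m] / denom if denom else strengths[m]
        # Rescale so the average strength stays at 1.0 (only ratios matter).
        total = sum(updated.values())
        strengths = {m: s * len(models) / total for m, s in updated.items()}
    return strengths


def leaderboard(votes):
    """Return model names sorted from strongest to weakest."""
    strengths = bradley_terry(votes)
    return sorted(strengths, key=strengths.get, reverse=True)


if __name__ == "__main__":
    # Toy votes with placeholder model names, not real compar:IA data.
    toy_votes = [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
    print(leaderboard(toy_votes))
```

A real leaderboard would also need to handle ties, models with very few votes, and confidence intervals, all of which this sketch leaves out.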
