compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data

Abstract


Large Language Models (LLMs) often show reduced performance, cultural alignment, and safety robustness in non-English languages, partly because English dominates both pre-training data and human preference alignment datasets. Training methods such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) require human preference data, which remains scarce and largely non-public for many languages other than English. To address this gap, we introduce compar:IA, an open-source digital public service developed within the French government and designed to collect large-scale human preference data from a predominantly French-speaking general audience. The platform uses a blind pairwise comparison interface to capture unconstrained, real-world prompts and user judgments across a diverse set of language models, while keeping participation friction low and applying privacy-preserving automated filtering. As of 2026-02-07, compar:IA has collected over 600,000 free-form prompts and 250,000 preference votes, with approximately 89% of the data in French. We release three complementary datasets (conversations, votes, and reactions) under open licenses, and present initial analyses, including a French-language model leaderboard and user interaction patterns. Beyond the French context, compar:IA is evolving toward an international digital public good, offering reusable infrastructure for multilingual model training, evaluation, and the study of human-AI interaction.
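The abstract mentions a leaderboard built from blind pairwise votes but does not say which ranking model produces it. A common choice for this kind of data is the Bradley–Terry model, where model i beats model j with probability p_i / (p_i + p_j). The following is a minimal sketch under stated assumptions, not compar:IA's actual method. It assumes each vote is a hypothetical `(winner, loser)` tuple with ties discarded, and that every model has at least one win and one loss, since the maximum-likelihood estimate does not exist otherwise. The function name `bradley_terry` is illustrative. The sketch fits strengths with the classic minorization-maximization (MM) update.

```python
from collections import Counter


def bradley_terry(votes, iters=500, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser) votes with the MM update.

    Assumption: votes are hypothetical (winner, loser) tuples with ties removed,
    and every model has at least one win and one loss (otherwise no finite MLE).
    """
    models = sorted({m for pair in votes for m in pair})
    wins = Counter(winner for winner, _ in votes)           # total wins per model
    pair_counts = Counter(frozenset(pair) for pair in votes)  # games per unordered pair

    p = {m: 1.0 for m in models}  # initial strengths
    for _ in range(iters):
        new_p = {}
        for i in models:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                n / (p[i] + p[j])
                for pair, n in pair_counts.items() if i in pair
                for j in pair if j != i
            )
            new_p[i] = wins[i] / denom
        # Strengths are only identified up to scale; normalize to sum to len(models).
        total = sum(new_p.values())
        new_p = {m: v * len(models) / total for m, v in new_p.items()}
        delta = max(abs(new_p[m] - p[m]) for m in models)
        p = new_p
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Hypothetical toy votes between three anonymous models A, B, C.
    votes = ([("A", "B")] * 7 + [("B", "A")] * 3 +
             [("B", "C")] * 6 + [("C", "B")] * 4 +
             [("A", "C")] * 8 + [("C", "A")] * 2)
    strengths = bradley_terry(votes)
    for model, s in sorted(strengths.items(), key=lambda kv: -kv[1]):
        print(f"{model}: {s:.3f}")
```

Sorting the fitted strengths gives a leaderboard. The estimated probability that model A beats model B is `p["A"] / (p["A"] + p["B"])`. At convergence, each model's expected number of wins under the fitted model matches its observed wins, which is the maximum-likelihood condition.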
