Atla Selene Mini: A General Purpose Evaluation Model

Abstract

We introduce Atla Selene Mini, a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini is a general-purpose evaluator that outperforms the best SLMJs and GPT-4o-mini on overall performance across 11 out-of-distribution benchmarks, spanning absolute scoring, classification, and pairwise preference tasks. It is the highest-scoring 8B generative model on RewardBench, surpassing strong baselines like GPT-4o and specialized judges. To achieve this, we develop a principled data curation strategy that augments public datasets with synthetically generated critiques and ensures high quality through filtering and dataset ablations. We train our model on a combined direct preference optimization (DPO) and supervised fine-tuning (SFT) loss, and produce a highly promptable evaluator that excels in real-world scenarios. Selene Mini shows dramatically improved zero-shot agreement with human expert evaluations on financial and medical industry datasets. It is also robust to variations in prompt format. Preliminary results indicate that Selene Mini is the top-ranking evaluator in a live, community-driven Judge Arena. We release the model weights on HuggingFace (https://hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B) and Ollama to encourage widespread community adoption.
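
To make the combined DPO and SFT objective mentioned in the abstract concrete, the following is a minimal per-example sketch. The abstract does not state the exact form or weighting of the loss, so this assumes the standard DPO term plus a length-normalised negative log-likelihood on the chosen response. The function name `dpo_sft_loss` and the parameters `beta` and `sft_weight` are hypothetical and are not taken from the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_sft_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    chosen_num_tokens: int,
    beta: float = 0.1,        # assumed DPO temperature, not from the paper
    sft_weight: float = 1.0,  # assumed weight on the SFT term, not from the paper
) -> float:
    """Combined loss for one (chosen, rejected) pair of judge outputs.

    Each *_logp argument is the summed token log-probability of a full
    response under the trained policy or the frozen reference model.
    """
    # DPO term: reward the policy for preferring the chosen response
    # more strongly than the reference model does.
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    dpo_term = -log_sigmoid(margin)

    # SFT term: length-normalised negative log-likelihood of the chosen response.
    sft_term = -policy_chosen_logp / chosen_num_tokens

    return dpo_term + sft_weight * sft_term
```

In this sketch, when the policy matches the reference model the DPO term equals log 2, and raising the log-probability of the chosen response lowers both terms. In practice the log-probabilities would come from batched forward passes of the policy and reference models rather than from scalar inputs.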