Atla Selene Mini: A General Purpose Evaluation Model

Abstract

We introduce Atla Selene Mini, a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini is a general-purpose evaluator that outperforms the best SLMJs and GPT-4o-mini on overall performance across 11 out-of-distribution benchmarks, spanning absolute scoring, classification, and pairwise preference tasks. It is the highest-scoring 8B generative model on RewardBench, surpassing strong baselines like GPT-4o and specialized judges. To achieve this, we develop a principled data curation strategy that augments public datasets with synthetically generated critiques and ensures high quality through filtering and dataset ablations. We train our model on a combined direct preference optimization (DPO) and supervised fine-tuning (SFT) loss, and produce a highly promptable evaluator that excels in real-world scenarios. Selene Mini shows dramatically improved zero-shot agreement with human expert evaluations on financial and medical industry datasets. It is also robust to variations in prompt format. Preliminary results indicate that Selene Mini is the top-ranking evaluator in a live, community-driven Judge Arena. We release the model weights on HuggingFace (https://hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B) and Ollama to encourage widespread community adoption.
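
The combined DPO and SFT objective mentioned above can be illustrated with a minimal sketch. The abstract does not state how the two terms are weighted or normalized, so the additive form, the `sft_weight` and `beta` parameters, the per-token normalization of the SFT term, and all function names below are assumptions made for illustration, not the authors' training code. Inputs are summed sequence log-probabilities for a chosen and a rejected judgment under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def sft_loss(policy_chosen_logp, num_chosen_tokens):
    """Negative log-likelihood of the chosen response, averaged per token
    (the per-token averaging is an assumption)."""
    return -policy_chosen_logp / num_chosen_tokens


def combined_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp,
                  num_chosen_tokens, beta=0.1, sft_weight=1.0):
    """Hypothetical combined objective: DPO term plus a weighted SFT term."""
    dpo = dpo_loss(policy_chosen_logp, policy_rejected_logp,
                   ref_chosen_logp, ref_rejected_logp, beta)
    sft = sft_loss(policy_chosen_logp, num_chosen_tokens)
    return dpo + sft_weight * sft
```

In this sketch the DPO term pushes the policy to prefer the chosen judgment over the rejected one relative to the reference model, while the SFT term keeps the likelihood of the chosen judgment itself high.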