EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Abstract

An ideal model evaluation should achieve two goals: identifying where the model fails and providing actionable guidance for improvement. Toward these goals for language model (LM) evaluation, we formulate the problem of generating a weakness profile: a set of weaknesses expressed in natural language, given an LM's performance on every individual instance in a benchmark. We introduce a suite of quantitative assessments to compare different weakness profiling methods. We also propose EvalTree, a weakness profiling method. It constructs a capability tree in which each node represents a capability described in natural language and is linked to the subset of benchmark instances that specifically evaluate that capability; it then extracts the nodes where the LM performs poorly to generate a weakness profile. On the MATH and WildChat benchmarks, we show that EvalTree outperforms baseline weakness profiling methods by identifying weaknesses more precisely and comprehensively. Weakness profiling further enables weakness-guided data collection, and training data collection guided by EvalTree-identified weaknesses improves LM performance more than other data collection strategies. We also show how EvalTree exposes flaws in Chatbot Arena's human-voter-based evaluation practice. To facilitate future work, we release our code and an interface that allows practitioners to interactively explore the capability trees built by EvalTree.
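The tree-then-extract idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the released EvalTree code: the `CapabilityNode` class, the `extract_weaknesses` function, the capability names, the toy per-instance results, and in particular the plain accuracy threshold used to decide that a node is "performing poorly" (the abstract does not state the actual criterion).

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityNode:
    """Hypothetical node: a capability description plus its linked instances."""
    description: str
    instance_ids: list[int]
    children: list["CapabilityNode"] = field(default_factory=list)


def accuracy(node: CapabilityNode, correct: dict[int, bool]) -> float:
    """Fraction of the node's linked instances that the LM answered correctly."""
    return sum(correct[i] for i in node.instance_ids) / len(node.instance_ids)


def extract_weaknesses(
    root: CapabilityNode,
    correct: dict[int, bool],
    threshold: float = 0.5,   # assumed cutoff, not from the paper
    min_instances: int = 2,   # assumed minimum support, not from the paper
) -> list[str]:
    """Collect descriptions of nodes whose accuracy falls below the threshold."""
    weaknesses = []
    stack = [root]
    while stack:
        node = stack.pop()
        if len(node.instance_ids) < min_instances:
            continue  # too few instances to judge this capability
        if accuracy(node, correct) < threshold:
            weaknesses.append(node.description)
            continue  # node is flagged; do not also flag its descendants
        stack.extend(node.children)
    return weaknesses


# Toy per-instance results: instance id -> whether the LM was correct.
correct = {0: True, 1: True, 2: False, 3: False, 4: True, 5: True}

# Toy capability tree with made-up capability names.
root = CapabilityNode(
    "Mathematical reasoning",
    [0, 1, 2, 3, 4, 5],
    children=[
        CapabilityNode("Solving algebraic equations", [0, 1]),
        CapabilityNode("Reasoning about geometric figures", [2, 3]),
        CapabilityNode("Counting and combinatorics", [4, 5]),
    ],
)

profile = extract_weaknesses(root, correct)
print(profile)
```

In this toy setup the root is not flagged because its overall accuracy is above the cutoff, so the search descends and flags only the child whose linked instances were all answered incorrectly. A real method would likely need a more careful criterion than a fixed threshold, for example one that accounts for how few instances a node has.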
