
AgroBench: Vision-Language Model Benchmark in Agriculture

July 28, 2025
Authors: Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka, Masaki Onishi, Yoshitaka Ushiku
cs.AI

Abstract

Precise automated understanding of agricultural tasks such as disease identification is essential for sustainable crop production. Recent advances in vision-language models (VLMs) are expected to further expand the range of agricultural tasks by facilitating human-model interaction through easy, text-based communication. Here, we introduce AgroBench (Agronomist AI Benchmark), a benchmark for evaluating VLMs across seven agricultural topics, covering key areas in agricultural engineering and relevant to real-world farming. Unlike recent agricultural VLM benchmarks, AgroBench is annotated by expert agronomists. AgroBench covers a state-of-the-art range of categories, including 203 crop categories and 682 disease categories, to thoroughly evaluate VLM capabilities. In our evaluation on AgroBench, we reveal that VLMs have room for improvement in fine-grained identification tasks. Notably, in weed identification, most open-source VLMs perform close to random. With our wide range of topics and expert-annotated categories, we analyze the types of errors made by VLMs and suggest potential pathways for future VLM development. Our dataset and code are available at https://dahlian00.github.io/AgroBenchPage/ .