
AgroBench: Vision-Language Model Benchmark in Agriculture

July 28, 2025
Authors: Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka, Masaki Onishi, Yoshitaka Ushiku
cs.AI

Abstract

Precise automated understanding of agricultural tasks such as disease identification is essential for sustainable crop production. Recent advances in vision-language models (VLMs) are expected to further expand the range of agricultural tasks by facilitating human-model interaction through easy, text-based communication. Here, we introduce AgroBench (Agronomist AI Benchmark), a benchmark for evaluating VLM models across seven agricultural topics, covering key areas in agricultural engineering and relevant to real-world farming. Unlike recent agricultural VLM benchmarks, AgroBench is annotated by expert agronomists. Our AgroBench covers a state-of-the-art range of categories, including 203 crop categories and 682 disease categories, to thoroughly evaluate VLM capabilities. In our evaluation on AgroBench, we reveal that VLMs have room for improvement in fine-grained identification tasks. Notably, in weed identification, most open-source VLMs perform close to random. With our wide range of topics and expert-annotated categories, we analyze the types of errors made by VLMs and suggest potential pathways for future VLM development. Our dataset and code are available at https://dahlian00.github.io/AgroBenchPage/ .
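The abstract reports that most open-source VLMs perform "close to random" on weed identification. For a multiple-choice benchmark, random performance means an expected accuracy of 1/k over k answer options. A minimal sketch of that comparison, using hypothetical 4-option questions rather than AgroBench's actual evaluation protocol:

```python
import random

def accuracy(predictions, answers):
    """Fraction of multiple-choice predictions matching the gold answers."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def random_baseline(num_options):
    """Expected accuracy of uniform random guessing over num_options choices."""
    return 1.0 / num_options

# Hypothetical setup: 10,000 four-option questions. A model that guesses
# uniformly at random should score near the 25% baseline.
rng = random.Random(0)
options = ["A", "B", "C", "D"]
answers = [rng.choice(options) for _ in range(10_000)]
guesses = [rng.choice(options) for _ in range(10_000)]

print(f"random-guess accuracy: {accuracy(guesses, answers):.3f} "
      f"(baseline {random_baseline(len(options)):.2f})")
```

A model whose measured accuracy falls within sampling noise of this 1/k line carries no usable signal for the task, which is the sense in which the abstract's "close to random" finding is a negative result.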
PDF · August 1, 2025