AgroBench: Vision-Language Model Benchmark in Agriculture

Abstract

Precise automated understanding of agricultural tasks such as disease identification is essential for sustainable crop production. Recent advances in vision-language models (VLMs) are expected to further expand the range of agricultural tasks by facilitating human-model interaction through easy, text-based communication. Here, we introduce AgroBench (Agronomist AI Benchmark), a benchmark for evaluating VLMs across seven agricultural topics that cover key areas of agricultural engineering and are relevant to real-world farming. Unlike recent agricultural VLM benchmarks, AgroBench is annotated by expert agronomists. AgroBench covers a state-of-the-art range of categories, including 203 crop categories and 682 disease categories, to thoroughly evaluate VLM capabilities. Our evaluation on AgroBench reveals that VLMs have room for improvement in fine-grained identification tasks. Notably, in weed identification, most open-source VLMs perform close to random. Using this wide range of topics and expert-annotated categories, we analyze the types of errors made by VLMs and suggest potential pathways for future VLM development. Our dataset and code are available at https://dahlian00.github.io/AgroBenchPage/.
