BioBench: A Blueprint to Move Beyond ImageNet for Scientific ML Benchmarks

November 20, 2025
Author: Samuel Stevens
cs.AI

Abstract

ImageNet-1K linear-probe transfer accuracy remains the default proxy for visual representation quality, yet it no longer predicts performance on scientific imagery. Across 46 modern vision model checkpoints, ImageNet top-1 accuracy explains only 34% of variance on ecology tasks and mis-ranks 30% of models above 75% accuracy. We present BioBench, an open ecology vision benchmark that captures what ImageNet misses. BioBench unifies 9 publicly released, application-driven tasks, 4 taxonomic kingdoms, and 6 acquisition modalities (drone RGB, web video, micrographs, in-situ and specimen photos, camera-trap frames), totaling 3.1M images. A single Python API downloads data, fits lightweight classifiers to frozen backbones, and reports class-balanced macro-F1 (plus domain metrics for FishNet and FungiCLEF); ViT-L models evaluate in 6 hours on an A6000 GPU. BioBench provides new signal for computer vision in ecology and a template recipe for building reliable AI-for-science benchmarks in any domain. Code and predictions are available at https://github.com/samuelstevens/biobench and results at https://samuelstevens.me/biobench.
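
The two statistics the abstract leans on, class-balanced macro-F1 and the share of variance explained, can be illustrated with a minimal sketch. This is not the BioBench API: the function names `macro_f1` and `r_squared` are hypothetical, and treating "explains 34% of variance" as the squared Pearson correlation of a simple linear fit is an assumption, not something the abstract states.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes that occur in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the example was not p
            fn[t] += 1  # true class t was missed
    scores = []
    for c in set(y_true):
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is at least 1
        # because every class in y_true has at least one TP or FN.
        scores.append(2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]))
    return sum(scores) / len(scores)


def r_squared(x, y):
    """Squared Pearson correlation between x and y.

    Assumption: this equals the variance in y explained by a
    one-variable least-squares linear fit on x.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equally long with at least 2 points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each have non-zero variance")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # A classifier that always predicts the majority class "a":
    # plain accuracy is 0.75, but macro-F1 penalizes the missed class "b".
    labels = ["a", "a", "a", "b"]
    preds = ["a", "a", "a", "a"]
    print(macro_f1(labels, preds))
```

Macro-F1 weights every class equally regardless of how many examples it has, which is why it suits long-tailed ecological datasets better than plain accuracy does. In the toy case above, the rare class contributes an F1 of zero and pulls the average down even though three of four predictions are correct.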