Herbarium specimens are the physical evidence of plant diversity worldwide and a rich source of research data. Of the c. 402 M items held in 3.9 k registered herbaria across 182 countries, an estimated c. 12.8% have associated digital images. Despite this low proportion, the information in these images opens numerous avenues of research aimed at understanding the hidden complexity and variation of plants. The first step in harvesting this information is to sort images by the types of items they depict. To address this challenge, we present Herbariograph, a new open image dataset and deep-learning model designed to automatically recognize all the image types commonly stored in collection databases. The dataset consists of 17 image categories with 12 288 images per category, gathered from 43 institutions. A convolutional neural network trained on the Herbariograph dataset achieves a macro F1 score of 0.9611 on the test set. Herbariograph will help to automate specimen image processing, making targeted dataset creation faster, better, and more accessible.
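For readers unfamiliar with the metric reported above, macro F1 is the unweighted mean of the per-class F1 scores, so every image category counts equally regardless of how often it is predicted. The following is a minimal illustrative sketch in plain Python, not the authors' evaluation code; the function name `macro_f1` and the three category labels are hypothetical and are used only for the example.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for c in classes:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only (not Herbariograph categories).
y_true = ["specimen", "specimen", "label", "label", "fruit", "fruit"]
y_pred = ["specimen", "label", "label", "label", "fruit", "specimen"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Because the Herbariograph categories are described as equally sized (12 288 images each), macro averaging weights each of the 17 categories the same in the final score.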
Keywords: Artificial Intelligence; biodiversity; herbarium specimen; image classification; images; machine learning.
© 2025 The Author(s). New Phytologist © 2025 New Phytologist Foundation.