Mushroom data creation, curation, and simulation to support classification tasks
- Author: mycolabadmin
- 4/14/2021
Summary
This study creates a new dataset of over 61,000 mushroom records from 173 species to help computers learn to identify whether mushrooms are edible or poisonous. The researchers extracted mushroom descriptions from an identification textbook and used computer programs to generate realistic hypothetical mushroom entries from them. They tested several machine learning classifiers and found that random forests (an ensemble of decision trees) performed best, achieving perfect accuracy in distinguishing poisonous from edible mushrooms on the simulated data.
Background
Mushroom identification is challenging because many species share similar characteristics. The 1987 UCI mushroom dataset, which contains only 23 species from 2 families, has been widely used for binary classification tasks, but it is outdated and not representative of mushroom diversity. A more comprehensive dataset is needed to better reflect the complexity of mushroom classification.
Objective
To create a comprehensive mushroom dataset with 173 species from 23 families for binary classification of edible versus poisonous mushrooms. The study aims to develop a reproducible workflow for data extraction from textbooks, curation, and simulation while maintaining comparability with the historical 1987 dataset.
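The simulation step of such a workflow can be pictured with a minimal Python sketch. It assumes a hypothetical species profile in which nominal attributes list their possible values and metric attributes give a (min, max) range; the attribute names, values, and sampling choices (uniform draws for nominal values, a normal distribution clipped to the range for metric values) are illustrative assumptions, not the authors' exact procedure.

```python
import random

# Hypothetical species profile (illustrative values, not taken from the dataset):
# nominal attributes list their possible values, metric attributes give a (min, max) range.
SPECIES_PROFILE = {
    "name": "Example species",
    "class": "p",  # hypothetical label: "p" = poisonous, "e" = edible
    "nominal": {
        "cap-shape": ["x", "f"],
        "gill-color": ["w", "y"],
    },
    "metric": {
        "cap-diameter": (5.0, 10.0),
        "stem-height": (4.0, 8.0),
    },
}


def simulate_entries(profile, n, seed=None):
    """Draw n hypothetical entries for one species profile.

    Assumption: nominal values are drawn uniformly from the listed options;
    metric values are drawn from a normal distribution centred on the range
    midpoint (standard deviation = a quarter of the range width) and then
    clipped to the range.
    """
    rng = random.Random(seed)
    entries = []
    for _ in range(n):
        entry = {"class": profile["class"]}
        # Nominal attributes: pick one of the listed values uniformly at random.
        for attr, options in profile["nominal"].items():
            entry[attr] = rng.choice(options)
        # Metric attributes: sample around the midpoint, keep inside [lo, hi].
        for attr, (lo, hi) in profile["metric"].items():
            value = rng.gauss((lo + hi) / 2, (hi - lo) / 4)
            entry[attr] = round(min(max(value, lo), hi), 2)
        entries.append(entry)
    return entries


if __name__ == "__main__":
    for row in simulate_entries(SPECIES_PROFILE, n=5, seed=42):
        print(row)
```

Calling `simulate_entries` once per species profile and concatenating the results yields a simulated dataset. Note that 61,069 equals 173 × 353, which is consistent with drawing the same number of entries for every species.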
Results
The random forests classifier achieved perfect classification results (accuracy and F2-score of 1.0) on the secondary data. Unlike the 1987 data, which linear discriminant analysis (LDA) showed to be linearly separable, the new secondary data required non-linear classification. The dataset contained 61,069 hypothetical entries with a balanced class distribution and no problematic correlations between variables.
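The F2-score is the F-beta score with beta = 2, F_beta = (1 + beta²) · P · R / (beta² · P + R), which weights recall (R) more heavily than precision (P). That choice is plausible here if poisonous is treated as the positive class, since missing a poisonous mushroom costs more than a false alarm; this framing is an assumption, not a detail stated above. The following minimal sketch computes accuracy and F-beta from hypothetical confusion-matrix counts.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion-matrix counts; beta=2 gives the F2-score.

    Returns 0.0 when there are no true positives, which avoids dividing by
    zero when precision or recall is undefined.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, treating "poisonous" as the positive class.
    print(accuracy(tp=90, tn=95, fp=5, fn=10))
    print(f_beta(tp=90, fp=5, fn=10))
```

A classifier with no false positives and no false negatives scores 1.0 on both metrics, which is the result reported for random forests on the secondary data.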
Conclusion
The newly created secondary dataset is more representative and realistic than the 1987 dataset and better captures the complexity of mushroom identification. The work provides a reproducible workflow and FAIR-compliant open-source data that serve as a better alternative for attribute-based binary mushroom classification tasks in research and education.
- Published in: Scientific Reports
- Study Type: Data Creation and Classification Study
- Source: 10.1038/s41598-021-87602-3