A Robusta Coffee Leaf Image Dataset for Improved Identification and Differentiation
Abstract
This study examines the possibility of designing and developing a classification model based
on a dataset of Robusta Coffee seedling leaf images captured with a Samsung mobile phone camera.
Images of leaves of different varieties (KR1-KR10) were taken in situ and these were
processed into a dataset that can be used in a machine learning pipeline to enable automation
in Robusta Coffee variety identification and classification. Feature extraction is one of the
major initial steps in any computer vision project. It selects the informative aspects
from a population of features and discards unnecessary detail that could become a burden in
the machine learning process. Features are extracted from this dataset of coffee leaf images
through deep learning (DL) with convolutional neural networks (CNNs), using the
Python programming language and libraries such as scikit-learn, NumPy, Matplotlib and
Open Source Computer Vision (OpenCV). Before being fed into the pipeline,
each collected image was preprocessed to simplify subsequent
processing. Due to computational limitations, only 350 images belonging to 5 classes, namely
KR3, KR5, KR6, KR7 and KR9, were considered. These were run through a data iterator that
performed a series of augmentations to cater for data variability and randomness. The
resulting model had an overall accuracy of 6.89% and a validation loss of 0.43%. The notebook is
available at https://github.com/GyaviiraS/KR-Classification-Model/blob/main/classification%20model%20100%20(1).ipynb
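The data iterator and augmentation step described above can be illustrated with a minimal NumPy sketch. It is not the code from the notebook: the function names (augment, batch_iterator), the flip and brightness augmentations, the 32x32 image size, the batch size and the synthetic stand-in data are all assumptions made for illustration. Only the figures of 350 images and 5 classes come from the study.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly flipped, brightness-scaled copy of an HxWx3 image in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :, :]  # vertical flip
    factor = rng.uniform(0.8, 1.2)  # assumed brightness range
    return np.clip(out * factor, 0.0, 1.0)


def batch_iterator(images, labels, batch_size, rng):
    """Yield shuffled, augmented mini-batches covering the dataset once."""
    order = rng.permutation(len(images))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = np.stack([augment(images[i], rng) for i in idx])
        yield batch, labels[idx]


# Synthetic stand-in for the 350 leaf images in 5 classes (not real data).
rng = np.random.default_rng(0)
images = rng.random((350, 32, 32, 3))
labels = rng.integers(0, 5, size=350)

batches = list(batch_iterator(images, labels, batch_size=32, rng=rng))
```

Because the augmentations are drawn afresh each time the iterator is run, the model would see a slightly different version of every image in each epoch, which is the usual motivation for augmenting a small dataset.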
It was found that most existing automatic computer vision systems have been designed
under laboratory conditions on data captured with high-end devices that are inaccessible to
end users, and that these solutions have not trickled down to end users because of their
complexity and cost of implementation. This study demonstrates that mobile
phones can be used to create a more realistic dataset for machine learning and that solutions
built from this dataset can be implemented by end users.

