© 2020 Adama Science & Technology University. All rights reserved 

Ethiopian Journal of Science and Sustainable Development  

e-ISSN 2663-3205                                                                           Volume 8 (2), 2021 

Journal Home Page: www.ejssd.astu.edu.et  ASTU  

Research Paper 

Medicinal Plant Part Identification and Classification using Deep Learning based on 

Multi Label Categories 

Misganaw Aguate1,, Abebe Tesfahun2, Amlakie Aschale1 

1Department of Electrical and Computer Engineering, Debre Tabor University, P. O. Box 272, Debre Tabor, Ethiopia  

2School of Electrical and Computer Engineering, Debre Markos University, P. O. Box 269, Debre Markos, Ethiopia  

Article Info  Abstract 

Article History: 

Received 11 May 2021 

Received in revised form 02 August 2020

Accepted 20 August 2021 

Plants have been used as direct sources of medicine since ancient times and remain in use today. However, researchers and pharmacists have difficulty identifying medicinal plant parts before starting ingredient extraction in the laboratory. This study was conducted to identify medicinal plant parts based on multi-label categories by employing a sigmoid classifier as the last layer of a Convolutional Neural Network (CNN). The study employed a supervised learning approach in which the true values were predefined for the classifier during the data annotation phase. Leaf images of the plants were taken as an identity for the rest of the plant parts. The system was designed based on transfer learning by adapting (fine-tuning) pre-trained CNN models that had been trained on ImageNet. High-resolution cameras were used for data acquisition and Google Colab was used for the experiments (training and testing). MobileNet performed best, with an accuracy of 93% on the training set and 92% on the testing set. When the model was evaluated using the F1_score, it achieved 94%. Without batch normalization at the fully connected layer, this model scored 84%. MobileNet therefore obtained the highest performance and is suitable for classifying medicinal plant parts. It was also the fastest model to train because MobileNet uses depthwise separable convolution, which reduces the number of scalar multiplications in the convolution. From the results obtained with and without batch normalization, this study deduced that batch normalization is advantageous for obtaining good classification performance.

Keywords:  

Medicinal plant parts  

Deep learning   

Multi-label   

Convolutional neural network   

Fine-tuned model   

1. Introduction 

Extensive research on medicinal plant identification has been carried out by various researchers (e.g., Dileep et al., 2019; Bandara et al., 2019; Tan et al., 2020). However, these studies did not answer questions such as which part of a plant is used as medicine. Accordingly, this study set out to identify the specific parts of medicinal plants. Medicinal plant parts can be identified using either image processing or chemical ingredient extraction. However, chemical ingredient extraction is time-consuming and requires costly laboratory

                                                           


*Corresponding author, e-mail: ethiomisgie@gmail.com

https://doi.org/10.20372/ejssdastu:v8.i2.2021.380 

equipment. To reduce this problem, image processing is the preferred approach, and it can be carried out with the help of deep learning. Deep learning is used in speech recognition, natural language processing, machine translation, audio recognition, bioinformatics, drug design, medical image identification, and medicinal plant identification and classification (Amuthalingeswaran et al., 2019). This motivated us to apply image processing with deep learning. This study used a multi-label classification technique to



Misganaw Aguate et al.                                                                                                  Ethiop.J.Sci.Sustain.Dev., Vol. 8 (2), 2021 


improve indigenous knowledge using transfer learning. MobileNet, VGG16, and Inception_V3 are among the adopted deep learning models. The ability of plant parts to cure disease and the need to bring indigenous knowledge into modern medical science are the main motivations for this study. The digitization of useful plant species and their information is necessary. Several researchers have tried to develop more robust and efficient plant recognition systems by exploiting pattern recognition and image processing techniques based on plant leaves, flowers, bark, and fruits. However, leaves play a very important role among all the parts of a plant because they contain rich information and are more reliable (Naresh et al., 2016). Hence, this study used the leaves of medicinal plants to extract unique features. The major contributions of this study are: (i) showing that overfitting is not affected only by the depth of the convolutional layers and the size of the dataset; (ii) adding batch normalization at the fully connected layer of the CNN so that the model learns fast and obtains good classification performance; and (iii) capturing leaf images from the back side so that the CNN extracts leaf features uniquely and accurately.

Shape, color and texture features extracted from leaf images have been used to train an Artificial Neural Network (ANN) to identify the exact leaf classes (R. Janani, 2013). Plant species can be identified based on an input leaf sample (Sivaranjani et al., 2019). A medicinal plant dataset was developed based on texture and color features extracted from plant leaf images (Pacifico et al., 2019); the authors used machine learning to recognize medicinal plants. However, deep learning is well suited to classifying medicinal plants because it extracts features automatically. Herbal species can also be identified using their flower images (Bandara et al., 2019); the authors used SVM, decision trees and K-NN. Shape, color and texture features alone are not distinctive attributes of leaves. Lee et al. (2017) showed that vein and contour features are good for uniquely identifying plant species. In Prasad et al. (2017), raw plant leaf images are represented as deep features using knowledge transfer from object identification to plant species identification, and the VGG-16 ConvNet architecture is used for training and classification in combination with PCA, which reduces the feature vector to optimize classification cost. In Tan et al. (2020), the leaf images were preprocessed and features were extracted by pre-trained AlexNet, fine-tuned AlexNet and D-Leaf. These features were then classified by Support Vector Machine (SVM), Artificial Neural Network (ANN), k-Nearest Neighbour (KNN), Naïve Bayes (NB) and CNN. AyurLeaf, a deep learning based CNN model, was proposed in Dileep et al. (2019) to classify medicinal plants.

2. Methods and Materials 

This study used a CNN, which is the backbone of deep learning algorithms and can automatically extract multiple unique features of plant leaves. The study also employed a transfer learning technique, adding batch normalization at the fully connected layer of the CNN and tuning the weights and training hyperparameters. The back side of the leaves was photographed so that the model extracts features uniquely and identifies medicinal plant parts accurately.

2.1. Proposed System Architecture 

In this research, six procedures were followed: data collection, data annotation (labelling), image preprocessing, feature extraction and training, testing (model evaluation), and finally classification of medicinal plant parts. Figure 1 shows the high-level system architecture of the study. The input image phase includes data acquisition and annotation (data labelling). The model building phase includes the feature extraction and training tasks. Detailed descriptions are given in the following sections.

2.2. Data Acquisition 

The data were collected using high-resolution cameras (TECHNO SPARK K7 with 13 MP and SAMSUNG A30S with 25 MP). The samples were taken from Gojjam, Jimma, and Kefa. The data collection procedure took two months. To avoid the need for data augmentation, the leaves were photographed under different light intensities and in different positions and directions. The back sides of the leaves were captured so that vein features could be extracted accurately. The focal length was varied depending on the broadness of the leaves to make the prediction consistent.

The dataset contains 15,100 medicinal plant leaf images. Each plant has an equal number of leaf images (300 images), but the number of plants categorized under each label is different. Hence, the data in each category are imbalanced.

Figure 1:  High level system architecture

The following guidelines were followed during data acquisition:

• Medium-aged and young leaves of the plant were selected;

• The focal length of the camera (distance between leaf and camera) was chosen depending on the broadness or narrowness of the plant leaves;

• A white background was used to avoid confusion and obtain a clear leaf image structure;

• Dried and diseased leaves were excluded to make the prediction consistent;

• The leaf images were captured immediately after the leaves were cut from the plants to reduce the loss of leaf features; and

• The camera was held perpendicular to the leaf (i.e. at 90° to the leaf surface).

2.3. Visualizing Intensity Variation of the Data 

The vertical axis of an image histogram shows the frequency of pixels in the RGB channels, while its left and right parts show the dark and bright pixels, respectively. Figure 2 shows images of one leaf at different pixel intensities, and Figures 3 and 4 (histogram plots) describe how the three-channel pixel intensity and its distribution vary across the different images.

Figure 3 shows the histogram of the sample one (S1) image. It is different from that of sample three (S3). S1 and S3 show the same leaf; the variation is due to the light intensity when the images were captured. Including this variation for a single leaf in the dataset helps the model classify consistently when it encounters images captured by different cameras and in different environments.
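The idea behind the histograms in Figures 3 and 4 can be sketched in a few lines. This is only an illustration, assuming an image is available as a list of 8-bit (R, G, B) tuples; the function names and the sample pixel values are hypothetical and are not taken from the study.

```python
def channel_histograms(pixels, bins=256):
    """Count how often each intensity value occurs in the R, G and B channels."""
    hist = {"R": [0] * bins, "G": [0] * bins, "B": [0] * bins}
    for r, g, b in pixels:
        hist["R"][r] += 1
        hist["G"][g] += 1
        hist["B"][b] += 1
    return hist


def mean_brightness(pixels):
    """Average of all channel values; a higher value means a brighter image."""
    total = sum(r + g + b for r, g, b in pixels)
    return total / (3 * len(pixels))


# Hypothetical pixels of the same leaf under dim and bright light.
dim_leaf = [(20, 60, 25), (22, 64, 30), (20, 60, 25)]
bright_leaf = [(120, 200, 130), (125, 210, 140), (120, 200, 130)]

dim_hist = channel_histograms(dim_leaf)
print(dim_hist["G"][60])  # number of pixels whose green value is 60
print(mean_brightness(dim_leaf) < mean_brightness(bright_leaf))
```

In the dim image the counts cluster at low intensity values (the left of the histogram), while the bright image shifts them to the right, which is the kind of difference described for S1 and S3.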

 
Figure 2:  Intensity variation of single leaf image 

 
Figure 3: Histogram of leaf image (S1) 



Figure 4: Histogram of leaf image (S3) 

2.4. Data Annotation 

The labeling tasks were supported by five experts with 5, 10, 13, 28 and 30 years of experience who are currently working on traditional medicine preparation. Initially, 72 plants were selected, but finally 52 were taken as the complete dataset. Twenty plants were excluded because the experts did not agree on the labeled parts for those plants. Since the system is based on a supervised learning approach, all parts of the plants were tagged (mapped) on the leaf images as indicated in Table 1, where img_1, img_2, ..., img_n indicate the names of the leaf image files.

Table 1: Sample of data annotation (labeling) 

 
2.5. Pre-processing 

To reduce computational time during training, the input image was resized to 128 x 128. In the system architecture, normalization means scaling of the image pixels. Image pixels are integer values from 0 to 255. In a CNN, processing large integer values can disrupt or slow down the learning process. Hence, the image pixels were scaled (normalized) to between 0 and 1. The resulting arrays of pixel values were saved as one variable and later reloaded to save processing time when the data were used again. After the target labels were selected, they were converted into arrays and finally binarized into 0's and 1's using the label binarize function so that they could be handled by the sigmoid activation function (Zhao et al., 2014). Hence, the system internally used a binary classifier for each of the 5 target labels. Even though color features take more computational time to process, RGB images were used to avoid losing useful features.
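The pixel scaling step can be illustrated with a minimal sketch. It assumes 8-bit RGB values stored in nested lists; the helper name `normalize_pixels` and the tiny 2 x 2 image are hypothetical, and resizing to 128 x 128 is not shown.

```python
def normalize_pixels(image):
    """Scale 8-bit pixel values (0..255) to floats in the range [0, 1]."""
    return [[[value / 255.0 for value in pixel] for pixel in row] for row in image]


# Hypothetical 2 x 2 RGB image; the inputs in the study are 128 x 128.
image = [
    [(0, 128, 255), (255, 255, 255)],
    [(51, 102, 204), (0, 0, 0)],
]

scaled = normalize_pixels(image)
print(scaled[0][0])
```

Dividing by 255 keeps the relative intensities unchanged while bringing every input into the same small range.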

2.6. Model Building (Transfer learning) 

2.6.1. Feature Extraction 

The fine-tuned CNN models MobileNet, VGG16, and Inception_V3 were adopted to extract useful leaf features and feed them to the classifier. Feature extraction takes place in the feature learning layers of the CNN, as indicated in Figure 5. It is achieved with the help of filtering kernels that slide over the image pixels and compute dot products to produce different image features, max pooling that reduces the spatial size of the convolved image features, and the ReLU activation function, which allows the models to learn faster and better by overcoming the vanishing gradient problem. In this way, the deep learning model extracts the necessary features of the plant leaf automatically.
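The three operations named above (a sliding kernel, ReLU and max pooling) can be sketched on a toy single-channel image. This is a plain-Python illustration, not the implementation used in the pre-trained models; the 6 x 6 image and the vertical-edge kernel are assumptions chosen to keep the numbers easy to follow.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take dot products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value of every non-overlapping 2 x 2 window."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Hypothetical 6 x 6 grayscale patch with a dark vertical stripe in the middle.
image = [[1, 1, 0, 0, 1, 1] for _ in range(6)]
# Kernel that responds to dark-to-bright transitions from left to right.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = conv2d_valid(image, kernel)   # 4 x 4 map
pooled = max_pool_2x2(relu(feature_map))    # 2 x 2 map
print(feature_map[0])
print(pooled)
```

The convolution shrinks the 6 x 6 patch to 4 x 4, ReLU removes the negative responses, and pooling halves each spatial dimension again.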

 
Figure 5: Structure of convolutional neural network 



2.6.2. How CNN Internally Extracts Leaf Features

Figure 6 shows how a CNN extracts leaf features. At the 1st layer, it detects simple patterns such as horizontal and vertical lines present in leaves. At the 2nd layer, it detects different corners on the leaves and, as a result, tries to identify the shape of the leaves. At the 3rd layer, it computes more complex feature maps, such as the differently structured veins on the surface of the leaves. At the 4th layer, it becomes powerful enough to exploit each tiny vein structure, and in very deep layers it can identify the ubiquitous structure. By going through these steps, a CNN can identify the unique features of leaves for classification. In this study, feature map means the number of image features produced by the convolution process. As the image size increases, the number of features produced by the convolution layer increases. If there is no padding, the convolution layer reduces the image size.
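The effect of padding on the output size follows from the standard relation floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, p the padding and s the stride. A short sketch, using a 3 x 3 kernel as an assumed example:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(128, 3))             # 126: without padding the map shrinks
print(conv_output_size(128, 3, padding=1))  # 128: a padding of 1 keeps the size
```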

 
Figure 6: Feature extraction process in convolutional 

neural network 

2.6.3. Classification 

This part contains flatten, fully connected, dropout, and sigmoid layers. The binary cross-entropy loss function is used because the last layer is a sigmoid classifier, shown in Equation 1. Sigmoid is used as the classifier because this study internally performs five binary decisions (classifications) to achieve multi-label classification.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{1}$$
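A minimal sketch of how a sigmoid output layer and the binary cross-entropy loss fit together for five independent labels is given here. The logits and targets are hypothetical values, and the clipping constant `eps` is an assumption added only to avoid taking the logarithm of zero.

```python
import math


def sigmoid(x):
    """Equation 1: squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over the labels of one sample."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


logits = [2.0, -1.0, 0.0, 3.0, -2.0]  # hypothetical raw scores for five labels
y_true = [1, 0, 0, 1, 0]              # hypothetical multi-hot target

probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(y_true, probs)
print([round(p, 3) for p in probs])
print(round(loss, 4))
```

Because each label gets its own sigmoid, the five probabilities do not have to sum to 1, which is what allows several plant parts to be predicted for the same leaf.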

Figure 7 shows the basic technical operations performed. In the figure, F1, F2, F3, ..., Fn represent features; D1, D2, D3, ..., Dn represent the input image data, which are feature vectors stored as a 2D array; and C1, C2, C3, ..., Cn represent the target classes.

 
Figure 7: Basic technical operation performed in this study 

2.7. Train the Model 

To save computational resources and time, transfer learning was applied to MobileNet, VGG16, and Inception_V3. The adopted models were pre-trained on ImageNet. In this procedure, some layers of the models were tuned with new weights and the remaining layers were kept frozen. At the fully connected stage, new dense layers, dropout and batch normalization were added. The advantages of transfer learning are that training is completed in a short time, a large number of training epochs is not needed, and only a small amount of training data is required. In addition to layer tuning, the deep learning hyperparameters were tuned to customize these pre-trained models.

On MobileNet, the last 4 layers of the convolutional base were trained with new weights and the remaining layers were kept frozen. At the fully connected stage, two layers with 1024 units and ReLU activation functions were used. Due to this modification, the total number of layers in MobileNet changed from 28 to 30. In addition, batch normalization was added at the dense layers to speed up training and obtain good performance within a small number of epochs.

On VGG16, the same changes as for MobileNet were applied, except that a dropout layer with a rate of 0.5 was added to overcome overfitting and only one dense layer was added. On Inception_V3, the same changes as for MobileNet and VGG16 were applied, except that all base convolutional layers were frozen and a dropout rate of 0.9 was used at the fully connected (FC) layer.

The hyperparameters common to all models are: the batch size, which is the number of training samples used in one iteration; the number of epochs, which is the number of passes over the input data from which the model learns; the learning rate, which determines the step size in each iteration; dropout, which temporarily deactivates some neurons to overcome overfitting; and the optimizer, which updates the weight parameters and minimizes the error during backpropagation.

2.8. Model Performance Evaluation 

Accuracy, Precision, Recall and F-Score, given in Equations 2, 3, 4 and 5, can be used to evaluate the models (Bekkar et al., 2013). However, the dataset in this study is imbalanced among categories. Hence, F1_Score, which is the harmonic mean of recall and precision, is a better measure of model performance than the others (Jeni et al., 2013).

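$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$

$$\text{F1\_Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$

where TP, TN, FP and FN represent true positive, true negative, false positive and false negative, respectively.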
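Equations 2 to 5 can be checked numerically with a small sketch. The counts used here are the ones reported for the root class in Section 3.7 (TP = 732, FP = 4, TN = 2,223, FN = 61); the function name and the zero-division guards are assumptions added for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts reported for the root class in Section 3.7.
root = classification_metrics(tp=732, tn=2223, fp=4, fn=61)
print({name: round(value, 2) for name, value in root.items()})
# {'accuracy': 0.98, 'precision': 0.99, 'recall': 0.92, 'f1': 0.96}
```

The rounded precision, recall and F1 values agree with the 99, 92 and 96 listed for the root class in Table 6.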

2.9. Multi Label Setup 

The parts of the plant are taken as multiple labels for the corresponding plant dataset so that the task can be treated as multi-label classification. The target class is written as L = [l1, l2, l3, ..., ln], where L is the list of labels that can be taken as the target class for a given input. Each ln is an element of {0, 1}. For a specific input row of the target label array, ln = 0 means that the input leaf image does not have that target label (the label is a plant part) and ln = 1 means that the input leaf image has that target label.
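The label vector L can be built with a simple multi-hot encoder. The sketch below uses the five classes listed in Table 6; the order of the labels and the helper name `encode_labels` are assumptions, since the study does not state how the columns are ordered.

```python
# Class names as listed in Table 6; the ordering is an assumption.
LABELS = ["root", "bark", "leaf", "seed", "unidentified"]


def encode_labels(parts, labels=LABELS):
    """Return a multi-hot vector: 1 where the plant part is present, else 0."""
    unknown = set(parts) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in parts else 0 for label in labels]


print(encode_labels({"root", "leaf"}))  # [1, 0, 1, 0, 0]
```

A plant whose root and leaf are both used as medicine therefore gets two 1's in its target row, which a single-label (softmax) setup could not express.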

2.10. Working Principle of the System 

Figure 8 contains two main parts: image data processing and target label processing. On the image processing side, the image data are placed in a folder and the CSV file is read; these two are then combined to read each full image file. The original leaf images are taken from the folder, and the corresponding file name of each image is taken from the first column of the CSV file. The input image is resized to 128 x 128 and the pixel values are normalized to between 0 and 1 to reduce computation time during training. The image pixels are converted into 2D arrays so that the models can process them. The arrays of pixel values, together with the corresponding mapped (labeled) classes, are saved as one variable and later reloaded to save computational time when the data are needed again. The two columns that hold the image name and label tags are dropped from the CSV file, and what remains is kept as the target labels of the corresponding image files. As the figure indicates, the image data and the corresponding target labels are split into training and testing sets (X-train, X-test, Y-train and Y-test) to evaluate the models. Here, 20% of the image data and target labels are taken as the test set for one-time cross-validation. The training set is used to train the model, and the testing set is unseen during training and used later to test the model. A further 20% of the X-train and Y-train data is taken as the validation set to check, within each epoch, whether the model is training well. The next tasks are fitting the X-train, Y-train, and validation data to the model and training it. Here, fine-tuned MobileNet, VGG16, and Inception_V3 are used in the train model section. The efficiency of the models is then tested using X-test and Y-test. During this procedure the validity of the models is evaluated: if the models perform well, prediction of medicinal plant parts takes place; otherwise, the training and model hyperparameters are tuned and the model is trained again until a good result is obtained.

Figure 8: Over all working principle of the system

2.11. Experimental Setup 

For the experimental analysis, all image data with the corresponding target labels were first uploaded to Google Drive, and the drive was then mounted in Google Colab. Google Colab, with 12 GB of free RAM and a Tesla K80 GPU, was used for data preprocessing, model building (training), testing, evaluation and prediction (classification). The data were split into 80% for training and 20% for testing. In total, 15,100 medicinal plant images were prepared. From these, 20% (3,020 images) were first taken for testing and the remaining 12,080 were taken for training. From the 12,080 images, a further 20% (2,416 images) were taken for validation. So, the data were split into 9,664 images for training, 3,020 for testing (unseen during training), and 2,416 for validation. In all models, ImageNet weights and custom weights were used, and the dense layers were removed and replaced with new layers including batch normalization and dropout. Batch normalization was used to speed up learning and obtain higher accuracy. For the trained models in each experiment, the best optimizer was selected and is reported in the tables with the corresponding batch sizes. Different deep learning hyperparameters were tuned to find the best solution (Table 2).
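The split sizes quoted above follow directly from applying the 20% fraction twice. A short check, assuming the counts are rounded to whole images (the function name is hypothetical):

```python
def split_counts(total, test_fraction=0.2, validation_fraction=0.2):
    """Hold out a test set first, then a validation set from what remains."""
    test = round(total * test_fraction)
    remaining = total - test
    validation = round(remaining * validation_fraction)
    train = remaining - validation
    return train, validation, test


print(split_counts(15100))  # (9664, 2416, 3020)
```

Note that the validation set is 20% of the 12,080 remaining images, which is 16% of the full dataset.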

3. Results and Discussion 

3.1. Mobile Net 

As indicated in Table 3, Adam gave good results in terms of training and testing losses. Even though RMSprop scored higher accuracy in some settings, it did not reduce the losses.

Figure 9 shows the best case of training and validation accuracy/loss of the fine-tuned MobileNet model using the Adam and RMSprop optimizers with batch sizes of 32 and 128. The model gives its highest results with these batch sizes and optimizers.

 
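Table 2: Hyper parameter tuning

Hyper parameter       Value     MobileNet   VGG16     InceptionNet
Batch size            32        Applied     Applied   Applied
Batch size            64        Applied     Applied   Applied
Batch size            128       Applied     Applied   Applied
Learning rate         1e-4      Applied     Applied   Applied
Learning rate         1e-3      Applied     Applied   Applied
Optimizer             Adam      Applied     Applied   Applied
Optimizer             Adamax    Applied     Applied   Applied
Optimizer             Adagrad   Applied     Applied   Applied
Optimizer             RMSprop   Applied     Applied   Applied
Dropout               -         -           0.5       0.9
Weight modification   -         Applied     Applied   Not used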

 
Table 3: Training and testing accuracy/loss of Mobile net

Batch size   Optimizer   Training accuracy (%)   Testing accuracy (%)   Training loss   Testing loss
32           Adam        92.59                   92.15                  0.34            1.29
32           RMSprop     93.33                   91.82                  0.87            3.03
64           Adam        91.83                   91.69                  0.36            1.39
128          Adam        92.32                   91.75                  0.28            1.22
128          RMSprop     92.63                   92.12                  1.63            3.99



Figure 9: Training and testing accuracy/loss using optimizer Adam and RMSprop respectively 

3.2. VGG16 

The training and testing accuracy/loss of VGG16 are given in Table 4. Adamax with a batch size of 32 gives an acceptable result, but RMSprop with a batch size of 64 and Adam with a batch size of 128 give higher accuracy. Comparing the loss values, Adam with a batch size of 128 gives the lowest testing loss (1.33). So, Adam is taken as a good optimizer for VGG16.

3.3. Inception_V3 

As seen from Table 5, Inception_V3 does not give a satisfactory result compared to MobileNet and VGG16. Among its own runs, RMSprop gives the best results at batch sizes of 32, 64, and 128.

Overall, the result analysis shows that MobileNet is a better and faster classifier than VGG16 and Inception_V3. The reason is that MobileNet is constructed from depthwise separable convolution layers, which make the model learn fast and reduce the problem of overfitting. The advantage of using depthwise separable convolutional layers is that they reduce the total number of scalar multiplications performed in the convolution process.
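The saving in scalar multiplications can be made concrete by counting them for one layer. The sketch below compares a standard convolution with a depthwise separable one (a depthwise step followed by a 1 x 1 pointwise step); the layer dimensions are illustrative assumptions and are not taken from the models trained in this study.

```python
def standard_conv_mults(k, c_in, c_out, out_size):
    """Multiplications of a standard k x k convolution producing an out_size x out_size map."""
    return k * k * c_in * c_out * out_size * out_size


def depthwise_separable_mults(k, c_in, c_out, out_size):
    """Depthwise k x k convolution per input channel plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in * out_size * out_size
    pointwise = c_in * c_out * out_size * out_size
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 512 input and output channels, 14 x 14 output map.
std = standard_conv_mults(3, 512, 512, 14)
sep = depthwise_separable_mults(3, 512, 512, 14)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(std, sep)
print(round(ratio, 4))
```

The ratio depends only on the kernel size and the number of output channels, so the actual saving varies from layer to layer.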

3.4. Accuracy of Models with Batch Normalization 

This part presents the empirical training accuracy, testing accuracy, training loss, and testing loss (Figures 10 and 11) obtained with batch normalization at the fully connected layer of the CNN. Since the data are imbalanced, accuracy is not used as the main measure of model performance in this study. Accuracy places more weight on the large classes than on the small classes, which makes it difficult for a classifier to perform well on the rare classes, so it becomes a misleading indicator (Bekkar et al., 2013).

Table 4: Training and testing accuracy/loss of VGG16

Batch size   Optimizer   Training accuracy (%)   Testing accuracy (%)   Training loss   Testing loss
32           Adamax      88.86                   86.95                  0.39            1.69
64           RMSprop     91.71                   90.83                  0.64            1.6
128          Adam        91.78                   90.93                  0.4             1.33

 
Table 5: Training and testing accuracy/loss of Inception net

Batch size   Optimizer   Training accuracy (%)   Testing accuracy (%)   Training loss   Testing loss
32           RMSprop     74.30                   70.93                  7.67            11.23
64           RMSprop     74.77                   71.32                  6.58            10.27
128          RMSprop     75.2                    72.25                  6.31            9.83

 

Figure 10: Training and testing accuracies of the models 

 
Figure 11: Training and testing losses of the models 

3.5. Accuracy of Models without Batch Normalization 

Figures 12 and 13 show the experimental accuracy and loss, respectively, of all models on both the training and testing datasets. These are the best results obtained when training the models without batch normalization, yet they are almost the same as the worst results of the models with batch normalization.

 
Figure 12: Training and testing accuracies of the models without batch normalization 

 
Figure 13: Training and testing losses of the models without batch normalization 



3.6. Performance Evaluation 

To measure the performance of the models, this study used the hyperparameters, the effect of adding batch normalization at the fully connected layer of the CNN, and the type of convolutional layer (depthwise separable or standard) as evaluation benchmarks. The dataset is not balanced (each category has a different amount of data). For datasets with unbalanced classes, a better choice is the F1_score, which can be interpreted as a weighted average of the precision and recall values (Jeni et al., 2013). As indicated in Figure 14, MobileNet performs well using a batch size of 32 with the Adam and RMSprop optimizers. This indicates that MobileNet is better trained on the new medicinal plant data than the other models.

 
Figure 14: Model performance evaluation result of the 

models 

 
Figure 15: Comparison of model (Mobile Net) accuracy 

with and without batch normalization 

Figure 15 compares the performance of MobileNet with and without batch normalization. As indicated, for multi-label classification, removing batch normalization from MobileNet decreases the accuracy by 8.2% and 13.9% using Adam and RMSprop, respectively. This is a large difference; hence, for multi-label image classification, applying batch normalization on the dense layers of a convolutional neural network is highly recommended.
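What batch normalization computes for one feature during training can be sketched as follows. This is a simplified illustration with fixed scale and shift parameters (`gamma`, `beta`) and hypothetical activations; in a real layer these parameters are learned, and running statistics replace the batch statistics at inference time.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift it."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


activations = [2.0, 4.0, 6.0, 8.0]  # hypothetical dense-layer outputs for four samples
normalized = batch_norm(activations)
print([round(v, 3) for v in normalized])
```

After normalization the values have approximately zero mean and unit variance, which keeps the inputs to the next layer in a stable range during training.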

3.7. Classification Report 

Figure 16 presents the confusion matrix with which the models are measured on the test dataset, showing how many test samples are classified correctly and how many are misclassified in each class (category). This classification report is the best performance from the 72 experiments. The result was obtained using a learning rate of 1e-4, a batch size of 32, and the Adam optimizer. The right side of the figure shows how many samples are classified as True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) in each of the five classes. The root of the plant is taken as an illustration. It has 732 true positives, 4 false positives, 2,223 true negatives, and 61 false negatives. Hence, the root class has a total of 793 test samples. Of these, 732 samples are classified into the root category, 2 samples are misclassified as leaf and seed, and 2 samples are misclassified as unidentified. This shows that the model misses only 0.5% of the correct classifications, which is a desirable result for the classification performed by MobileNet.
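In a multi-label setting, TP, FP, TN and FN are counted separately for every label column. The sketch below shows one way to do this for binarized targets and predictions; the three-label toy data and the function name are hypothetical and serve only to illustrate the counting.

```python
def per_label_counts(y_true, y_pred):
    """Count TP, FP, TN and FN separately for every label column."""
    n_labels = len(y_true[0])
    counts = [{"TP": 0, "FP": 0, "TN": 0, "FN": 0} for _ in range(n_labels)]
    for true_row, pred_row in zip(y_true, y_pred):
        for k, (t, p) in enumerate(zip(true_row, pred_row)):
            if t == 1 and p == 1:
                counts[k]["TP"] += 1
            elif t == 0 and p == 1:
                counts[k]["FP"] += 1
            elif t == 0 and p == 0:
                counts[k]["TN"] += 1
            else:
                counts[k]["FN"] += 1
    return counts


# Hypothetical targets and predictions for three samples and three labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]

for k, c in enumerate(per_label_counts(y_true, y_pred)):
    print(k, c)
```

For every label the four counts add up to the number of test samples, as they do for the root class above (732 + 4 + 2,223 + 61 = 3,020).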

Table 6 is taken from the classification report of MobileNet to show the performance of the model in each category (class). The F1_score is used for the description because this metric takes advantage of both precision and recall by computing their harmonic mean. The F1_score shows that the model obtained adequate performance on all classes. This can also be verified by referring to the confusion matrix above. The prediction results at the end of this section are further evidence of the performance of the model adopted in this study.

 

Figure 16: Confusion matrix of the best model (MobileNet) 

Table 6: Performance measurement results of the best model (MobileNet) 

 
3.8. Prediction 

The following screenshot is taken from the sample 

demonstration which indicates as the model can predict 

the medicinal plant parts effectively. The parts that have 

the percentage value approaches to 1 means the model 

classify accurately. No matter about the value 1 and 0.9, 

 
Figure 17: Samples of medicinal plant predicted by trained model

Categories (classes) Precision in% Recall in % F1_Scoer in % No. of test data 

Root  99 92 96 793 

Bark  100 89 94 225 

Leaf  96 92 94 1350 

Seed  72 99 84 363 

Un identified  100 98 88 289 

Micro average 94 94 94  

3020 Macro average 94 94 93 

Weighted average 95 94 94 


Misganaw Aguate et al.                                                                                                  Ethiop.J.Sci.Sustain.Dev., Vol. 8 (2), 2021 

107 
 

it is simply calculated as the decimal point. It is easy to 

make it whole number by multiplying by 100 and 

change to 100 % and 99%. 
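The decimal scores discussed in Section 3.8 can be illustrated with a small sketch of a sigmoid output layer followed by thresholding. The label order, the logits, the 0.5 threshold, and the function names are all assumptions made for illustration; the study does not state the threshold it used.

```python
import math

LABELS = ["root", "bark", "leaf", "seed", "unidentified"]  # assumed order


def sigmoid(x):
    """Map a raw network output (logit) to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_parts(logits, threshold=0.5):
    """Return (label, percentage) pairs whose score reaches the threshold."""
    scores = [sigmoid(z) for z in logits]
    return [
        (name, round(score * 100))
        for name, score in zip(LABELS, scores)
        if score >= threshold
    ]


# Made-up logits for one image; not taken from the study.
logits = [6.0, -4.0, 2.2, -3.0, -5.0]
print(predict_parts(logits))
```

Because each label gets its own sigmoid score, more than one plant part can pass the threshold for the same image, which is what makes the output multi-label.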

4. Conclusion and Recommendations 

This research is the first to classify the specific parts of medicinal plants using multi-label classification. Since no published dataset of medicinal plants organized by their specific parts exists, the dataset was prepared with reference to pathobiological research and in consultation with experts who currently prepare traditional medicine. From the observations made during data collection, the leaf is the plant part that plays the largest role in curing diseases.

Based on the experimental results, we can conclude that overfitting of the models does not depend only on the size of the dataset and the depth of the hidden layers in the CNN; it is also determined by the total number of scalar multiplications of the filters (kernels) performed in the convolution process. In this study, MobileNet contains 30 convolutional layers and VGG16 has only 16 layers. Yet MobileNet overfits less than VGG16 because it uses the depthwise separable convolution technique. This technique reduces the number of scalar multiplications of the kernels to 1/20 of that of a standard convolution. So, the total number of scalar multiplications calculated in MobileNet is very small compared to VGG16. That is why MobileNet does not overfit even though it has more convolutional layers than VGG16.
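The saving from depthwise separable convolution can be sketched by counting multiplications for a single layer. The layer shape below (3x3 kernel, 32 input channels, 64 output channels, 112x112 feature map) is an assumed example and is not taken from the study; the exact reduction depends on the kernel size and the number of output channels, so other shapes give other ratios.

```python
def standard_conv_mults(kernel, in_ch, out_ch, fmap):
    """Multiplications of a standard convolution (stride 1, same padding)."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap


def depthwise_separable_mults(kernel, in_ch, out_ch, fmap):
    """Multiplications of a depthwise plus a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * in_ch * fmap * fmap
    pointwise = in_ch * out_ch * fmap * fmap
    return depthwise + pointwise


# Illustrative layer shape (assumed, not taken from the study).
kernel, in_ch, out_ch, fmap = 3, 32, 64, 112

standard = standard_conv_mults(kernel, in_ch, out_ch, fmap)
separable = depthwise_separable_mults(kernel, in_ch, out_ch, fmap)
ratio = separable / standard
print(f"standard={standard:,} separable={separable:,} ratio={ratio:.3f}")
```

In general the ratio equals 1/out_ch + 1/kernel^2, so larger kernels and more output channels give a larger saving.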

In this study, three main points are identified: 1) batch normalization at the fully connected layers of the CNN makes the model learn faster, in a small number of training epochs; 2) using images of the back side of plant leaves enables the models to extract the vein features accurately; and 3) depthwise separable convolution strongly influences whether the models overfit or underfit. The combined effect of the first two points leads the model to perform well when it is tested on unseen (new) data.

Finally, this study found that with a learning rate of 1e-4, a batch size of 32, and the Adam optimizer, the models achieved classification performances of 94%, 86%, and 68% for MobileNet, VGG16, and Inception_V3, respectively, based on the F1_score evaluation metric. Overall, after conducting more than 72 experiments on 15,100 images of medicinal plants, the study concludes that MobileNet is the fastest model to train and test on medicinal plant parts. This model is also selected as a suitable deep learning technique for the identified problems because it achieves higher classification performance than VGG16 and Inception_V3.

This study introduces and tests the identification and classification of medicinal plant parts using a deep learning technique based on multi-label categories, in order to answer the question of which parts of medicinal plants are used for medicine preparation. However, the study does not identify the diseases cured by the identified plant parts, is limited to a small dataset, and searches for the optimal learning rate manually using the grid search method. For future work, we recommend that researchers incorporate the following ideas for more relevant results:

• FastAI (a deep learning library running on top of PyTorch) can make the classification task easier and faster by finding the optimal learning rate with its learning rate finder.

• Increasing the size of the dataset will lead the model to obtain more accurate results.

• If researchers incorporate the diseases cured by the specified parts, the output of the system will be more useful and relevant.

• Future work on improving model performance while retaining multi-label classification would make the research findings more valuable.
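The manual grid search named among the limitations can be sketched as follows. This is a generic illustration, not the study's procedure: `train_and_score` stands for a caller-supplied training routine, and `toy_score` is a made-up stand-in that is deliberately constructed to peak at the learning rate (1e-4) and batch size (32) reported in this study. A learning rate finder would replace the outer loop over learning rates.

```python
from itertools import product


def grid_search(train_and_score, learning_rates, batch_sizes):
    """Try every (learning rate, batch size) pair and keep the best score.

    train_and_score is a caller-supplied function (hypothetical here) that
    trains a model with the given settings and returns a validation score.
    """
    best_score, best_config = float("-inf"), None
    for lr, batch_size in product(learning_rates, batch_sizes):
        score = train_and_score(lr, batch_size)
        if score > best_score:
            best_score, best_config = score, (lr, batch_size)
    return best_config, best_score


def toy_score(lr, batch_size):
    """Stand-in for real training: peaks at lr=1e-4 and batch size 32."""
    return 1.0 - abs(lr - 1e-4) * 1000 - abs(batch_size - 32) / 1000


best_config, best_score = grid_search(
    toy_score, learning_rates=[1e-2, 1e-3, 1e-4, 1e-5], batch_sizes=[16, 32, 64]
)
print(best_config, best_score)
```

The cost of this approach grows with the product of the grid sizes, since every combination requires a full training run.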
