Knowledge Engineering and Data Science (KEDS)  pISSN 2597-4602 

Vol 5, No 1, December 2022, pp. 67–77  eISSN 2597-4637 

 
https://doi.org/10.17977/um018v5i12022p67-77  

©2022 Knowledge Engineering and Data Science | W : http://journal2.um.ac.id/index.php/keds | E : keds.journal@um.ac.id 

This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/) 

 
Fish Image Classification using Transfer Learning Method with 

Adaptive Learning Rate   

Rizka Suhana 1, *, Wayan Firdaus Mahmudy 2, Agung Setia Budi 3  

Faculty of Computer Science, Brawijaya University 

Jl. Veteran no. 8, Malang 65145, Indonesia 
1 rizka28294@student.ub.ac.id *; 2 wayanfm@ub.ac.id; 3 agungsetiabudi@ub.ac.id 

* corresponding author 

 
ARTICLE INFO

Article history:
Received 23 June 2021
Revised 14 July 2022
Accepted 14 August 2022
Published online 7 November 2022

Keywords:
Fish Images
Image Classification
CNN
Transfer Learning
Mobilenet v2

ABSTRACT

The diversity of fish species in a coral reef ecosystem is one of the indicators of the health of that ecosystem. Many experts at the Indonesian Fisheries and Marine Research and Development Agency carefully classify fish images. A reliable technique for image classification is the Convolutional Neural Network (CNN). Transfer learning builds on CNN by reusing the convolution layers of a pre-trained model with modifications. This paper aims to solve the fish classification problem using the pre-trained Mobilenet V2 model. The model has a low computational cost and does not use many memory resources when training on image data. The research data consist of 49,281 images of various sizes covering 18 types of fish. The images go through a transformation process (random rotation, random resize crop, random horizontal flip) on the training and test data to produce varied data. After the transformation process, the image data are used to train the Mobilenet V2 architecture. Testing the Mobilenet V2 architectural model gives an accuracy score of 99.54%, which is reliable for classifying fish images.

I. Introduction

Indonesia is an archipelagic country with a coral reef area of more than 85,700 km² [1], which gives it abundant natural resources and very high biodiversity. More than 50% of Indonesia's fishery production comes from coastal areas, especially from seagrass, mangrove, and coral reef ecosystems. Indonesia lies at the center of the Coral Triangle [2]. More than 412 species of fish, from 44 families and 146 genera, have been identified in the Karimun Jawa National Park area, Jepara Regency, Central Java Province [3]. The diversity of reef fish and other organisms living on coral reefs indicates that the ecosystem is healthy [4]. Conservation activities are therefore critical for monitoring the coral reef environment regularly.

Conservation data are recorded as video and then processed to produce fish image data. Experts analyze the fish images, including identifying the type of fish shown. Experts use the level of diversity of fish species as an indicator of a healthy coral reef ecosystem [5]. The study of Villon et al. [6] reported an accuracy of 89.3% for the manual classification of fish images, that is, direct observation with the naked eye by researchers, so errors in identifying the type of fish in an image remain possible.

Image classification is a primary research area in image processing, with broad prospects in fields such as image segmentation, image recognition, and many more. k-Nearest Neighbors (KNN) [7], Random Forest [8], and XGBoost [9][10][11][12] are all machine learning methods that can be applied to image classification. In essence, the image classification process consists of feature extraction and feature classification. The first step, feature extraction, extracts all features from the image and stores them in tabular form. The second step, classification, derives the label of the image being classified. After going through the feature extraction and classification processes, the data from the image can be processed using each of the above methods.

The application of deep learning methods can be one solution to the problem of fish image classification. Convolutional Neural Networks (CNN) can solve problems related to fish classification, according to the research of Alshdaifat et al. [13] and Cui et al. [14]. For fish image classification, learning methods that reuse pre-trained models, known as transfer learning, are also more efficient than building a deep learning architecture from scratch [15]. Automated classification of fish images helps researchers identify fish faster [6][16][17]. The pre-trained Mobilenet architecture is reliable for image recognition.

Mobilenet V2 is efficient enough to be embedded in mobile or other vision devices [18]. The Mobilenet V1 architecture uses a type of convolution layer called depthwise separable convolution, which makes its computation faster than that of a traditional CNN architecture. Mobilenet V2 [19] updates this architecture by using Inverted Residual blocks and Linear Bottlenecks in its convolution layers.

A model's performance depends directly on well-chosen hyper-parameters, so the selection of hyper-parameters is very important [11]. Two of the hyper-parameters used here are the learning rate and the batch size. The learning rate controls how quickly or slowly the neural network model learns to solve the problem [19][20]. Optimizers with an adaptive learning rate, which changes gradually during training in order to reach a global minimum, were developed for this reason [21]. The batch size controls the accuracy of the estimated error gradient during training, and with it the speed and stability of the neural network's learning process [22].

Experts need to maintain the diversity of fish species and want an easier way to classify fish species for conservation work. This study addresses that need. Building on the methods of previous research, it uses the Mobilenet V2 architecture combined with an optimization technique, namely an adaptive learning rate. It is hoped that Mobilenet V2 with an adaptive learning rate can reduce the workload of the experts at the Fisheries and Marine Research and Development Agency.

II. Methods 

A. Dataset Searching 

The dataset was obtained from the website of Fish4Knowledge, a European foundation formed for water conservation [23]. Table 1 shows the distribution of the fish images across species. The source data are video recordings: 524,000 recordings with a total duration of 87,000 hours.

Table 1. Quantity distribution of image

ID   Species                      Data     Training (80%)   Testing (20%)
01   Abudefduf vaigiensis         403      322              81
02   Acanthurus nigrofuscus       2729     2183             546
03   Amphiprion clarkii           7034     5627             1407
04   Chaetodon lunulatus          5028     4022             1006
05   Chaetodon trifascialis       565      452              113
06   Chromis chrysura             7186     5748             1438
07   Dascyllus aruanus            738      590              148
08   Dascyllus reticulatus        15308    12246            3062
09   Hemigymnus fasciatus         238      190              48
10   Hemigymnus melapterus        189      151              38
11   Lutjanus fulvus              206      164              42
12   Myripristis kuntee           3454     2763             691
13   Neoglyphidodon nigroris      145      116              29
14   Neoniphon sammara            299      239              60
15   Pempheris vanicolensis       78       62               16
16   Plectroglyphidodon dickii    5139     4111             1028
17   Pomacentrus moluccensis      181      144              37
18   Zebrasoma scopas             361      288              73

 

 
In this study, the experts produced the images by cropping fish from frames captured from the video recordings, which yields an image dataset of various sizes.

Dataset transformation is the initial process applied before the data are used to train the architectural model, and it consists of several stages. The images in this dataset have different dimensions. The image in Figure 1, for example, measures 36 × 36 pixels. Because this study uses a pre-trained model, the images are resized to 224 × 224 pixels.

 
Fig. 1. Fish image of the species Abudefduf vaigiensis

The training transformation is applied to the fish images in the training data and consists of random rotation, random resize crop, random horizontal flip, conversion to tensor data, and data normalization. Random Rotation (10°) rotates the image to the left or right by a random angle up to the predetermined degree. Random Resize Crop (scale 0.8 to 1) crops a random region whose scale lies between 0.8 and 1 and resizes it. Random Horizontal Flip randomly flips the fish image horizontally. The next step converts the result into tensor data (PyTorch). The last step, data normalization, normalizes the tensor data with the statistics used by the Mobilenet V2 architectural model: mean = [0.485, 0.456, 0.406] and standard deviation = [0.229, 0.224, 0.225] for the three image channels (RGB).

The test transformation is applied to the fish images in the test data and consists of resize, center crop, conversion to tensor data, and data normalization. The test images are not flipped, so that they stay as close as possible to the original images. First, the Resize step changes the image size to 230 × 230 pixels, because the test images have different sizes. Second, Center Crop crops the center of the resized 230 × 230 image to 224 × 224 pixels. The next step converts the result into tensor data (PyTorch). The last step of the test transformation normalizes the tensor data with the statistics used by the Mobilenet V2 architectural model: mean = [0.485, 0.456, 0.406] and standard deviation = [0.229, 0.224, 0.225] for the three image channels (RGB).

The Image Structuring phase transforms the image data into a tabular dataset by flattening each image so that every pixel value becomes a feature, as shown in Figure 2. Each record is labeled with the name of the folder that contains the image. Figure 3 shows the resulting features and labels: a 3-channel image of 64 × 64 pixels yields 12289 columns, that is, 64 × 64 × 3 = 12288 pixel features plus one label.
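The structuring step can be illustrated with a short sketch. The helper name `image_to_row` and the all-black stand-in image are hypothetical; the sketch only shows why a 64 × 64 RGB image gives 12289 columns once the label is appended.

```python
def image_to_row(pixels, label):
    """Flatten an H x W x C nested list of pixel values and append the label."""
    row = [value for line in pixels for pixel in line for value in pixel]
    row.append(label)
    return row


# Hypothetical all-black 64 x 64 RGB image stored in a folder named after its species.
height, width, channels = 64, 64, 3
image = [[[0] * channels for _ in range(width)] for _ in range(height)]

row = image_to_row(image, "Abudefduf vaigiensis")
print(len(row))  # 64 * 64 * 3 pixel features + 1 label
```

One such row per image gives the tabular dataset used by KNN, Random Forest, and XGBoost.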

 
Fig. 2 Flowchart to create metadata for machine learning 



 
Fig. 3 Feature images and labels from images 

B. Architecture Configuration 

The model architecture used in this research is Mobilenet V2 (Sandler et al., 2018), with a modification to its classification layer, which originally classifies 1000 types of images. Mobilenet V2 consists of an initial full convolution layer with 32 filters followed by 19 residual bottleneck layers. The classification layer is replaced by a fully connected layer with 1280 inputs and 18 outputs. The architecture of Mobilenet V2 is given in Table 2, where input is the size of the feature map entering each layer, operator is the name of the layer, t is the expansion factor, c is the number of output channels, n is the number of times the layer is repeated, and s is the stride.

The Mobilenet V2 architecture in Table 2 can be simplified by treating all Inverted Residual Blocks as a single bottleneck that performs feature extraction, followed by a final classification layer. This research uses a transfer learning architecture with an adaptive learning rate, and the pre-trained model used for transfer learning is Mobilenet V2. Figure 4 gives a simplified view of this architecture.

 
Fig. 4 Simple architectural model in this research 

The Mobilenet V2 model consists of the following parts. The input layer is the first layer of the architectural model and receives a fish image that has gone through image data pre-processing. The bottleneck is a simplified representation of the layers that make up the Mobilenet V2 architectural model. It comprises 19 layers, each built from a depthwise convolution layer and a pointwise convolution layer with a skip connection. The flatten layer converts the feature maps of the previous layer into features that can be processed by the neural network. The last layer performs the classification, determining the class to which the processed image belongs.

 
Table 2. Architecture Mobilenet V2

Input           Operator       t    c       n    s
224² × 3        conv2d         -    32      1    2
112² × 32       bottleneck     1    16      1    1
112² × 16       bottleneck     6    24      2    2
56² × 24        bottleneck     6    32      3    2
28² × 32        bottleneck     6    64      4    2
14² × 64        bottleneck     6    96      3    1
14² × 96        bottleneck     6    160     3    2
7² × 160        bottleneck     6    320     1    1
7² × 320        conv2d 1x1     -    1280    1    1
7² × 1280       avgpool 7x7    -    -       1    -
1 × 1 × 1280    conv2d 1x1     -    k       -

 

 
C. Optimizer (AdamW) 

AdamW is a variant of the Adam optimizer that decouples weight decay from the gradient-based update instead of folding it into the gradient as L2 regularization [21]. Adam itself is an optimization algorithm that replaces stochastic gradient descent in the training stage of deep learning models. It combines the best properties of other optimization algorithms, such as AdaGrad and RMSProp, which have the advantage of an adaptive learning rate. The AdamW algorithm uses the hyperparameters $\alpha = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$, and $\lambda \in \mathbb{R}$. Before training, the time step is initialized ($t \leftarrow 0$), the first-moment vector and the second-moment vector are both initialized to 0 ($m_0 \leftarrow 0$, $v_0 \leftarrow 0$), and the schedule multiplier $\eta_{t=0} \in \mathbb{R}$ is set. In the AdamW algorithm, $t$ increases with the number of iterations, as in (1).

$$t \leftarrow t + 1 \tag{1}$$

Then the gradient of the loss with respect to the weights is computed, as in (2). The index $i$ of an individual weight is omitted throughout. With plain L2 regularization the term $\lambda W_{t-1}$ would be added to $g_t$ at this point; AdamW instead applies the weight decay directly in the weight update (7).

$$g_t \leftarrow \frac{\partial L_t}{\partial W_{t-1}} \tag{2}$$

Next, the first moment $m_t$ is updated with a formula similar to momentum in (3), and the second moment $v_t$ is updated in the same way as in RMSProp in (4).

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \tag{3}$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \tag{4}$$

The AdamW algorithm also applies bias correction. Because $m_t$ and $v_t$ are initialized to 0, their values in the first iterations (for example $t = 1$) are biased toward 0. The bias-corrected estimates $\hat{m}_t$ (m hat, the momentum term) in (5) and $\hat{v}_t$ (v hat, the RMSProp term) in (6) remove this bias.

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \tag{5}$$

$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \tag{6}$$

Finally, the weights are updated as in (7): the new weight equals the old weight minus the schedule multiplier $\eta_t$ times the sum of the Adam step $\alpha \hat{m}_t / (\sqrt{\hat{v}_t} + \varepsilon)$ and the decoupled weight decay term $\lambda W_{t-1}$.

$$W_t = W_{t-1} - \eta_t \left( \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon} + \lambda W_{t-1} \right) \tag{7}$$
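As a worked illustration, one AdamW update for a single scalar weight can be written in plain Python, following the decoupled weight decay update of [21]. The function name `adamw_step`, the weight decay value `lam = 0.01`, the constant schedule multiplier `eta = 1.0`, and the example weight and gradient are assumptions made for this sketch.

```python
import math


def adamw_step(w, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, lam=1e-2, eta=1.0):
    """One AdamW update for a scalar weight; returns the new (w, m, v, t)."""
    t += 1                                    # advance the time step
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum-like)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (RMSProp-like)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    # Adam step plus decoupled weight decay on the old weight.
    w = w - eta * (alpha * m_hat / (math.sqrt(v_hat) + eps) + lam * w)
    return w, m, v, t


# Hypothetical starting point: weight 1.0, gradient 0.5, zero-initialized moments.
w_new, m_new, v_new, t_new = adamw_step(w=1.0, grad=0.5, m=0.0, v=0.0, t=0)
```

At the first step the bias correction makes the Adam term approximately `alpha * sign(grad)`, so the weight moves by about 0.001 from the gradient and by `lam * w = 0.01` from the weight decay.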

III. Results and Discussion 

A. Learning Rate Testing 

The learning rate test searched the range from 1e-6 to 0.1 and recommended a value of 1.74e-3. The learning rate is an essential component: if it is too large, training will not reach the global minimum of the loss, whereas if it is too small, training will take too long to reach the global minimum and may even get stuck in a local minimum. Figure 5 shows the learning rate recommended for the modified transfer learning model. According to Smith's research [24], the ideal learning rate is neither too large nor too small, so a learning rate of 1.74e-3 is the better choice in this study.
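A learning rate range test of this kind can be sketched as follows. The paper does not state which tool or selection rule produced the suggestion, so the exponential schedule, the steepest-descent rule, both function names, and the loss values are assumptions made for illustration only.

```python
def lr_range_schedule(lr_min, lr_max, num_steps):
    """Exponentially spaced learning rates from lr_min to lr_max (inclusive)."""
    ratio = lr_max / lr_min
    return [lr_min * ratio ** (i / (num_steps - 1)) for i in range(num_steps)]


def steepest_descent_lr(lrs, losses):
    """Return the learning rate at which the loss fell fastest between steps."""
    drops = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    steepest = min(range(len(drops)), key=drops.__getitem__)
    return lrs[steepest]


# Six rates spaced by a factor of 10 between 1e-6 and 0.1.
lrs = lr_range_schedule(1e-6, 0.1, 6)

# Hypothetical losses recorded after one mini-batch at each learning rate.
losses = [2.90, 2.88, 2.80, 2.10, 1.90, 4.50]
suggested = steepest_descent_lr(lrs, losses)
```

In a real run the losses come from training the model for a few mini-batches at each rate, and many more than six rates are tried.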

 

 
Fig. 5. Suggested learning rate 

B. Batch Size Testing 

The batch size tests give different results for train cost, test cost, train score, test score, and number of epochs. The results are shown in Table 3 and Table 4.

The phase 1 (adaptation) results in Table 3 show that a small batch size affects the test cost and test score, because each iteration sees fewer training samples. With a large batch size (256), each iteration sees more training data, which is reflected in the better test cost and test score. With a batch size of 64, training stopped after only 7 epochs, which indicates that the model was stuck in a local minimum.

In the phase 2 results in Table 4, the train cost, test cost, train score, and test score are all good. The most striking change is in the number of epochs, which decreases for large batch sizes.

The phase 1 (adaptation) and phase 2 tests show that the learning rate and early stopping work efficiently together with respect to the accuracy obtained. In phase 1 (adaptation) with a learning rate of 0.001, a small batch size also gives a lower accuracy, because early stopping halts the training process once the accuracy no longer increases [20].
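The early stopping rule can be sketched in a few lines. The function name and the patience of 2 are assumptions (the paper mentions "two early stops" without defining the setting); the validation accuracies are the phase 1 (adaptation) values reported in this paper.

```python
def early_stopping_epoch(val_accuracies, patience):
    """Return (stop_epoch, best_epoch, best_accuracy), with 1-based epochs."""
    best_acc = float("-inf")
    best_epoch = 0
    waited = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            # New best validation accuracy: remember it and reset the counter.
            best_acc, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch, best_acc
    return len(val_accuracies), best_epoch, best_acc


# Phase 1 (adaptation) validation accuracy per epoch, in percent.
phase1_val_acc = [89.16, 91.05, 91.60, 92.05, 92.33, 92.69, 93.32,
                  93.16, 93.49, 92.94, 93.64, 93.85, 93.24, 93.62]

stop_epoch, best_epoch, best_acc = early_stopping_epoch(phase1_val_acc, patience=2)
```

Under this assumption training stops at epoch 14, two epochs after the best validation accuracy of 93.85% at epoch 12, which matches the length of the phase 1 table.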

C. Performance of the model modified transfer learning 

The performance of the modified transfer learning model in phase 1 and phase 2 is shown in Table 5 and Table 7. Table 5 shows a highest validation accuracy of 93.85% with two early stops. In this phase, the values obtained in training and validation are not much different, so the model neither overfits nor underfits. Table 5 is visualized as a graph in Figure 6.

Table 3. Phase 1 (adaptation) batch size test results

Batch Size   Train_cost   Test_cost   Train_score   Test_score   Epoch
64           0.3885       0.4139      0.8735        0.8708       7
128          0.1759       0.1830      0.9402        0.9407       17
256          0.2251       0.2341      0.9263        0.9270       8

Table 4. Phase 2 batch size test results

Batch Size   Train_cost   Test_cost   Train_score   Test_score   Epoch
64           0.0052       0.0287      0.9985        0.9921       48
128          0.0018       0.0167      0.9994        0.9947       46
256          0.0038       0.0204      0.9994        0.9938       36

 

 
The training and validation accuracy values in Table 5 show no significant difference, but in Figure 6 the gap between them is visible. This is the effect of the parameters prepared above, such as early stopping and the learning rate.

 
Fig. 6. Phase 1 accuracy and loss graph 

This section discusses the performance of the prediction model used by the researchers. A confusion matrix is one of the tools used to evaluate a prediction model in supervised learning. It serves as a benchmark for the model by providing the basis for calculating accuracy, precision, and sensitivity/recall. The predictions in phase 1 (adaptation) give the confusion matrix shown in Figure 7. The values calculated from this confusion matrix are reported in tabular form in Table 6.
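The calculation of accuracy, precision, recall, and F1-score from a confusion matrix can be sketched as follows. The function name and the 3-class toy matrix are hypothetical and are not taken from the paper's results; rows are assumed to be true classes and columns predicted classes.

```python
def per_class_metrics(matrix):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, but another class
        fn = sum(matrix[k]) - tp                       # class k, but predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append((precision, recall, f1))
    return correct / total, report


# Hypothetical 3-class confusion matrix.
toy = [[8, 2, 0],
       [1, 9, 0],
       [0, 1, 9]]
accuracy, report = per_class_metrics(toy)
```

Each tuple in `report` corresponds to one row of a classification report (precision, recall, F1-score) for one class.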

Table 5. Accuracy and loss values in phase 1 (adaptation)

         Training                   Validation
Epoch    Avg accuracy   Avg loss    Avg accuracy   Avg loss
1        89.28          0.36        89.16          0.38
2        90.97          0.29        91.05          0.29
3        92.31          0.25        91.60          0.27
4        92.87          0.23        92.05          0.25
5        92.88          0.22        92.33          0.24
6        93.52          0.20        92.69          0.23
7        93.87          0.19        93.32          0.21
8        93.94          0.19        93.16          0.22
9        93.98          0.18        93.49          0.20
10       93.81          0.19        92.94          0.21
11       94.42          0.17        93.64          0.20
12       94.38          0.17        93.85          0.19
13       94.39          0.17        93.24          0.20
14       94.64          0.16        93.62          0.20

 
Table 6. Classification report phase 1

Class                        Precision   Recall   F1-score
Abudefduf vaigiensis         0.929       1        0.963
Acanthurus nigrofuscus       0.808       0.886    0.845
Amphiprion clarkii           0.981       0.987    0.983
Chaetodon lunulatus          0.967       0.995    0.98
Chaetodon trifascialis       0.915       0.788    0.846
Chromis chrysura             0.92        0.959    0.939
Dascyllus aruanus            0.936       0.993    0.963
Dascyllus reticulatus        0.956       0.921    0.938
Hemigymnus fasciatus         1           0.916    0.956
Hemigymnus melapterus        0.906       0.763    0.829
Lutjanus fulvus              1           0.976    0.987
Myripristis kuntee           0.959       0.9      0.928
Neoglyphidodon nigroris      0.733       0.379    0.499
Neoniphon sammara            1           1        1
Pempheris vanicolensis       1           0.75     0.857
Plectroglyphidodon dickii    0.891       0.906    0.898
Pomacentrus moluccensis      1           1        1
Zebrasoma scopas             0.6         0.739    0.662
Average                      0.916       0.881    0.892

 

 
Fig. 7. Confusion matrix in phase 1 (adaptation) 

In Table 7, the highest validation accuracy is 99.54375%, and the training accuracy reaches 99.92389%, with two early stops. The gap between the training and validation values indicates overfitting at this stage, although the difference is small. Table 7 is visualized as a graph in Figure 8.

 
Table 7. Accuracy and loss values in phase 2

         Training                   Validation
Epoch    Avg accuracy   Avg loss    Avg accuracy   Avg loss
1        90.27348       0.42452     89.87          0.44
2        90.21006       0.42567     89.87124       0.44435
3        94.33254       0.23217     93.69          0.25
4        94.22853       0.23249     93.69360       0.25335
5        96.58278       0.14528     96.04          0.17
…        …              …           …              …
27       99.91374       0.00455     99.54          0.02
28       99.92389       0.00455     99.54375       0.01831
29       99.93911       0.00417     99.49          0.02
30       99.89852       0.00456     99.49305       0.01938
31       99.93911       0.00370     99.52          0.02
32       99.94672       0.00359     99.52347       0.01674
33       99.94926       0.00339     99.52          0.02

 

 
Fig. 8. Phase 2 accuracy and loss graph 

In Figure 8, the training and validation accuracy values from Table 7 form a smooth curve in which the difference becomes less significant, because a small learning rate can reach the global minimum. Figure 9 shows the phase 2 confusion matrix, which is used to analyze the performance of the fish image classification system with transfer learning on the Mobilenet V2 architecture. The modified transfer learning model has improved performance: the FN and FP values decrease and the TP values increase. The values calculated from the confusion matrix in Figure 9 are reported in tabular form in Table 8.

 
Fig. 9. Confusion matrix in phase 2 



 
D. Testing With Other AI Models 

In this section, the researchers compare machine learning and deep learning models. The deep learning baseline is a traditional CNN with five convolution blocks and two hidden layer blocks with a softmax activation function. Each convolution block uses 3 × 3 filters with stride = 1, padding = 1, ReLU activation, and max pooling.
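The effect of the five convolution blocks on the feature map size can be checked with the standard output size formula, floor((n + 2p − k) / s) + 1. The paper does not give the pooling window or the input size of the traditional CNN, so the 2 × 2 max pooling with stride 2 and the 224 × 224 input are assumptions in this sketch.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224      # assumed input resolution
sizes = [size]
for _ in range(5):
    # 3x3 convolution with stride 1 and padding 1 keeps the spatial size.
    size = conv_output_size(size, kernel=3, stride=1, padding=1)
    # Assumed 2x2 max pooling with stride 2 halves the spatial size.
    size = conv_output_size(size, kernel=2, stride=2, padding=0)
    sizes.append(size)
```

Under these assumptions the feature map shrinks from 224 to 7 pixels per side after the fifth block, and only the pooling layers change the spatial size.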

Table 9 shows that the modified transfer learning model gives the best results, for several reasons. It uses a pre-trained architectural model. The data on which that architecture was previously trained and the data used by the researchers are not too different, because the pre-trained model was trained on 1000 different types of images. Traditional CNNs are computationally faster to train than the transfer learning modification, which uses the inverted residual layer, although their results differ slightly from those of the transfer learning modification used by the researchers. The machine learning models (KNN, Random Forest, and XGBoost) did not achieve accuracy above 90%, although they are still suitable for classifying fish images. However, structuring the data from unstructured image data into structured tabular data still takes much time.

IV. Conclusion 

This study aims to classify fish images and uses a modified transfer learning model to obtain the best performance. Starting from a pre-trained Mobilenet model, the classification layer is modified to produce the modified transfer learning model. Traditional CNNs can be used to classify fish images, but designing their hidden layers is time-consuming and requires much computation, so modified transfer learning is used to solve the problem. The performance and confusion matrix test results of modified transfer learning are excellent. In Phase 1 testing, accuracy rating = 0.8751; accuracy value = 0.9355; recall/sensitivity value = 0.93055. In Phase 2 testing, accuracy value = 0.9895; accuracy value = 0.9947; recall/sensitivity value = 0.9947. Based on the study's results, we conclude that modified transfer learning is the best of the models compared.

Table 8. Classification report phase 2

Class                        Precision   Recall   F1-score
Abudefduf vaigiensis         1           1        1
Acanthurus nigrofuscus       0.973       0.99     0.981
Amphiprion clarkii           0.997       1        0.998
Chaetodon lunulatus          0.998       0.9      0.946
Chaetodon trifascialis       0.9         1        0.947
Chromis chrysura             0.997       0.998    0.997
Dascyllus aruanus            1           1        1
Dascyllus reticulatus        0.996       0.994    0.994
Hemigymnus fasciatus         1           0.979    0.989
Hemigymnus melapterus        0.947       0.947    0.947
Lutjanus fulvus              0.976       1        0.987
Myripristis kuntee           0.997       0.988    0.922
Neoglyphidodon nigroris      0.933       0.965    0.948
Neoniphon sammara            1           1        1
Pempheris vanicolensis       1           0.875    0.933
Plectroglyphidodon dickii    0.993       0.933    0.933
Pomacentrus moluccensis      1           1        1
Zebrasoma scopas             0.985       0.931    0.957
Average                      0.982       0.972    0.971

 
Table 9. Benchmarking table with machine learning model

No   Method                       Accuracy
1    Modified Transfer Learning   99.64%
2    Traditional CNN              98.58%
3    KNN                          85.5%
4    Random Forest                81.63%
5    XGBoost                      86.55%
 

 
Declarations  

Author contribution  

All authors contributed equally as the main contributor of this paper. All authors read and approved the final paper. 

Funding statement  

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.  

Conflict of interest  

The authors declare no known conflict of financial interest or personal relationships that could have appeared to influence 

the work reported in this paper.  

Additional information  

Reprints and permission information are available at http://journal2.um.ac.id/index.php/keds. 

Publisher’s Note: Department of Electrical Engineering - Universitas Negeri Malang remains neutral with regard to 

jurisdictional claims and institutional affiliations. 

References 

[1] J. P. Schulze Rojas, “Reef front heterogeneity analysis and coral genera diversity pattern in the Bunaken National 
Park, Indonesia.” 2010. 

[2] I. Asaad, C. J. Lundquist, M. V. Erdmann, and M. J. Costello, “Delineating priority areas for marine biodiversity 
conservation in the Coral Triangle,” Biol. Conserv., vol. 222, pp. 198–211, Jun. 2018. 

[3] E. Yuliana, I. Farida, Nurhasanah, M. Boer, A. Fahrudin, and M. M. Kamal, “Habitat quality and reef fish resources 
potential in Karimunjawa National Park, Indonesia,” AACL Bioflux, vol. 13, no. 4, pp. 1836–1848, 2020. 

[4] I. Cáceres, E. C. Ibarra-García, M. Ortiz, M. Ayón-Parente, and F. A. Rodríguez-Zaragoza, “Effect of fisheries and 
benthic habitat on the ecological and functional diversity of fish at the Cayos Cochinos coral reefs (Honduras),” Mar. 

Biodivers., vol. 50, no. 1, p. 9, Feb. 2020. 

[5] B. J. Boom et al., “Long-term underwater camera surveillance for monitoring and analysis of fish populations,” Work. 
Vis. Obs. Anal. Anim. Insect Behav. (VAIB), conjunction with ICPR 2012, no. August 2015, pp. 2–5, 2012. 

[6] S. Villon et al., “A Deep learning method for accurate and fast identification of coral reef fishes in underwater images,” 
Ecol. Inform., vol. 48, no. August, pp. 238–244, 2018. 

[7] S. Winiarti, F. I. Indikawati, A. Oktaviana, and H. Yuliansyah, “Consumable Fish Classification Using k -Nearest 
Neighbor,” IOP Conf. Ser. Mater. Sci. Eng., vol. 821, no. 1, p. 012039, Apr. 2020. 

[8] Z. Jin, J. Shang, Q. Zhu, C. Ling, W. Xie, and B. Qiang, “RFRSF: Employee Turnover Prediction Based on Random 
Forests and Survival Analysis,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial 

Intelligence and Lecture Notes in Bioinformatics), vol. 12343 LNCS, 2020, pp. 503–515. 
[9] Y. C. Chang, K. H. Chang, and G. J. Wu, “Application of eXtreme gradient boosting trees in the construction of credit 

risk assessment models for financial institutions,” Appl. Soft Comput. J., vol. 73, pp. 914–920, 2018. 

[10] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” Proc. ACM SIGKDD Int. Conf. Knowl. 
Discov. Data Min., vol. 13-17-Augu, pp. 785–794, 2016. 

[11] W. Jiao, X. Hao, and C. Qin, “The Image Classification Method with CNN-XGBoost Model Based on Adaptive 
Particle Swarm Optimization,” Information, vol. 12, no. 4, p. 156, Apr. 2021. 

[12] J. Brownlee, XGBoost With Python Gradient Boosted Trees With XGBoost and Scikit-learn. 2018. 
[13] N. F. F. Alshdaifat, A. Z. Talib, and M. A. Osman, “Improved deep learning framework for fish segmentation in 

underwater videos,” Ecol. Inform., vol. 59, no. May, p. 101121, 2020. 

[14] S. Cui, Y. Zhou, Y. Wang, and L. Zhai, “Fish Detection Using Deep Learning,” Appl. Comput. Intell. Soft Comput., 
vol. 2020, 2020. 

[15] B. S. Rekha, G. N. Srinivasan, S. K. Reddy, D. Kakwani, and N. Bhattad, Fish detection and classification using 
convolutional neural networks, vol. 1108 AISC, no. July. Springer International Publishing, 2020. 

[16] F. Kratzert and H. Mader, “Fish species classification in underwater video monitoring using Convolutional Neural 
Networks,” 2018. 

[17] D. Li, Z. Wang, S. Wu, Z. Miao, L. Du, and Y. Duan, “Automatic recognition methods of fish feeding behavior in 
aquaculture: A review,” Aquaculture, vol. 528, p. 735508, 2020. 

[18] A. G. Howard et al., “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” 2017. 
[19] J. Brownlee, Better Deep Learning. Train Faster, Reduce Overfitting, and Make Better Predictions, vol. 1.3, no. 0. 

2019. 
[20] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning: Machine Learning Book. 2016. 
[21] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” 7th Int. Conf. Learn. Represent. ICLR 2019, 

2019. 

[22] D. Masters and C. Luschi, “Revisiting Small Batch Training for Deep Neural Networks,” pp. 1–18, 2018. 
[23] B. J. Boom, P. X. Huang, J. He, and R. B. Fisher, “Supporting ground-truth annotation of image datasets using 

clustering,” Proc. - Int. Conf. Pattern Recognit., no. January, pp. 1542–1545, 2012. 

[24] L. N. Smith, “Cyclical learning rates for training neural networks,” Proc. - 2017 IEEE Winter Conf. Appl. Comput. 
Vision, WACV 2017, no. April, pp. 464–472, 2017. 
