Visual Analytics of Tuberculosis Detection Rat Performance 

  
1 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

Visual Analytics of Tuberculosis Detection Rat Performance 

Joan Jonathan1, Camilius Sanga2, Magesa Mwita1, Georgies Mgode3, 4 

1 Centre for Information and Communication Technology, Sokoine University of Agriculture, P.O Box 

3218, Chuo Kikuu, Morogoro, Tanzania 

 
2Sokoine National Agricultural Library (SNAL), Sokoine University of Agriculture, P.O.Box 3022, 

Morogoro, Tanzania 

 
3Pest Management Centre, Sokoine University of Agriculture, P.O Box 3110, Chuo Kikuu, Morogoro, 

Tanzania 

 
4APOPO TB Project, Sokoine University of Agriculture, Morogoro, Tanzania 

 
Abstract 

The diagnosis of tuberculosis (TB) disease remains a global challenge, and the need for 

innovative diagnostic approaches is inevitable. Trained African giant pouched rats are the 

scent TB detection technology for operational research. The adoption of this technology is 

beneficial to countries with a high TB burden due to its cost-effectiveness and speed than 

microscopy. However, rats with some factors perform better. Thus, more insights on factors 

that may affect performance is important to increase rats’  TB detection performance. This 

paper intends to provide understanding on the factors that influence rats TB detection 

performance using visual analytics approach. Visual analytics provide insight of data through 

the combination of computational predictive models and interactive visualizations. Three 

algorithms such as Decision tree, Random Forest and Naive Bayes were used to predict the 

factors that influence rats TB detection performance. Hence, our study found that age is the 

most significant factor, and rats of ages between 3.1 to 6 years portrayed potentiality. The 

algorithms were validated using the same test data to check their prediction accuracy. The 

accuracy check showed that the random forest outperforms with an accuracy of 78.82% than 

the two. However, their accuracies difference is small. The study findings may help rats TB 

trainers, researchers in rats TB and Information systems, and decision makers to improve 

detection performance. This study recommends further research that incorporates gender 

factors and a large sample size. 

Keywords: Data mining in healthcare, African giant pouched rats, Classification Technique 

in Tuberculosis diagnosis 

*Correspondence: Joan Jonathan (joanjonathan@sua.ac.tz) 

DOI: 10.5210/ojphi.v13i2.11465 

Copyright ©2021 the author(s) 

This is an Open Access article. Authors own copyright of their articles appearing in the Journal of Public Health Informatics. 
Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in 
the copy and the copy is used for educational, not-for-profit purposes. 

mailto:joanjonathan@sua.ac.tz


Visual Analytics of Tuberculosis Detection Rat Performance 

  
2 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

1 Introduction 

Tuberculosis (TB) is one of the life-threatening infectious diseases causing death worldwide [1]. 

The WHO report [2] shows that 10 million people are infected with TB each year. Microscopy 

is the widely used TB diagnostic tool in developing countries despite its lower sensitivity [3, 4]. 

Nucleic acid-based test such as GeneXpert MTB/RIF is now in use with higher sensitivity and 

specificity than microscopy. However, its full roll-out and utility is limited to some areas. There 

is a need for new cheap and rapid diagnostic approaches to enhance TB case detection in 

countries with a high TB burden. Since 2007, Anti- Personnel Landmines Detection Product 

Development (APOPO) and Sokoine University of Agriculture (SUA) have been exploring the 

potential application of the trained African giant pouched rats (HeroRATS) for detection of 

pulmonary TB in sputum samples [7]. Trained rats retest heat inactivated sputum samples after 

smear microscope and other hospital tests to detect missed TB cases. The study conducted by 

Poling et al. [6] evaluated sputum 21,600 from Tanzanians and 9,048 from Mozambicans which 

was already screened by the microscope. However, after the evaluation by rats it was revealed 

that there were 1,412 new patients with active TB in Tanzania and 645 new patients in 

Mozambique. The new detected cases increase the detection rate by 39% in Tanzania and 53% 

in Mozambique when compared to smear microscopy, the standard diagnostic for TB. 

Furthermore, trained rats increase pediatric TB detection by 68% as the additional of 23 children 

patients who tested TB positive from 982 children sputum samples [7]. The endorsed 

conventional tests such as concentrated smear microscopy offer higher sensitivity than the direct 

microscopy and thus are used to confirm detection rat results before patients start treatment [1]. 

These scent detection rats detect the specific volatile organic compounds produced by 

Mycobacterium tuberculosis bacterium that causes TB [5]. 

The usefulness of this scent detection technology is also due to the rats’ rapid diagnostic speed 

in which rats can test up to 100 samples in 20 minutes that will take a laboratory technician about 

four days when using the microscopy to examine the recommended 30 samples per day [3]. The 

detection performance of trained rats may depend on rats’ characteristics, which include age, 

sex, time of day, bacteria count, and weight [8]. The study conducted by Ellis et al. [3] identified 

that older rats do better than less old rats also time of day of training influences the detection 

performance. However, there were no significant differences between male and female detection 

performance. In another study of Mgode et al. [7] rats can detect TB in samples with a lower 

number of bacteria count likely to be missed by microscope. Based on the experience older rats 

and weighty rats have low detection performance. And as such, there is no empirical evidence 

on the main influencing factors, and the trend of their impact is not clear. Therefore, this study 

intends to use data mining techniques to predict factors that influence TB detection rats’ 

performance. The scent detection technology at APOPO produces massive data that need an in-

depth look to obtain insights on various valuable information using data mining techniques. Data 

mining is a useful field for discovering interesting patterns and information from 

multidimensional data. In healthcare, data mining techniques such as classification, clustering, 

and association are most used to solve health problems [9]. Most of the studies [10, 11] used 

classification technique in the diagnosis of tuberculosis to categorize and find the relationships 

among the manipulated variables. Furthermore, the study of Asha et al. [12] propose that the 

classification technique helps the health sectors to have better decision toward their operations. 

 
Visual Analytics of Tuberculosis Detection Rat Performance 

  
3 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

1.1 Objective of the study 

The objective of the study was to use Data Mining techniques to predict factors associated with 

TB detection performance of the rats. The goal is to provide a deeper understanding of the main 

factors influencing detection performance as well support decision making, improving human 

health, and scaling up of the detection technology. To further contribute to this body of 

knowledge the study focused on the following three different hypotheses: 

Null hypothesis: 

i) There is no measurable accuracy difference between the three algorithms of a 

classification technique in predicting the factors associated with TB detection 

performance in rats 

ii) There is no measurable difference between different predicted factors of rats that affect 

TB detection performance 

iii) There is no measurable difference between the ability of rats in TB detection performance 

Alternate hypothesis: 

i) There is a measurable accuracy difference between the three algorithms of a classification 

technique in predicting the factors associated with TB detection performance in rats 

ii) There is a measurable difference between different predicted factors of rats that affect 

TB detection performance 

iii) There is a measurable difference between the ability of rats in TB detection performance 

2 Methods 

This study used the Cross-Industry Standard Process for Data Mining (CRISP-DM) as an 

analytical framework for knowledge discovery. CRISP-DM involves a systematic and organized 

approach in the data mining process [9]. CRISP-DM consists of six phases, namely: (1) Business 

Understanding (2) Data Understanding (3) Data Preparation (4) Model Building (5) Testing and 

Evaluation (6) Deployment. These phases are explained in detail underneath. 

2.1 Business understanding phase 

This phase dealt with what APOPO TB center needs from a business perspective. APOPO is a 

Belgian Non-Government Organization (NGO) based in Morogoro, Tanzania, which aims at 

using rat odor detection technology to solve humanitarian problems. Extracting knowledge of 

the application domain was useful to create an understanding of the aim, requirements, and 

constraints of the center. 

2.2 Data understanding phase 

This phase focused on the access, description, and identification of the relevant data from the 

APOPO TB center. The given rats’ detection performance data comprised of two datasets: 

Detection Rats Data and RAT_WEIGHT. Detection Rats Data dataset composed of 18 detection 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
4 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

performance variables (17 independent and 1 dependent) and 471,133 observations from 2011 

to 2019 years. Meanwhile, the RAT_WEIGHT dataset contained four (4) independent detection 

performance variables and 1438 records from 2012 to 2019. This dataset also contained the five 

female rats. However, the fifth rat had no corresponding detection performance variables and 

thus disqualified. Hence, this study used the four (4) female rats’ data from 2014 to 2018 years. 

Table 1 shows the two datasets with their respective variables. 

Table 1: Rats Datasets Description 

DetectionRatsDataDataset Description 

Number Variable name Data type Description Variable type 

1 DOTS_NAME String Name of the DOTS 

center 

Independent 

2 DOTS_PATIENTS_NUMBER Integer Number of patients 

from DOTS center 

Independent 

3 ENTRY_YEAR Integer Year when patient 

attend DOTS center 

Independent 

4 ID_SAMPLE Integer Identification of the 

sample 

Independent 

5 ID_BL_DOTS Integer Identification of 

bacteria level from 

DOTS center 

Independent 

6 HIT Boolean TB detection rat 

performance 

(categorical 

variable) 

Dependent 

7 ID_BL_APOPO Integer Identification of 

bacteria level from 

APOPO center 

Independent 

8 ID_CONFIGURATION Integer Identification of the 

cage during training 

Independent 

9 ID_BL_FM Integer Identification of 

bacteria level 

by fluorescence 

microscope 

Independent 

10 ID_EVALUATION_SESSION Integer Identification of 

evaluation 

session 

Independent 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
5 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

11 SESSION_DATE Date Date when a session 

performed 

Independent 

12 ID_RAT Integer Identification of the 

rat 

Independent 

13 RAT_NAME String Name of rat Independent 

14 GENDER String Sex of rat Independent 

15 Age Integer Age of rat Independent 

16 START_TIME DateTime Date and time when 

detection task 

started 

Independent 

17 END_TIME DateTime Date and time when 

detection task ended 

Independent 

18 DOB Date Date when rat was 

born 

Independent 

RAT_WEIGHT Dataset Description  

Number Variable name Data Type Description Variable type 

1 ID_RAT Integer Identification of rat Independent 

2 RAT_NAME String Name of rat Independent 

3 WEIGHT_DATE 

 
Date Date when weight 

of the rat was 

measured 

Independent 

4 WEIGHT Integer Weight of the rat Independent 

 
2.3 Data preparation 

Following the data understanding, this phase prepared the data into the well-formed data using 

the four main steps [9]. These steps were data consolidation, data cleaning, data transformation, 

and data reduction. 

Data consolidation step, data were accessed from APOPO TB center based in Morogoro, 

Tanzania. The data were integrated into a single file to ease the data mining process. Data 

cleaning step, irrelevant variables, and empty rows were removed to prevent inconsistencies and 

outliers. As a result, the prepared data had 365,843 observations from 471,133 observations. 

Data transformation step, the new three (3) detection performance variables were created. These 

variables are Rat_Av_Weight_Per_Year, Session_Start_Time, and Session_Completion_Time 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
6 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

as shown in Table 2. It is important to note that all variables were converted to the required data 

type. Data reduction step, the prepared observations were reduced from 365,843 to 200,000 to 

ease the analysis by using random sampling method. The study conducted by Czarnowski et al. 

[13] shows that data reduction focused on reducing the volume of dataset while maintaining the 

integrity of data since the reduced dataset has the same acceptable amount of information as the 

original dataset. The main four steps were governed by R-language and RStudio. RStudio is a 

data mining tool and an integrated development environment for R, is a free programming 

language with extensive modeling and quality graphs resources [14]. Considering the given four 

(4) and three (3) new created detection performance variables, seven (7) prepared detection 

performance variables and 200,000 observations were used for analysis as shown in Table 2. 

Where six (6) are independent variables and one (1) is the dependent variable. Table 2 and Table 

3 show description and descriptive statistical summary information of the variables used to build 

predictive models respectively. 

Table 2: Description of the variables used to build predictive models 

Variable Description Data type Variable type Values 

DOTS_Name Name of DOTS 

center 

Factor Independent DOTS centre 

name 

Rat_Name Name of rat Factor Independent Rat 1, Rat 2, 

Rat 3, Rat 4 

Rat_Age Age of rat in 

years 

Numeric Independent 0.79, 2.04, 

3.22 

Rat_Av_Weight_Per_Year Average weight 

of rat per year 

Numeric Independent 846.35, 

866.80 

Session_Start_Time Time of day when 

detection session 

started in 24 

hours 

Integer Independent 12,13,14 

Session_Completion_Time Differences in 

minutes between 

session start time 

and session end 

time 

Numeric Independent 1,2,3 

Performance Performance of 

rat during the 

session 

Factor Dependent TRUE, 

FALSE 

 
Visual Analytics of Tuberculosis Detection Rat Performance 

  
7 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

Table 3: Descriptive statistical summary information of the independent variables used to 

build predictive models 

 Age Av_Weight_Per_Ye

ar (g) 

Session_Start_Time 

(hours) 

Session_Completion_Ti

me (in min) 

Min 0.79 843.7 8.00 1.0 

Max 7.95 1054.8 18.00 129.0 

Mean 3.83 899.4 12.16 10.5 

Median 3.71 866.8 12.00 10.0 

Range 7.16 211.16 10.00 128.0 

SD 1.72 84.44 1.67 4.89 

CI 0.0056 0.027 0.0041 0.016 

 
Table 3 depicts statistical summary information of the independent variables where the younger 

and older rats have ages of 0.79 and 7.95 years respectively with the mean, median, range, SD, 

and CI age of 3.83, 3.71, 7.16, 1.72, and 0.0056 years. Moreover, the rats’ lowest and highest 

average weight per year are 843.7g and 1054.8g respectively, with the mean, median, range, SD, 

and CI age of 899.4, 866.8g, 211.16kg, 84.44 kg, and 0.027 kg. Besides, the table shows that 

their lowest and highest session start time are 8:00 and 18:00 hours, with the mean, median, 

range, SD, and CI age of 12:16, 12:00, 10:00,1.67, and 0.0041. Furthermore, the minimum and 

maximum session completion time is 1 and 129 minutes, with the mean, median, range, SD, and 

CI age of 10.5, 10.0, 128.0, 4.89, and 0.016 minutes. Since the mean and median are not equal, 

it manifests that the data used for this analysis lack normal distribution. Table 4 shows number 

of named rats, and the associated observations by factor of interest. 

Table 4: Summary for number of rats, and the associated observations by factor of interest 

Rat_Name Gender Age Av_Weight_Per

_Year (g) 

Session_Start

_Time (hours) 

Session_Completion

_Time (min) 

No of 

observations 

Sofia F 0.7-

6.3 

846-877 11:00-14:00 4.0-14.0 50448 

Catia F 1-6 846-877 10:00-15:00 8.2-13.0 50271 

Happy F 1-8 844-1055 10:00-15:00 6.4-23.0 50035 

Mkuta F 1-6 844-1055 9:00-16:00 4.5-15.8 49246 

  
Visual Analytics of Tuberculosis Detection Rat Performance 

  
8 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

The data from Table 4 depicts four female rats used in the analysis with their observations 

completed during the detection tasks. Sofia completed many numbers of observations compared 

to all since it started detection tasks early to the age of 0.7 years. Besides, Mkuta has few numbers 

of observations whereas its large average weight of 1055g may have caused this performance 

compared to Catia from which both started and end detection tasks at the age of 1-6 years, 

respectively. Moreover, Happy is older and has a few observations than Sofia and Catia. The 

large average weight of 1055g and the removal of irrelevant data from 2011 to 2013 and 2019 

could have led this since there is a possibility that Happy had many observations in the irrelevant 

years. Furthermore, the table shows that there is sex inequality in the data given from the DOTS 

center, since all rats are female. Table 5 shows the dependent variable, and the associated 

observations used in this analysis. 

Table 5: Summary for dependent variable (Performance) and the number of observations 

detected 

Performance No_Samples 

FALSE 157686 

TRUE 42314 

 
From Table 5, there is performance inequality in the distribution of rats’ detected observations. 

TRUE observations are far less by 21.2% than FALSE of about 78.8% for all observations. And 

as such, this analysis used more FALSE than TRUE. 

It was also important to examine/measure the association of continuous independent variables 

with a dichotomous dependent variable (performance). Thus, the logistic regression analysis was 

used to describe data and explain the relationship between the dependent variable and 

independent variables as shown in Table 6. 

Table 6: Association between Dependent and Independent variables 

Variable Pr(>|z|) 

Age < 2e-16 

Session_Completion_Time < 2e-16 

Session_Start_Time 2.53e-09 

Av_Weight_Per_Year 2.98e-07 

 
With regard to the Table 6, the Pr (>|z|) column indicates the p-value corresponding to the z-

statistics. The p-values for the independent variables are below 0.05, and implies that there is a 

relationship between independent variables and the dichotomous dependent variable, and 

variables are statistically significant. However, the data assumption of normality was not 

achieved since the corresponding values are less than 0.05. The normality was examined by 

prediction analysis (logistic regression) using the Kolmogorov Smirnov (KS) test under the z test 

statistic. 

 
Visual Analytics of Tuberculosis Detection Rat Performance 

  
9 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

2.4 Model building 

After obtaining the data with the required format, this phase used to select and apply the data 

mining technique and algorithms based on the nature of the data. This phase applied 

classification technique to build predictive models that assigned a class for each rat in the given 

data and predicted the factors that influence rats TB detection performance. Not only that but 

also the predictive models might be useful to place and predict the new instances (rats) with 

unknown labels into their respective classes. Classification is a supervised learning technique in 

data mining and machine learning that learn the relationship or patterns between independent 

variables (input) and the dependent variable (output) from the past data and classify each data 

item into a predefined class label. 

Before presenting the prepared data to the algorithms, R was entirely used to partition the data 

(200,000 observations) using a simple split estimation method. The simple split estimation is the 

most popular method, which divided two-thirds of the data (134,000 observations) in the training 

data and one-third in the test data (64,000 observations) [9] as shown in Table 7. Therefore, the 

training data was used to build a predictive model while the test data used to assess the predictive 

model classification accuracy. 

Table 7: Summary of a simple random data splitting 

Type of data Number of observations 

Training data 134000 

Testing data 66000 

 
The data from Table 7 indicate that this analysis consisted of 67% training data and 33% test 

data. The training data were given many observations to build the predictive model while test 

data were used only to assess the performance of the model generated. Despite many 

classification algorithms used for prediction, this study used Decision Tree, Random Forest, and 

Naïve Bayes for prediction. A decision tree algorithm is a supervised classification algorithm 

which generates the decision tree automatically by examining the weight of each variable used 

to the extent that each leaf node has the same class [11]. Also, it generates rules that are easy to 

interpret and understand [16]. The decision tree is a tree-shaped diagram comprises many input 

variables that may have an impact on classifying different patterns. Additionally, it is known as 

a decision support algorithm which depends on the input to show the possible outcomes [12]. 

The decision tree was generated by recursively dividing the training data until each division 

consisted of the variables of the same class or values based on conditions. Following this, a split 

point used in each node to test the manipulated variables and decide the way to divide the data. 

The split decision focused on the amount of information a computed variable offered in the class 

(information gain) and its randomness (entropy). As a result, the variable with the highest 

information gain and the lowest entropy split and tested. The information gain and entropy 

determined the decision on the split of data and construction of the decision tree. However, the 

growth of the decision tree influenced deep learning. Control on the parameters used to overcome 

this problem through pruning [9]. Pruning is the process of reducing the size of decision trees by 

removing sections of the tree that provide little power to classify instances. 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
10 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

The generated decision tree consists of a root node, branches, and leaf nodes. The root node is 

the node at the top of the tree which implies the most important factor responsible for classifying 

the observations. The branches represent the pattern classification outcome of a test using one of 

the variables based on conditions. The leaf nodes placed either before or at the end of the decision 

tree imply the nodes without children. And as such, they identify the last class choice for a 

pattern. Moreover, the decision tree formed rules (IF-THEN statements) from the root node to 

the leaf nodes which are easy to interpret and understand. As a result, they enhanced the 

discovery of exploratory knowledge on the factors that influence rats TB detection performance. 

Furthermore, the random forest algorithm was also applied to predict the influencing factors and 

compare the prediction accuracy of the models. 

A random forest algorithm is a supervised classification algorithm used to build multiple 

decision trees called forest in random during the training process. The choice of most of the trees 

determined the final decision of the algorithm based on the given manipulated variables. There 

is a relationship between the number of trees in the forest and the results. Thus, many trees, the 

more accurate the result. The motives behind this algorithm are that it can be used for both 

classification and regression problems and lowers the risk of overfitting [11]. Overfitting is a 

modeling error which occurs when the outcome of the analysis is limited only to specific data. 

As a result, instead of predicting the whole manipulated data, the model predicts only for that set 

of data [16]. In the random forest algorithm, the process of determining the root node and the 

splitting of the variable nodes were performed randomly from the training data. During the 

training, no control of parameter (pruning) involved preventing the decrease of the relationship 

between trees. However, pruning is of importance for the reduction of complexity in variable 

computation during the training. As a result, the algorithm handled about 500 trees in the 

ensemble and identified the error rate based on the training data. Following this, the random 

forest algorithm predicted the factors for detection by pinpointing the mean decrease in Gini 

values for each variable. Furthermore, the Naive Bayes algorithm was applied to compare their 

predictive accuracy by using the same test data, and finding the best algorithm with high 

classification accuracy rates for the given data. 

Naive Bayes is the supervised classification algorithm that uses a probability theory (Bayesian 

Theorem) to generate the classification model. Moreover, to place an instance in the desired 

class. This theory supported to calculate a set of probabilities by counting the frequency and 

values of the manipulated variables from the given data [11]. The Naive Bayes algorithm is a 

well-performed algorithm owing to its simplicity in execution time. And as such, it can build a 

final model that can learn rapidly different classification problems [17]. However, this algorithm 

assumed that all variables were independent of the given data while few real-world applications 

may agree with this [12]. The main advantages of the Naive Bayes algorithm compared to the 

other two algorithms are the run-time speed on large and complex datasets. Hence, most 

healthcare field researchers across the world use this algorithm due to its better speed and 

accuracy. This algorithm identified a priori probability for the dependent variable and 

conditional probabilities for every independent variable based on the manipulated data. The 

Naive Bayes algorithm does not show the weights of each variable included in the classification, 

but it has been used purposely to compare its prediction performance with the results generated 

from the decision tree and random forest algorithms [17]. 

 
Visual Analytics of Tuberculosis Detection Rat Performance 

  
11 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

2.5 Testing and Evaluation 

This phase was used to test and assess the classification performance of the three generated 

predictive models. The assessment was based on the accuracy metric to show the predictive 

accuracy of the model from the confusion matrix. However, the confusion matrix has several 

assessment measures such as sensitivity and specificity. The confusion matrix is a table used to 

describe a classification model performance based on the test data. The accuracy measure was 

used to assess the ability of the models to accurately predict the class label of the test data. The 

accuracy entailed the matching between actual class labels of the test data and the class labels of 

the predicted models. The accuracy measurement focused on the accuracy rate, the percentage 

of test instances that were accurately classified by the predictive model as shown in Table 8. The 

accuracy (acc) and error rate (err) values of the classification matrix rated the predictive model 

performance. The error rate (err) implies the fraction of the sum of FALSE positives and FALSE 

negatives and the sum of the total number of all the predictions made. Table 8 presents the 

comparison of predictive models classification accuracy rate and error rate between training data 

and test data for all three algorithms. Since the predictive models learned to classify the rats TB 

detection performance into TRUE or FALSE, the positive class is a FALSE value since it has 

many observations of about 157,686 samples as reported in Table 5. Therefore, the following 

formulas measured the percentage accuracy rate and error rate respectively for the positive class. 

Accuracy (acc) = TP + TN 

TP + FP + TN + FN 

Error rate (err) = FP + FN 

TP + FP + TN + FN 

TP, TN, FP, FN mean True Positive, True Negative, False Positive and False Negative 

respectively. 

 
Visual Analytics of Tuberculosis Detection Rat Performance 

  
12 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

Table 8: Comparison of predictive models’ classification accuracy rate between training data 

and test data for all algorithms 

 
From Table 8, during the building of the predictive model, the decision tree algorithm correctly 

classified 105573 observations equal to the accuracy rate of 78.74% and incorrectly classified 

28227 observations equal to the error rate of 21.06%. In other hands, random forest correctly 

classified 105878 observations equal to the accuracy rate of 79.00% and incorrectly classified 

28142 observations equal to the error rate of 21.00%. Furthermore, the naïve Bayes correctly 

classified 105674 observations equal to the accuracy rate of 78.86% and incorrectly classified 

28326 observations equal to the error rate of 21.14%. With regards to the test data, the decision 

tree algorithm correctly classified 51997 observations equal to the accuracy rate of 78.78% and 

incorrectly classified 14003 observations equal to the error rate of 21.22%. Additionally, random 

forest correctly classified 52019 observations equal to the accuracy rate of 78.82% and 

incorrectly classified 13981 observations equal to the error rate of 21.18%. Again, the naïve 

Bayes correctly classified 51946 observations equal to the accuracy rate of 78.71% and 

incorrectly classified 14054 observations equal to the error rate of 21.29%. Thus, the random 

forest algorithm outperforms both during the building of the predictive model and assessing the 

classification performance. However, the ability to overcome overtraining problem might have 

led to this. Additionally, data overlapping, and the random nature of the modeling algorithms 

presumed to affect the overall performance of the three predictive models. 

 
Training data 

Evaluation criteria Predictive model 

Decision tree Random 

forest 

Naive Bayes 

Accuracy (%) 78.94% 79.00% 78.86% 

Error rate (%) 21.06% 21.00% 21.14% 

Correctly classified observations (TP) 105573 105878 105674 

Incorrectly classified observations (FN) 28227 28142 28326 

Mcnemar's Test p-value <2e-16 <2e-16 <2e-16 

Test data 

Evaluation criteria Predictive model 

Decision tree Random 

forest 

Naive Bayes 

Accuracy (%) 78.78% 78.82% 78.71% 

Error rate (%) 21.22% 21.18% 21.29% 

Correctly classified observations (TP) 51997 52019 51946 

Incorrectly classified observations 

(FN) 

14003 13981 14054 

Mcnemar's Test p-value  <2e-16 <2e-16 <2e-16 
 

Visual Analytics of Tuberculosis Detection Rat Performance 

  
13 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

2.6 Deployment 

This is the last phase that was used to organize and present the knowledge gained to the end-user 

for real application using visualization techniques, such as a plot. The knowledge obtained is 

explicitly aimed at helping users to predict rats’ factors that influence TB detection performance 

and the classes of new data instances (where the class label is unknown). Table 6 pinpoints the 

association between a dichotomous variable and independent variables. Moreover, Figure 2 

shows a variable importance plot used for proper interpretation and ease understanding of the 

knowledge gained. 

3 Results and Analysis 

The data mining process aimed to elicit knowledge from the given structured data and present it 

to the end-user for the real application. And as such, this process was managed by the 

classification technique and algorithms that helped to learn the relationship between the patterns. 

However, the classification technique used three algorithms which are decision tree, random 

forest, and naïve Bayes to build the predictive models. Thus, this section presents results and 

analysis based on the formulated different three hypotheses as follows: 

• There is no measurable accuracy difference between data mining algorithms in predicting the 

factors that associated with TB detection performance in rats 

 
With regards to the first hypothesis, our findings show that there is in fact a measurable 

difference in between the three data mining algorithms. According to McNemar’s Test, the test 

checked if there was significant difference between the counts in two cells made in both 

predictive models. by capturing the errors made by both models. Hence, Table 8 shows that the 

errors made by both models in the test data are not the same, and thus the result of the test is 

significant and the null hypothesis is rejected. Additionally, the Mcnemar's Test p-values for 

both models is <2e-16 which are below the 0.05 leading to the rejection of the null hypothesis 

that the data mining algorthms are statistically significant. 

 
• There is no measurable difference between the predicted factors of rats that affect TB detection 

performance 

 
The classification technique was used to build predictive models that predicted the class for each 

rat and the factors that influence TB detection performance. However, this technique applied 

three algorithms to learn the relationship between variables. The independent variables (input) 

include Age, Av_Weight_Per_Year, Session_Start_Time, and Session_Completion_Time while 

the dependent variable (output) is Performance. Therefore, all three algorithms applied these 

variables separately to build predictive models on factors that influence rats’ TB detection 

performance. Starting with the decision tree algorithm, it generated a decision tree where the top 

node (root node) shows the most significant factor that influences TB detection performance. 

The ability of the algorithm to seek optimal splits in variable values has led to this. Moreover, 

the leaf nodes indicate the class of every instance from the observations. 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
14 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

 
Figure 1: Decision tree with rats factors that influence TB detection performance 

Figure 1 depicts the hierarchy of variables where the variable with a high correlation (Age) with 

the prediction, split on first. Thus, the age of the rat is the most significant factor. However, the 

other predicted factors are shown on the leaf nodes which indicate the class of every instance. 

Moreover, the decision tree algorithm-generated rules which are easy to interpret and understand. 

These rules are the result of the IF-THEN statements from the root node to the leaf nodes as 

reported in Table 9. 

Table 9: Classification rules generated from decision tree algorithm 

Rule 

number 

Rule Performance Decision 

TRUE FALSE Number of 

Observations 

in % 

1 IF Age >= 6 ⇒ 0.13 0.87 10% 

2 IF Age < 0.88 

& Session_Completion_Time>= 7 ⇒ 
0.14 0.86 1% 

3 IF Age is 3.1 to 6 0.15 0.85 6% 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
15 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

& Av_Weight_Per_Year< 845 ⇒ 

4 IF Age is 3.1 to 6 

& Av_Weight_Per_Year >= 845 

& Session_Start_Time >= 14 ⇒ 

0.19 0.81 11% 

5 IF Age < 3.1 

& Session_Completion_Time< 7 ⇒ 
0.20 0.80 10% 

6 IF Age is 3.1 to 6 

& Av_Weight_Per_Year >= 845 & 

Session_Start_Time < 14 ⇒ 

0.22 0.78 36% 

7 IF Age is 0.88 to 3.1 

& Session_Completion_Time >= 7 ⇒ 
0.25 0.75 26% 

 
From Table 9, the first rule implies that older rats (with ages greater or equal to 6 years) had a 

performance chance of 0.13, TRUE and 0.87, FALSE, and detected fewer observations (10%). 

Considering the second rule, rats with the age of fewer than 0.88 years and at least 7 minutes 

(session completion time) had a performance chance of 0.14 TRUE, 0.86 FALSE, and detected 

1% of the observations. Hence, older and less young rats portrayed low performance. The sixth 

rule has 36% of the detected observations. Rats with ages of 3.1 to 6 years, at least 845g of the 

average weight per year, and the session start time before 14:00 hours had a detection 

performance chance of 0.22 TRUE, and 0.78, FALSE. This rule is consistent with the fourth one 

except for the session start time. Since the sixth rule had many observations than the fourth, the 

session starts time before 14:00 hours are the most performed one. Furthermore, the fifth rule 

has 10% detected observations, which imply rats with ages of 3.1 years and session completion 

time of fewer than 7 minutes had a performance chance of 0.20 TRUE and 0.80 FALSE. When 

comparing this rule with the second one, rats with a session completion time of fewer than 7 

minutes depicted potentiality in detection since this rule had many observations compared to the 

second one. Therefore, the results pinpointed in Table 11 manifest that rats with ages of 3.1 to 6 

years, at least 845g of the average weight per year, the session start time before 14:00 hours, and 

fewer than 7 minutes as the session completion time performed well. 

However, it is of importance to understand the extent to which each factor contributed to the 

prediction. The random forest algorithm pinpointed the predictor variables that are important in 

predicting the outcome based on the mean decrease in Gini (impurity), as shown in Figure 2. 

Mean Decrease in Gini is the average (mean) of a variable total decrease in the likelihood of 

incorrect classification of a new instance of a random variable from the data set. 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
16 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

 
Figure 2: Variable importance generated by Random Forest algorithm 

From Figure 2, a higher (1791.9167) and lower (233.5753) mean Decrease in Gini portrays 

greater and less variable importance, respectively. Hence, Age and Av_Weight_Per_Year are 

the most and least significant factors. Both decision tree and random forest have indeed shown 

Age as the most significant factor. Thus, the random forest algorithm and decision tree algorithm 

have predicted the factors that influence rats’ TB detection performance by using the 

classification technique. However, the naive Bayes algorithm was used to create the model and 

compare their classification accuracy since it measures the probabilities of the variables and not 

their weights. Therefore, regarding the second hypothesis, as the p-values shown in Table 10 are 

less than 0.05, we reject the null hypothesis and conclude that there is a measurable difference 

between the predicted factors of rats that affect TB detection performance. Hence, the predicted 

factors are statistically significant. 

Table 10: Predicted factors with their corresponding p_values 

Factor p_value 

Age < 2e-16 

Session_Completion_Time < 2e-16 

Session_Start_Time 2.53e-09 

Av_Weight_Per_Year 2.98e-07 

 
• There is no measurable difference between the ability of rats in TB detection performance 

From the given data and the aim of the study, the rats’ performance implies their ability to detect 

a sample with either TB, TRUE (Sensitivity) or without TB, FALSE (Specificity). Table 3  

manifests that the youngest and oldest rats had the ages of 0.79 and 7.95 years respectively with 

the median age of 3.71 years. Meanwhile, the less weighty and weighty rats had the weights of 

843.7g and 1054.8g respectively with the median of 866.8g. And as such, rats with ages and 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
17 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

weights below and above the median refer to younger, older, less weighty, and weighty rats 

respectively. Additionally, early detection conducted at 8:00 hours while the late detection was 

done at 18:00 hours with the minimum completion time of 1 minute and maximum completion 

time of 129 minutes respectively with the median completion time and start time of 10 minutes 

and 12:00 hours. Since the given data had many numbers of observations with FALSE values 

than TRUE values as shown in Table 5, the rats’ high performance in these data had a FALSE 

value. In this conception, rats’ performance depended on the number of observations 

accomplished. Therefore, the results pinpointed in Table 3 manifest that rats with ages of 3.1 to 

6 years, at least 845g of the average weight per year, the session start time before 14:00 hours, 

and fewer than 7 minutes as the session completion time performed well. With regards to these 

results, it is obvious that there is difference in ability of rats to detect TB samples. This is also 

evidenced by Table 10 which indicated the p_values of each predicted factor which are less than 

0.05, and thus makes the results significant and reject the null hypothesis. 

4 Discussion 

4.1 Characteristics of Data 

Considering data understanding phase, the given data consisted of many variables and 

observations, but the sample size for characterizing a TB rat is therefore only four female rats. 

The given number of rats was the ones found with the requested data and was expected to address 

the aim of the study of finding the influencing factors based on the number of observations as 

shown in Table 4 and not comparing the performance of every rat which would require large 

sample sizes. Moreover, Table 4 reported that, there was no gender equality in the given data 

since all rats were female. However, for the future it is advantageous to analyze data with large 

sample size and both male and female rats to understand which gender influences detection 

performance. 

Based on dependent variable performance, Table 5 demonstrated that data consisted of many 

FALSE values than TRUE values. Since it was the target class for classification, it is presumed 

to have an impact on the results. Thus, when one value has many samples than the other, its 

performance is also higher. It was valuable if the data would have an estimation of about an 

equal number of values of the observations in the detection performance class. And as such, it 

would reduce the suspicion that the results might rely on one group of the data and limits 

generalization. Furthermore, Table 6 shows the logistic regression analysis which examined the 

association of independent variables with a dichotomous dependent variable (performance). The 

p-values for the independent variables are below 0.05, and implies that there is a relationship 

between independent variables and the dichotomous dependent variable. Hence, the variables 

are statistically significant. 

4.2 Factors Influencing Rats TB Detection Performance 

The results depict the strength of the Age factor in the detection performance. Figure 1 shows 

that Age split first due to the highest information gain ratio. As a result, it has appeared in all 

generated rules in Table 9. Contrary to the other variables that are shown only once in the 

generated rules. Moreover, in the variable importance of random forest depicted in Figure 2, the 

decrease mean Gini of Age was higher than the other variables. The results manifested that rats 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
18 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

between the ages of 3.1 to 6 years positively affected the performance. However, it may limit 

the generalization of the results since it referred to female rats. The study of Brushfield et al. [8] 

proposes that detection performance may depend on rats’ characteristics such as age. 

Nevertheless, successful training and growth progress might have led to good detection 

performance. Furthermore, the results show that older rats portrayed a low detection 

performance. And as such, the olfactory deficit might have caused this since detection 

performance depends on the rats’ olfactory sensitivity [13].Moreover, the results provide new 

insight into the relationship between time differences when the rat starts and ends detection tasks 

(session completion time). And as such, good performers were the rats that completed the 

number of observations with less than 7 minutes. Since these rats have a high-speed of detecting 

100 samples in 20 minutes, good performers were the rats that completed the number of 

observations with less than 7 minutes. However, the given data might have influenced the 

session's completion time since the samples contained many values of FALSE (samples without 

TB bacteria). The study conducted by Mgode et al. [7] pinpoints that during the training, rats 

learn to pause for a long time of about 3 seconds to the sample hole with TB bacteria and take a 

short time of about 1 second to the sample hole without TB bacteria. 

Not only that but also, the results contribute a clearer understanding of the influence of average 

weight per year (Av_Weight_Per_Year) on detection performance. Rats with an average weight 

per year of greater or equal to 845g performed better. According to the study conducted by 

Beyene et al. [5], the weight range of adult rats’ females ranges from 1 to 1.5kg. Therefore, one 

can argue that young rats with at most 1.05 kg were the most performers. However, presumed 

the reliability of the results could increase with the corresponding weight rather than the applied 

average. However, these results may limit generalization since they refer to female rats. 

Therefore, rats TB trainers and decision-makers must consider these results to utilize the 

usefulness of this technology and should maintain it for sustainability. On the other hand, the 

results reveal that for the three different algorithms used, the classification accuracy was much 

more in the random forest (78.82%) than decision tree (78.78%) and naive Bayes (78.71%). 

Conversely, the predictive models’ accuracies differences are small. The nature of data and 

algorithms used might have caused this in the sense that random forest and decision tree 

algorithms fit in skewed data different from naive Bayes which, do better in normally distributed 

data [9]. Moreover, in the random forest, the ability to assembly several trees and make the final 

decision from several trees might influence this highest classification accuracy [12]. 

Despite the found results based on the dependent and independent variables given from the data, 

other factors presumed the influence on these results. These factors may include training 

procedures, trainers or recorders (data recording), experimental setup, and laboratory technicians 

(quality control) [7, 9, 15]. The study conducted by Reither et al. [15] argue that since rats are 

trained based on the conditioning techniques which support to change their behavior such as 

learning to recognize sound during the training, it is useful to have the justifiable rules to avoid 

incorrect results. Likewise, Mgode et al. [7] demonstrate that rats’ successful and consistent 

training procedures are most important in TB healthcare centers that apply rat as odor-detection 

technology. With this regards, it is presumed that rats from the given data succeeded in the 

training procedures and thus manifested better performance. Moreover, observing precision in 

data recording during detection tasks is highly emphasized to avoid false results. Since rats’ 

trainers and recorders are the ones performing data recording and training, they should have 

skills in getting consistent records. Hence, a well-organized experiment setup may facilitate rats 


Visual Analytics of Tuberculosis Detection Rat Performance 

  
19 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

to portray better performance [1]. Additionally, before presenting the sample in a cage for 

detection, a standard heat is applied into it to kill infectious microorganisms and enhance quality 

control. Hence, quality control may determine the effectiveness of rats in detection performance. 

Consequently, despite the outcome of the dependent and independent variables, the mentioned 

confounding variables might influence the results. Therefore, rats’ detection performance 

depends on the main and confounding factors. 

5 Conclusion and Recommendations 

This study has focused on the prediction of factors influencing rats’ TB detection performance 

using data mining techniques. Also, building predictive models for predicting the class for every 

new instance (rat). While this study also concentrated on understanding the relationship of the 

manipulated variables, the results indicate that Age, Session_Completion_Time, 

Session_Start_Time, and Av_Weight_Per_Year are the factors influencing rats TB detection 

performance. However, the results show that the age of the rat was the most influencing factor. 

The results also pinpoint that rats with the age of 3.1 to 6 years, at least 845g of the 

Av_Weight_Per_Year, before 14:00 hours as the session start time, and less than 7 minutes as 

the session completion were the best performers. These results are useful to rats’ trainers and 

decision-makers in understanding the potential factors that may affect the detection performance 

and hence increase TB detection performance. Ultimately to support decision making, scaling 

up of the detection technology and improve human health. 

Considering predictive models, the random forest predictive model has the highest classification 

performance accuracy of 78.82%. Followed by the decision tree with 78.78% and naive Bayes 

is the last model with 78.71% and thus makes the random forest predictive model the best model 

for the study. Since this study implemented data mining techniques in a social setting by 

predicting factors that influence rats in detecting TB disease, it is also helpful to the academic 

society of Information systems. However, confounding factors such as training procedures, 

trainers or recorders (data recording), experimental setup, and laboratory technicians (quality 

control) might have an impact on the results. Therefore, to maximize the effectiveness and 

efficiency of these results, several criteria for future research will have to be optimized. First, a 

dataset with large sample size and many desirable variables for rats TB detection performance 

is valuable to increase the number of known factors. Moreover, to predict significant sex 

differences, the dataset should balance gender distribution. 

Acknowledgments 

Data for this study was supported by APOPO TB Training and Research center in Morogoro, 

Tanzania. Many staff from the APOPO TB center have provided advice and appreciated 

suggestions. Colleague’s critiques and comments have consistently improved the  paper. 

Conflicts of interest 

On behalf of all authors, the corresponding author states that there is no conflict of interest. 

 
Visual Analytics of Tuberculosis Detection Rat Performance 

  
20 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

6 References 

1. Poling A, Weetjens B, Cox C, Beyene N, Durgin A, et al. 2011. Tuberculosis Detection by 

Giant African Pouched Rats. Behav Anal. 34(1), 47-54. PubMed 

https://doi.org/10.1007/BF03392234  

2. World Health Organization. Global tuberculosis report 2018. New York, United States of 

America: WHO; 2018. 

3. Ellis H, Mulder C, Valverde E, Poling A, Edward T. 2017. Reproducibility of African giant 

pouched rats detecting Mycobacterium tuberculosis. BMC Infect Dis. 17, 298. doi:. PubMed 

https://doi.org/10.1186/s12879-017-2347-3 

4. Weetjens BJ, Mgode GF, Machang’u RS, Kazwala R, Mfinanga G, et al. 2009. African 

pouched rats for the detection of pulmonary tuberculosis in sputum samples. Int J Tuberc 

Lung Dis. 13, 737-43. PubMed 

5. Beyene, N., Mahoney, A., Coxi, C., Weetjens, B., Makingi, G., Mgode, G, et al. (2012). 

APOPO’s tuberculosis research agenda: achievements, challenges and prospects. Tanzania 

Journal of Health Research. doi: 10.4314/thrb.v14i2.5 

6. Poling A, Valverde E, Beyene N, Mulder C, Cox C, et al. 2017. Active Tuberculosis detection 

by pouched rats in 2014: More than 2,000 new patients found in two countries. J Appl Behav 

Anal. PubMed https://doi.org/10.1002/jaba.356 

7. Mgode GF, Cox CL, Mwimanzi S, Mulder C. 2018. Pediatric tuberculosis detection using 

trained African giant pouched rats. Pediatr Res. 84(1). doi:. PubMed 

https://doi.org/10.1038/pr.2018.40 

8. Brushfield A, Luu T, Callahan B, Gilbert P. 2008. A comparison of discrimination and reversal 

learning for olfactory and visual stimuli in aged rats. Behav Neurosci. 122(1), 54-62. PubMed 

https://doi.org/10.1037/0735-7044.122.1.54  

9. Sharda, Delen & Turban (2014). Business Intelligence and Analytics (Tenth edition). 

  10.Nagabhushanam D, Naresh N, Raghunath A, Praveen Kumar K. 2013. Prediction of 

Tuberculosis Using Data Mining Techniques on Indian Patient’s Data. Int J Cloth Sci Technol. 

4, 262-65. 

 11.Suresh, N. & Arulanandam, D. (2018). A Mining Approach for Detection and Classification 

Techniques of Tuberculosis Diseases. 

 12.Asha, T., Natarajan,S., Murthy, K.N.B., (2011). Effective Classification Algorithms to Predict 

the Accuracy of Tuberculosis-A Machine Learning Approach. 

 13.Czarnowski I., Jędrzejowicz P. 2018. An Approach to Data Reduction for Learning from Big 

Datasets: Integrating Stacking, Rotation, and Agent Population Learning Techniques. 

Complexity. doi:. https://doi.org/10.1155/2018/7404627 

https://pubmed.ncbi.nlm.nih.gov/22532730
https://doi.org/10.1007/BF03392234
https://pubmed.ncbi.nlm.nih.gov/28438117
https://doi.org/10.1186/s12879-017-2347-3
https://pubmed.ncbi.nlm.nih.gov/19460250
https://pubmed.ncbi.nlm.nih.gov/27718224
https://doi.org/10.1002/jaba.356
https://pubmed.ncbi.nlm.nih.gov/29617007
https://doi.org/10.1038/pr.2018.40
https://pubmed.ncbi.nlm.nih.gov/18298249
https://doi.org/10.1037/0735-7044.122.1.54
https://doi.org/10.1155/2018/7404627


Visual Analytics of Tuberculosis Detection Rat Performance 

  
21 
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 13(2):e12, 2021 

OJPHI 

14.Hussain, S. (2015). Educational Data Mining using R Programming and R Studio. Journal of 

applied and fundamental sciences 

15.Reither, K., Jugheli, L., Glass, T.R., Sasamalo, M., Mhimbira, F.A., Weetjens, B.J, et al. 

(2015). Evaluation of Giant African Pouched Rats for Detection of Pulmonary Tuberculosis 

in Patients from a High-Endemic Setting. 

16.Chaurasia V, Pal S. (2013). Data Mining Approach to Detect Heart Disease. International 

Journal of Advanced Computer Science and Information Technology Volume 2, Issue 4, 

ISSN: 2296-1739. 

17.Ameri H, Alizadeh S, Hadizadeh M. 2014. Assessing the Effects of Infertility Treatment 

Drugs Using Clustering Algorithms and Data Mining Techniques [Persian]. J Mazandaran 

Univ Med Sci. 24, 26-35. 

18.Ayas, S. &Ekinci, M. (2014). Random forest-based tuberculosis bacteria classifications in 

images of ZN-stained sputum smear samples. doi: . 

19.Kraemer S, Apfelbach R. 2014. Olfactory sensitivity, learning and cognition in young adult 

and aged male Wistar rats. Physiol Behav. PubMed 

20.Mahoney A, Edwards TL, Weetjens BJ, Cox C, Beyene N, et al. 2013. Giant African pouched 

rats (CricetomysGambianus) as detectors of Tuberculosis in human sputum: Two operational 

improvements. Psychol Rec. 63, 583-94. https://doi.org/10.11133/j.tpr.2013.63.3.012  

21.Maniya H, Hasan MI, Patel PK. 2011. Comparative study of Naïve Bayes Classifier and KNN 

for Tuberculosis [IJCA]. Int J Comput Appl. 

22.Mulder C, Mgode GF, Reid SE. 2017. Tuberculosis diagnostic technology: an African   

solution … think rats. Afr J Lab Med. 6(2),  https://doi.org/10.4102/ajlm.v6i2.420  

23.PrasannaDesikan. Kuo-Wei Hsu, Srivastava,J. (2011). Data Mining for Healthcare 

Management. SIAM International Conference on Data Mining. 

 24.World Health Organization. Make every mother and child count. Geneva, Switzerland: WHO; 

2005. 

https://pubmed.ncbi.nlm.nih.gov/15135015
https://doi.org/10.11133/j.tpr.2013.63.3.012
https://doi.org/10.4102/ajlm.v6i2.420