

International Journal on Advances in ICT for Emerging Regions 2022 15 (3): 

December 2022                                                                             International Journal on Advances in ICT for Emerging Regions 

Improving Drug Combination Repositioning using Positive Unlabelled Learning and Ensemble Learning
Yashodha Ruchini Maralanda, Pathima Nusrath Hameed 

Abstract— Drug repositioning is a cost-effective and time-

effective concept that enables the use of existing drugs/drug 

combinations for therapeutic effects. The number of drug 

combinations used for therapeutic effects is smaller than all 

possible drug combinations in the present drug databases. 

These databases consist of a smaller set of labelled positives and 

a majority of unlabelled drug combinations. Therefore, there is 

a need for determining both reliable positive and reliable 

negative samples to develop binary classification models. Since
we only have labelled positives, the unlabelled data has to be
separated into positives and negatives by a reliable technique.

This study proposes and demonstrates the significance of using 

Positive Unlabelled Learning, for determining reliable positive 

and negative drug combinations for drug repositioning. In the 

proposed approach, the dataset with known positives and 

unlabelled samples was clustered by a Deep Learning based Self 

Organizing Map. Then, an ensemble learning methodology was 

followed by employing three classification models. The 

proposed PUL model was compared with the frequently used 

approach that randomly selects negative drug pairs from 

unlabelled samples. A significant improvement of 19.15%, 

20.56% and 20.23% in the Precision, Recall and F-measure, 

respectively, was observed for the proposed PUL-based 

ensemble learning approach. Moreover, 128 drug repositioning 

candidates were predicted by the proposed methodology. 

Further, we found literature-based evidence to support five
drug combinations that may be repositionable. These
findings show that the proposed PUL approach is a promising
strategy for drug combination prediction for repositioning.

Keywords— Drug repositioning, Positive Unlabelled Learning 

(PUL), Deep Learning, Self-Organizing Maps (SOM), Support 

Vector Machine (SVM) 

I. INTRODUCTION  

Introducing a new drug to the market is time consuming
and costly. It takes nearly seven to fifteen years and
approximately $700-$1000 million to bring a new drug to the
market, since each candidate must undergo an extensive
experimental procedure before reaching patients [1].
Therefore, most pharmaceutical companies and medical research
institutes are trying to find alternatives that can be used to
prevent and cure human diseases. As one of the most efficient
and trustworthy approaches, repurposing, or the reuse of
existing drugs as treatments for other diseases that still do
not have proper treatments, has been an emerging topic over
the last decade. This concept is known as drug repositioning
or drug repurposing.

Correspondence: Yashodha Ruchini Maralanda (E-mail: yashodar95@gmail.com). Received: 24-08-2021; Revised: 04-01-2023; Accepted: 11-01-2023.
Yashodha Ruchini Maralanda and Pathima Nusrath Hameed are from the Department of Computer Science, University of Ruhuna, Sri Lanka (yashodar95@gmail.com, nusrath@dcs.ruh.ac.lk).
DOI: http://doi.org/10.4038/icter.v15i3.7232
© 2022 International Journal on Advances in ICT for Emerging Regions.

Moreover, drug combination treatments have been found to be
more effective in avoiding drug resistance when treating
complex diseases such as cancer [2]. Since there are
approximately 16,000 approved drugs on the market [3], an
extremely large number of drug combinations can be formed.
However, only a very small number of them have been confirmed
by experimental research. Therefore, there is a need for an
accurate and more predictive approach to infer useful drug
combinations from the millions of possible drug combinations
that remain unlabelled.

Existing drug combination repositioning approaches have
followed binary classification [4]–[6] as well as other
approaches such as tree-based techniques [8]. In the existing
binary classification approaches, the unlabelled samples
were considered as negatives [4]–[6]. Therefore, the results
of existing studies might be unreliable or inaccurate and may
cause the loss of valuable, repositionable drug combinations.

In this study, a Positive Unlabelled Learning (PUL) based
approach is proposed to address this problem. It uses a
deep learning based unsupervised clustering approach
followed by binary classification, enabling us to select
reliable negatives for binary classification. The unsupervised
clustering method was based on a Deep Learning model
using identified drug-drug similarities. The clusters with
the lowest proportion of known drug combinations were
considered as the clusters with negatives. Our model was
compared with the frequently used binary classification
approach that randomly selects samples as negatives from
unlabelled data. Thereby, model predictions were evaluated
and the significance of the PUL approach is highlighted. To
the best of our knowledge, this is the first attempt focusing
on learning from positive and unlabelled data for drug
combination repositioning using drug-based features.

In Section II, an overview of the existing literature in the
domain and its limitations is given, stating the need for and
the importance of our work. In the Materials and Methodology
Section, our dataset and research workflow are explained in
detail. Then, in the Results Section, we present the results
of the PUL-based ensemble learning methodology and the final
predictions. Next, in the Discussion Section, we emphasize the
significance of the proposed PUL approach, future work and
literature-based evidence for some of the predicted results.
Finally, Section VI provides the concluding remarks of this
study.

II. RELATED WORK 

Drug repositioning via in-silico methods has become
popular and there exist many successful efforts in this
domain. The majority of them are single drug repositioning
approaches, while a considerable number have focused on
repositioning of drug combinations as novel therapeutics for
diseases. Moreover, Machine Learning techniques such as
Support Vector Machine (SVM), Naïve Bayes, Logistic
Regression and Random Forest, as well as Deep Learning
techniques including Deep Neural Networks, Convolutional
Neural Networks and Deep Feed Forward Networks, were
employed.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

The use of unlabelled data in binary classification
involves different methods. One common paradigm is the
random selection of samples from unlabelled data as labelled
negatives. The study of Li et al. [4] addressed repositioning
of drug combinations as a binary classification problem.
Their dataset was composed of a majority of unlabelled data
and a comparably smaller set of labelled positive samples.
However, they took all the unlabelled samples as negatives
rather than selecting the plausible positives and negatives
in the unlabelled data, and further checked for overfitting
by varying the positive and negative sample ratio
from 1:12. Finally, they chose 1:1 as the most
appropriate ratio since it produced the best result. Even
though their study involved an ensemble learning
methodology, the reliability of the predictions might not be
satisfactory because of the random selection of the negative
sample.

Similarly, Chen et al. [5] have used this approach of random
negative selection from a set of drug combinations that do
not have proper labels. They carried out a binary
classification of the selected labelled positives and randomly
sampled negatives via Random Forest based on chemical
interactions between drugs, protein interactions between
drug targets and the target enrichment of KEGG pathways.
Furthermore, the MapReduce programming model was used
together with SVM and Naïve Bayes classifiers to identify
novel drug combinations by Sun et al. [6]. Their negative
dataset was composed of randomly paired drugs drawn from
the 103 single drugs that were selected from DCDB [7].

TreeCombo [8] is another work, which used a tree-based
approach to predict drug combinations from the physical and
chemical properties of drugs together with gene expression
levels of cell lines. The use of clinical side effects to
predict drug combinations was tested by Huang et al. [9].
They applied Logistic Regression and categorized drug
combinations as safe or unsafe using three key side effects
that were identified as contributing most to model
performance. NLLSS [10] was another approach that integrated
known synergistic drug combinations, unlabelled combinations,
drug-target interactions and drug chemical structures to
predict synergistic drug combinations. Moreover, they
followed a different method involving the Loewe score [11]
for drug combinations and classified data into principal
drugs and adjuvant drugs based on a set of rules.
Kalantarmotamedi et al. [12] applied a Random Forest
approach with Transcriptional Drug Repositioning in order
to identify synergistic drug combinations against malaria.
Li et al. [13] implemented PEA, an algorithm to model drug
combinations using a Bayesian network, which was also
integrated with a similarity algorithm.

Shi et al. [14] have used Matrix Factorization to predict
potential Drug-Drug Interactions (DDIs) between two drugs
as well as among a set of drugs by using side effect
information of known drugs. Moreover, they introduced the
ability to predict the interaction between a new drug and
another new drug that has no approved interactions yet.

Apart from machine learning, Deep Learning has recently
attracted more interest in the domain of drug combination
repositioning. Several studies have been carried out to
predict novel drug candidate pairs. MatchMaker [15] is a
Supervised Learning framework based on a Deep Neural Network
that predicts drug synergy scores, referred to as the Loewe
score. Chemical structures of drugs and untreated cell line
gene expression profiles were utilized with three separate
sub-networks, where two of them run in parallel for the
separate drugs in a pair and the third sub-network handles
the whole drug pair.

DeepSynergy [16] for predicting anti-cancer drug synergy 

is based on a Feed Forward Neural Network which takes 

three inputs including chemical descriptors from two drugs 

and the genomic information of the cell line. The output from 

the network was the synergy score for the given input drugs. 

These synergy scores then decided whether the drug 

combination is positive or negative.  

Lee et al. [17] have used Deep Feed-Forward Networks to
predict DDI effects based on a set of drug features: structural
similarity profiles, gene ontology term similarity profiles and
target gene similarity profiles. To perform feature reduction,
they used Autoencoders, which were shown to perform better
than Principal Component Analysis (PCA). Reduced profile
pairs were then concatenated and fed to the network. RMSprop
and Adam were used as optimizers with the Autoencoder and
Deep Feed Forward Network, respectively. Autoencoders
were trained twice in order to predict DDI types more
accurately. Li et al. [18] have presented a novel
Convolutional Neural Network based model that is capable of
predicting indications for new drugs by identifying the
relevant lead compounds using drug molecular structure
information and disease symptom information. They constructed
similarity matrices from these two sources of information and
mapped them into one grey scale image. Finally, this image
was used as the input to a Convolutional Neural Network
model, which was executed using MATLAB software with
stochastic gradient descent as the optimizer.

Peng et al. [19] have predicted drug-drug interactions
using a deep learning model. They took true positive and
true negative drug combinations from the dataset in their
first approach, and true positive and sampled negative drug
combinations in their second approach. The work of Lee et
al. [20] involves drug pairs, but it is not an approach for
predicting drug combinations as repositioning candidates.
They used Deep Feed-Forward Networks to predict drug-drug
interaction effects based on a set of drug features.

Zhang et al. [21] have implemented an ensemble model 

for DDI predictions. They have followed a semi-supervised 

learning approach because they wanted to identify 

unobserved DDIs, which might be available among other 

possible drug pairs. This is similar to identifying possible 

positive samples out of an unlabelled sample.  




PUL has become an emerging topic since most natural data
exist as positive and unlabelled samples rather than as
already defined positive and negative samples. Several
studies have addressed learning from PU data.

Sellamanickam et al. [22] have proposed a ranking based 

SVM model (RSVM) where the positive samples obtain 

higher scores than the unlabelled samples. A threshold 

parameter was estimated to form their final classifier. Liu et

al. [23] have followed a similar approach. They have

introduced a novel computational framework for drug-drug 

interaction prediction with Dyadic PUL. They have 

identified the lack of a reliable method for separation of their 

unlabelled data into positives and negatives. Therefore, they 

have introduced a scoring function and assigned a certain 

score to each data pair. According to the assigned scores they 

have separated data into positive and negative by making the 

top scoring data pairs as positives while keeping the lower 

scoring data pairs as negatives. The top scoring data pairs 

were defined as the samples, which obtain a higher score 

than the average score of the unlabelled data pairs. 

Further, Zhao et al. [24] have proposed a method for 

protein complex mining by employing SVM with the use of 

PUL. They have introduced an efficient sub graph searching 

method that can search complex sub graphs. First, they have 

tried to express the traditional training dataset with positive 

and negative samples as a non-traditional training set with 

positive and unlabelled samples. Then they have tried to 

identify the relationship between the two classifiers that were 

trained with those two types of training samples. 

Even though there are studies that have used PUL, to the
best of our knowledge, no study has specifically focused on
learning from positive and unlabelled data for drug
combination repositioning based on drug-based features.
Since drug combination repositioning is an active topic and
PUL is an emerging field, there is a need for a PUL study
related to drug combination repositioning. The primary
objective of our study is to introduce a new reliable
computational method for PUL-based drug combination
repositioning.

III. MATERIALS AND METHODOLOGY 

A. Dataset 

In order to demonstrate the effectiveness of the proposed
approach, 183,315 drug combinations from 606 drugs that
were collected from the study of Li et al. [4] were used.
Drug Target Similarity, Drug Indication Similarity, Drug
Structure Similarity, Drug Expression Similarity and Drug
Module Similarity of the above drug combinations were also
collected from the study of Li et al. [4]. They use the
Jaccard coefficient to represent the above similarities
between drug pairs. Using them, we constructed a drug
combination similarity matrix with the corresponding five
feature similarity scores, giving a matrix of dimensions
(183,315, 5). The study of Li et al. [4] contained 1,196
labelled positive drug combinations for the 606 drugs that we
are interested in. After separation of the labelled positives,
there were 182,119 drug combinations in the unlabelled dataset
(Supplementary Files S1 and S2).

Fig. 1  Workflow of the Random and PUL approaches
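As a quick consistency check on the dataset figures in this subsection, 183,315 equals the number of unordered pairs that can be formed from 606 drugs, and removing the 1,196 labelled positives leaves 182,119 unlabelled pairs. A minimal sketch of the arithmetic follows; the variable names are illustrative and are not taken from the original study.

```python
from math import comb

# Figures quoted in the Dataset subsection
n_drugs = 606
n_positive = 1196
n_features = 5  # target, indication, structure, expression, module similarity

# Number of unordered drug pairs that can be formed from 606 drugs
n_pairs = comb(n_drugs, 2)

# Pairs left after the labelled positives are separated
n_unlabelled = n_pairs - n_positive

# Shape of the drug combination similarity matrix
matrix_shape = (n_pairs, n_features)

print(n_pairs, n_unlabelled, matrix_shape)
```

The labelled positives therefore make up well under 1% of all pairs, which is what motivates the positive-unlabelled setting.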

B. Proposed Methodology 

Learning from positive and unlabelled data is a setting in
which only a set of labelled positive data and a majority of
unlabelled data are available. Although unlabelled, this set
may contain both positive and negative samples, and the PUL
technique tries to identify them separately. The concept of
PUL has drawn the attention of researchers due to its ability
to provide reliable solutions. It has reduced the need for
fully labelled data in computational model driven research
and has enabled the involvement of unlabelled data in model
driven learning processes. Many applications and research
works have utilized this concept.

Unlabelled drug combinations may comprise plausible
negative samples as well as repositionable drug
combinations. Therefore, there is a need for a proper
mechanism to identify the most probable set of negative
samples to develop a reliable classification model. We have
introduced a novel PUL approach for drug combination
repositioning. Our proposed method enables learning from
positive and unlabelled drug combinations in order to
identify plausible negatives as well as plausible positives
within the majority of unlabelled data. We propose PUL using
a deep learning and ensemble learning methodology to
predict reliable drug combinations for repositioning.

Here, we have used two approaches to determine negative
drug combinations from the unlabelled dataset: firstly, the
frequently used random selection of negatives from unlabelled
data and, secondly, the proposed PUL using deep learning and
ensemble learning. Fig. 1 and Fig. 2 illustrate the complete
workflow based on the two approaches. We compare the
performance of both approaches using the Receiver Operating
Characteristic (ROC) curve, the Precision-Recall curve,
accuracy, precision, recall and F-measure. Hence, the
significance of the PUL approach for drug combination based
drug repositioning is emphasized. Furthermore, we have
identified a set of plausible positive drug combinations that
can be repositioned for new or rare diseases. Repositioning
of these predicted drug combinations needs further research
with laboratory experiments and other background analysis
with expert knowledge. Therefore, it needs to be carried out
as a separate experiment, which becomes the second phase of
our research.

C. Random Approach 

In this approach, a randomly selected sample of
unlabelled drug combinations, equal in size to the labelled
positive sample, was employed. Our labelled positive sample
was composed of 1,196 drug combinations. Hence, we took a
random sample of 1,196 unlabelled drug combinations as
negatives. As this was a binary classification, class labels
were assigned as 1 and 0 for the positive and negative
classes, respectively.

Fig. 2  Workflow of the ensemble methodology for inferring drug repositioning candidates

Classification was carried out using three classifiers:
SVM, a Stochastic Gradient Descent-based Classifier
(SGD-Classifier) and a Deep Neural Network (DNN) classifier.
Following Nguyen et al. [25], we identified that a train-test
split of 70:30 is effective with random sampling. Therefore,
we decided to use the same split for both approaches. Out of
the positive and negative datasets, 30% was used for model
testing while the remaining 70% was taken for training the
model. Implementation was carried out in Python with the
scikit-learn library [26] for SVM and SGD-Classifier and the
Keras library for the deep neural network. The accuracy,
precision, recall and F1-scores were then recorded.
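The sampling and splitting steps of the random approach can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the integer IDs standing in for drug pairs, the seed, and the shuffle-then-slice split are all assumptions made for the example.

```python
import random

N_POSITIVE = 1196       # labelled positive drug combinations
N_UNLABELLED = 182119   # unlabelled drug combinations

rng = random.Random(42)  # assumed seed, only for reproducibility of the sketch

# Hypothetical integer IDs standing in for drug pairs
positive_ids = list(range(N_POSITIVE))
unlabelled_ids = list(range(N_POSITIVE, N_POSITIVE + N_UNLABELLED))

# Random approach: draw as many unlabelled pairs as there are positives
# and treat them as negatives
negative_ids = rng.sample(unlabelled_ids, N_POSITIVE)

# Class labels: 1 for positive, 0 for negative
samples = [(i, 1) for i in positive_ids] + [(i, 0) for i in negative_ids]
rng.shuffle(samples)

# 70:30 train-test split
split = int(0.7 * len(samples))
train, test = samples[:split], samples[split:]

print(len(train), len(test))
```

Because the negatives are drawn blindly from the unlabelled set, some of them may in fact be repositionable combinations, which is the weakness the PUL approach is meant to address.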

D. Positive Unlabelled Learning (PUL) Approach 

The labelled positive sample was the same as in the random
approach, but the negative sample was selected by learning
from positive and unlabelled drug combination data. A
Self-Organizing Map (SOM) was used to cluster the sample
with positive and unlabelled data, and the clusters were then
analysed to identify plausible negative samples from the
unlabelled data.

For each cluster, the probability of containing labelled
positive samples was calculated as the Positive Probability
(positive probabilities for each cluster are provided in
Supplementary File S3). We defined the Positive Probability
as the ratio between the number of known drug combinations in
cluster i and the total number of combinations in cluster i,
where i is the cluster ID.
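Written as a formula, with $K_i$ and $N_i$ introduced here only as shorthand (these symbols do not appear in the original text) for the number of known drug combinations and the total number of combinations in cluster $i$:

```latex
\[
  P_i = \frac{K_i}{N_i}
\]
```

Under this definition, a positive probability of 0.000962 corresponds to roughly one known combination in every 1,040 combinations of a cluster.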

Since there are 1,196 known positives, we need 1,196
reliable negatives to train the binary classifier. Therefore,
we sorted the clusters by their calculated positive
probability values. The unlabelled drug pairs in the clusters
with the lowest positive probability are considered reliable
negatives. We aggregated the clusters with the lowest
positive probabilities until the sample size was greater
than or equal to 1,196. Accordingly, the three clusters with
the lowest positive probabilities were combined to get the
set of least significant drug combinations. Thereby, we
obtained 3,115 negatives by aggregating the clusters whose
positive probability is less than or equal to 0.000962. Since
we required balanced positive and negative samples, we
randomly selected 1,196 negatives from the 3,115 negatives
identified above.
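The selection of reliable negatives described in this subsection can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function names, the dictionary layout of the clusters, the seed and the toy data are all hypothetical.

```python
import random

def positive_probability(n_known, n_total):
    """Ratio of known (labelled positive) combinations to all combinations in a cluster."""
    return n_known / n_total

def select_reliable_negatives(clusters, n_required, seed=0):
    """Pick n_required negatives from the clusters with the lowest positive probability.

    clusters: dict mapping cluster ID -> {"known": [...], "unlabelled": [...]}
    (a hypothetical layout chosen for this sketch).
    """
    sizes = {cid: len(c["known"]) + len(c["unlabelled"]) for cid, c in clusters.items()}
    # Skip empty clusters, then rank the rest by ascending positive probability
    ranked = sorted(
        (cid for cid in clusters if sizes[cid] > 0),
        key=lambda cid: positive_probability(len(clusters[cid]["known"]), sizes[cid]),
    )
    pool = []
    for cid in ranked:
        pool.extend(clusters[cid]["unlabelled"])
        if len(pool) >= n_required:
            break  # enough candidates aggregated
    # Balance against the positives by sampling from the aggregated pool
    return random.Random(seed).sample(pool, n_required)

# Toy example with three clusters (made-up data)
toy_clusters = {
    "a": {"known": ["ak%d" % i for i in range(5)], "unlabelled": ["a%d" % i for i in range(5)]},
    "b": {"known": [], "unlabelled": ["b%d" % i for i in range(3)]},
    "c": {"known": ["ck0"], "unlabelled": ["c%d" % i for i in range(9)]},
}
toy_negatives = select_reliable_negatives(toy_clusters, n_required=4)
print(toy_negatives)
```

In the toy example, clusters "b" (probability 0) and "c" (probability 0.1) are aggregated before cluster "a" (probability 0.5) is reached, so no negative is drawn from the cluster richest in known positives.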

After selection of a negative sample via PUL, the labelled
positive and negative samples were classified using the SVM,
SGD-Classifier and DNN models. Since we needed to compare
the performance of the random and PUL approaches, we kept the
model parameters fixed to the ones that were used in the
random approach. Similarly, 30% of the data was taken as the
testing set while the remaining 70% was taken for model
training. Then, the accuracy, precision, recall and F1-score
given by each model were recorded.

Fig. 3  Venn diagram to denote the distribution of the data

TABLE 1
PERFORMANCE ASSESSMENT OF THE PROPOSED POSITIVE UNLABELLED LEARNING APPROACH AND RANDOM APPROACH

            SVM                 SGD-Classifier      DNN Classifier
            Random    PUL       Random    PUL       Random    PUL
Accuracy    0.6421    0.7925    0.7103    0.8774    0.7326    0.9721
Precision   0.6799    0.8413    0.8036    0.9564    0.7203    0.9806
Recall      0.5628    0.7454    0.5893    0.7917    0.7328    0.9646
F1-score    0.6158    0.7904    0.6800    0.8663    0.7265    0.9725

E. Ensemble Learning Methodology 

Figure 2 illustrates the ensemble learning approach used
in this study. In order to predict drug repositioning
candidates from unlabelled drug combinations, an averaging
ensemble learning technique was used. First, class
probabilities for the unlabelled combinations were predicted
using the three individual models separately. Then, for each
drug combination, the separate probabilities of belonging to
class 0 (negative class) or class 1 (positive class) were
averaged to give a new probability. The new class
probabilities were the ensemble learning based class
predictions. We then predicted the best candidate drug
combinations.
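The averaging step can be illustrated with a short sketch. The probabilities below are made-up numbers, and the 0.5 cut-off used to flag candidates is an assumption for the example; the text above does not state the threshold that was applied.

```python
def average_ensemble(prob_lists):
    """Average P(class 1) across models.

    prob_lists: one list of positive-class probabilities per model,
    aligned so that index j refers to the same drug combination in each list.
    """
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]

# Made-up positive-class probabilities for three unlabelled drug combinations
svm_p = [0.9, 0.2, 0.6]
sgd_p = [0.8, 0.1, 0.4]
dnn_p = [1.0, 0.3, 0.2]

ensemble_p = average_ensemble([svm_p, sgd_p, dnn_p])

# Assumed decision rule for the sketch: flag combinations with averaged P >= 0.5
THRESHOLD = 0.5
candidates = [j for j, p in enumerate(ensemble_p) if p >= THRESHOLD]

print(ensemble_p, candidates)
```

Since the two class probabilities of a binary model sum to one, averaging the positive-class probability is enough; the negative-class average is its complement.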

F. Clustering and Classification Models 

1) Self-Organizing Maps (SOM): SOM [27] is an
Artificial Neural Network that is widely used for
unsupervised learning problems. The major difference between
SOM and other neural network models is the use of
competitive learning. SOM is capable of dimensionality
reduction and can identify similarities in data. Deep
learning models have been reported to have higher
performance compared to machine learning approaches [28].
Therefore, we decided to cluster our unlabelled dataset using
MiniSom (https://github.com/JustGlowing/minisom/), a
minimalistic, NumPy-based Python implementation of SOM that
adapts well to the environment in which it is used.

A two-dimensional SOM of size 9x9, trained for 8000
iterations with a learning rate of 0.09, was chosen as the
optimal configuration. The optimal map size, learning rate
and number of iterations were selected after calculating the
quantization and topographic errors while varying their
values [29], [30].

Fig. 4  Plot of Quantization Error and Topographic Error with a fixed learning rate of 0.5 and fixed map size of 7x7

Fig. 5  Plot of Quantization Error and Topographic Error with a fixed map size of 9x9 and fixed number of iterations of 8000

As the first step of optimal parameter identification, a set
of initial parameters needed to be determined. Hence, a
learning rate of 0.5 was chosen as the initial learning rate
for our model. Since a large dataset is used in this study, a
considerably large map size is required. Therefore, the map
size of the SOM was determined by gradually increasing the
dimensionality from 7x7. Hence, the initial learning rate and
map size were defined as 0.5 and 7x7, respectively.

Model training was carried out multiple times with a
varying number of iterations and a fixed learning rate and
map size in order to record the Topographic Error and
Quantization Error for each experiment. The recorded error
values were plotted against the number of iterations used in
each experiment (see Fig. 4). According to the elbow
technique, 8000 iterations was chosen as the optimal value.

Once the optimal number of iterations was identified, our
next experiment aimed to identify the optimal map size. We
fixed the learning rate to 0.5 and the number of iterations
to 8000 and trained the model multiple times, gradually
increasing the map size at each experiment. At a map size of
9x9, we observed a clear reduction in Topographic and
Quantization Error, after which the error values increase
again (see Fig. 5). Therefore, we determined 9x9 as the
optimal map size.

After that, we used the identified map size and number of
iterations to determine the optimal learning rate. We set the
number of iterations and map size to 8000 and 9x9,
respectively. The training process was performed multiple
times for different learning rates. Finally, based on the
corresponding error values, a learning rate of 0.09 was
determined as the optimal learning rate for our problem.
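For readers unfamiliar with the two error measures used in this tuning procedure, the following sketch shows how they are commonly computed for a trained map. It is a plain-Python illustration under stated assumptions, not the MiniSom code used in the study: the map is represented as a hypothetical dictionary from grid coordinates to weight vectors, and two units are treated as adjacent when their grid coordinates differ by at most one in each direction.

```python
import math

def ranked_units(sample, weights):
    """Grid coordinates sorted by Euclidean distance of their weight vector to the sample."""
    return sorted(weights, key=lambda pos: math.dist(sample, weights[pos]))

def quantization_error(data, weights):
    """Mean distance between each sample and its best matching unit (BMU)."""
    return sum(math.dist(x, weights[ranked_units(x, weights)[0]]) for x in data) / len(data)

def topographic_error(data, weights):
    """Fraction of samples whose best and second-best matching units are not adjacent."""
    errors = 0
    for x in data:
        (r1, c1), (r2, c2) = ranked_units(x, weights)[:2]
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:  # assumed adjacency rule
            errors += 1
    return errors / len(data)

# Toy 1x3 map with two features per unit (made-up numbers)
toy_weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0], (0, 2): [0.0, 0.1]}
toy_data = [[0.0, 0.2], [1.0, 0.0]]

qe = quantization_error(toy_data, toy_weights)
te = topographic_error(toy_data, toy_weights)
print(qe, te)
```

A parameter sweep such as the one described above would retrain the map for each candidate value, record both errors, and pick the value at the elbow of the resulting curves.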

2) Support Vector Machine (SVM): SVM [31] is an
algorithm that finds a separating hyperplane in an
n-dimensional space, where n is the number of features in
the dataset. It can be applied to both binary and multi-class
classification problems. Since this algorithm has been widely
used because of its high prediction capability, we decided
to use it as a binary classifier in our work. The employed
SVM model used a sigmoid kernel, since the sigmoid kernel was
considered the most appropriate for this binary
classification problem.
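For reference, the sigmoid kernel is commonly written in the following form, which is the one given in the scikit-learn documentation. The values of $\gamma$ and $r$ used in the study are not stated in this section, so none are assumed here.

```latex
\[
  K(\mathbf{x}, \mathbf{x}') = \tanh\left(\gamma \, \langle \mathbf{x}, \mathbf{x}' \rangle + r\right)
\]
```

Here $\gamma$ is the kernel coefficient and $r$ is the independent term, exposed as `gamma` and `coef0` in scikit-learn's `SVC`.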

 

Fig. 6  Plot of Quantization Error and Topographic Error with a fixed learning rate of 0.5 and fixed number of iterations of 8000

Fig. 7  Receiver Operating Characteristic Curve and Precision-Recall Curve demonstrating the performance of the Deep Neural Network classifier for the Random and PUL approaches

3) Stochastic Gradient Descent based Classifier
(SGD-Classifier): This is a linear classifier implemented in
scikit-learn [26] that is optimized using Stochastic Gradient
Descent (SGD). It supports different loss functions and
penalties for classification, and it minimizes the loss
function defined by the model. Here, we have used the log
loss function, with which our model acts similarly to
logistic regression (LR). The advantage of using the
SGD-Classifier with log loss rather than a direct LR model is
that, although the minimum of the LR loss function cannot be
calculated directly, it can be approached iteratively with
SGD. Therefore, the performance is comparably better, and we
have used the SGD-Classifier for classification in our work.
Even though both the log loss and modified_huber options for
the loss parameter of the SGD-Classifier enable class
probabilities to be predicted, log loss gave the best
performance in our case. Therefore, we employed an
SGD-Classifier model with a log loss function.
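To make the link between SGD and log loss concrete, the sketch below trains a logistic model with per-sample gradient updates. It is a deliberately simplified stand-in written in plain Python: it omits the regularization penalty and learning-rate schedule that scikit-learn's SGDClassifier applies, and the function names, learning rate, epoch count and toy data are assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of class 1 for feature vector x under a linear model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def sgd_log_loss(X, y, lr=0.5, epochs=500, seed=0):
    """Minimize the log loss with one gradient step per training sample."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            # Gradient of the log loss w.r.t. the linear score is (p - y)
            g = predict_proba(X[i], w, b) - y[i]
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b

# Toy one-feature data: low similarity -> class 0, high similarity -> class 1
X_toy = [[0.0], [0.1], [0.9], [1.0]]
y_toy = [0, 0, 1, 1]
w_toy, b_toy = sgd_log_loss(X_toy, y_toy)

print(predict_proba([0.0], w_toy, b_toy), predict_proba([1.0], w_toy, b_toy))
```

Because the model outputs a sigmoid of a linear score, it yields class probabilities directly, which is what the averaging ensemble requires.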

4) Deep Neural Network (DNN) Classifier: The DNN model, implemented using the Keras library (http://github.com/keras-team/keras), was a fully connected network with three layers. Because the ReLU activation function performs well in the majority of current research, it was used in the first two layers, and the sigmoid activation function was used in the output layer since this is a binary classification problem. The dimensions were selected as 5, 12, 5 and 1 for the input layer, the two hidden layers and the output layer, respectively, as this gave a better model for the classification of our dataset. We set the loss parameter to binary_crossentropy, as it is specifically designed for binary classification problems in Keras. Further, we employed the Adam optimizer, as it is well suited to large datasets; since our prediction dataset is large, Adam was used to improve the accuracy of predictions.
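The layer dimensions and activations described above can be illustrated with a small forward pass. This is a plain-Python sketch, not Keras code and not the trained model: the weights are random, the input vector is hypothetical, and training with the Adam optimizer is omitted. Only the 5-12-5-1 layout, the ReLU and sigmoid activations, and the binary cross-entropy loss are taken from the text.

```python
import math
import random

LAYER_SIZES = [5, 12, 5, 1]  # input, two hidden layers, output (from the text)


def init_layers(sizes, seed=0):
    """Create random weights and zero biases for each fully connected layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """ReLU on the first two layers, sigmoid on the output layer."""
    a = x
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            a = [max(0.0, v) for v in z]  # ReLU
        else:
            a = [1.0 / (1.0 + math.exp(-v)) for v in z]  # sigmoid
    return a[0]


def binary_crossentropy(y_true, p, eps=1e-7):
    """Binary cross-entropy for one sample, clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


layers = init_layers(LAYER_SIZES)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
p = forward(layers, [0.1, 0.4, 0.0, 0.7, 0.3])  # hypothetical 5-feature input
loss = binary_crossentropy(1, p)
```

With these dimensions the network has 143 trainable parameters (72 + 65 + 6), and the sigmoid output can be read directly as the estimated probability of the positive class.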

G. Evaluation Metrics 

We divided our dataset into training and testing sets in order to validate the performance of the implemented models: 70% of the dataset was used for training and 30% for testing. Common validation measures, including accuracy, precision, recall and F1-score, were calculated for the random and PUL approaches using the equations below, where TP, FP, TN and FN denote True Positive, False Positive, True Negative and False Negative, respectively.

 

Accuracy = (TP + TN) / (TP + TN + FP + FN) (1) 

Precision = TP / (TP + FP) (2) 

Recall = TP / (TP + FN) (3) 

F1 Score = 2 * Precision * Recall / (Precision + Recall) (4) 
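Equations (1) to (4) translate directly into code. The sketch below assumes the four confusion-matrix counts are already available; the function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Equations (1)-(4) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
m = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

The guards against zero denominators are a design choice of this sketch; they return 0.0 when a metric is undefined, for example when no sample is predicted positive.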

 

Fig. 8 Receiver Operating Characteristic Curve and Precision-Recall Curve demonstrating the performance of Support Vector Machine classifier for Random and PUL approaches

Fig. 9 Receiver Operating Characteristic Curve and Precision-Recall Curve demonstrating the performance of Stochastic Gradient Descent-based classifier for Random and PUL approaches

Furthermore, the Receiver Operating Characteristic (ROC) curve is an important measure for binary classification problems; it plots the true positive rate against the false positive rate. The Precision-Recall (PR) curve provides further information by plotting precision against recall at different thresholds. Therefore, we observed the ROC and PR curves for our two approaches.
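As an illustration of how the ROC and PR curves are obtained, the sketch below sweeps a decision threshold over predicted scores and records one point per threshold. The labels and scores are hypothetical, the helper names are assumptions, and the code assumes that both classes are present in the labels.

```python
def roc_pr_points(y_true, scores):
    """Return ROC points (FPR, TPR) and PR points (recall, precision)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    roc, pr = [], []
    for t in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        roc.append((fp / neg, tp / pos))
        pr.append((tp / pos, tp / (tp + fp)))
    return roc, pr


def auc_trapezoid(points):
    """Area under a curve through (0, 0) and the given points."""
    pts = [(0.0, 0.0)] + sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


# Hypothetical labels and classifier scores for six samples.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]
roc, pr = roc_pr_points(y_true, scores)
auc = auc_trapezoid(roc)
```

For this toy input the area under the ROC curve is 8/9, which equals the fraction of positive-negative pairs in which the positive sample receives the higher score.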

IV. RESULTS 

In comparison to the random approach for negative sample selection, our proposed PUL approach demonstrates a significant improvement in performance (see Table 1). The accuracy, precision, recall and F1-score obtained with the PUL approach for the three classifiers (SVM, SGD-Classifier and the DNN classifier) are higher than the values recorded with the random approach. For instance, the F1-score improved by 17.46%, 18.63% and 24.60% for SVM, SGD-Classifier and the DNN classifier, respectively, when the PUL approach was used.

When the three classifiers are compared on accuracy, precision, recall and F1-score, the DNN classifier shows relatively higher performance for both the random and the PUL approaches (see Table 1). SGD-Classifier shows the second-best performance, while SVM has relatively lower performance compared to the other two classifiers.

A comparison of the ROC and PR curves for the random and PUL approaches based on the three models also emphasizes the higher skill of the models trained under the PUL approach (see Fig. 7, Fig. 8 and Fig. 9).

The ROC and PR curves are drawn in blue for the PUL approach and in orange for the random approach. The x-axis of the ROC curve represents the false positive rate; if this rate is close to zero, the model predicts only a few false positives. Similarly, the y-axis shows the true positive rate; if this rate is close to one, the model predicts a majority of the true positives. Therefore, an ROC curve that bows further towards the (0, 1) coordinate of the plot is considered to have higher skill than the others. For each classifier, the blue ROC curve bows towards the (0, 1) coordinate more than the orange curve of the random approach. Hence, the ROC curves emphasize the higher skill of the models trained using the PUL approach.

The x-axis of the PR curve represents recall; if recall is close to one, the model predicts only a few false negatives. Similarly, the y-axis shows precision; if precision is close to one, the model predicts only a few false positives. Therefore, a PR curve that bows further towards the (1, 1) coordinate of the plot is considered to have higher skill than the others.

TABLE 2 

PERFORMANCE ASSESSMENT OF ENSEMBLE LEARNING 

 

The blue PR curves of the PUL approach bow towards the (1, 1) coordinate more than the orange PR curves of the random approach. This further emphasizes the higher skill of the models trained using the proposed PUL approach.

A further comparison between the three ROC curves shows that the DNN classifier gives the most skilled model of the three classifiers, because its ROC curve bows the most towards the (0, 1) coordinate of the plot. Likewise, the PR curve of the DNN classifier bows the most towards the (1, 1) coordinate, showing the fewest false negatives and false positives. This further supports the higher skill of the DNN classifier.

We built classifiers using SVM, SGD-Classifier and DNN, and then combined their individual predictions to obtain the final prediction, which may reduce the variance of the final outputs. Table 2 summarizes the performance assessment of the ensemble learning approach, in which the performance measures of the three classifiers are averaged. The evaluation metrics derived by the ensemble learning method show an improvement of 20.23 percentage points in the F1-score (from 0.6741 to 0.8764) for the PUL approach over the random approach. Hence, the proposed PUL approach outperforms the frequently used random approach and enables the prediction of reliable repositioning candidates.
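The reported improvements can be checked against the values in Table 2. The short sketch below computes the difference between the PUL and random scores in percentage points; the dictionary and variable names are illustrative, while the numbers are those listed in Table 2.

```python
# Ensemble scores as listed in Table 2.
random_scores = {"accuracy": 0.6950, "precision": 0.7346,
                 "recall": 0.6283, "f1": 0.6741}
pul_scores = {"accuracy": 0.8807, "precision": 0.9261,
              "recall": 0.8339, "f1": 0.8764}

# Absolute gain of PUL over random, expressed in percentage points.
gain_pp = {k: round((pul_scores[k] - random_scores[k]) * 100, 2)
           for k in random_scores}
```

The precision, recall and F1 gains come out as 19.15, 20.56 and 20.23, matching the figures quoted in the abstract, so those figures are absolute differences between the two approaches rather than relative changes.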

It should be noted that, since we identified 1,916 known positives [3] and 3,115 negatives by clustering, 179,004 unlabelled drug combinations remain for prediction (see Fig. 3). We employed the three PUL-based classification models as base predictors of the ensemble learning methodology to classify these unlabelled samples, using the averaging ensemble learning technique. We thereby identified 128 drug combinations with posterior probabilities greater than 0.99, and we infer this set of 128 drug combinations as potential candidates for drug repositioning (see Supplementary File S4).
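The averaging ensemble and the 0.99 cut-off can be sketched as follows. The pair names and per-classifier probabilities are hypothetical placeholders, and the helper names are assumptions; only the idea of averaging the three classifiers' posterior probabilities and keeping pairs above 0.99 comes from the text.

```python
def average_ensemble(prob_lists):
    """Average the positive-class probabilities of several classifiers."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]


def select_candidates(names, avg_probs, threshold=0.99):
    """Keep the samples whose averaged probability exceeds the threshold."""
    return [(n, p) for n, p in zip(names, avg_probs) if p > threshold]


# Hypothetical positive-class probabilities for three unlabelled drug pairs.
pairs = ["pair_A", "pair_B", "pair_C"]
svm_p = [0.995, 0.60, 0.97]
sgd_p = [0.992, 0.40, 0.999]
dnn_p = [0.999, 0.55, 0.999]

avg = average_ensemble([svm_p, sgd_p, dnn_p])
labels = [1 if p >= 0.5 else 0 for p in avg]  # class with the higher probability
candidates = select_candidates(pairs, avg)
```

In this toy example only pair_A is retained: pair_C is rated highly by two classifiers, but its averaged probability stays just below 0.99, which shows how averaging dampens the influence of any single model.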

Furthermore, we applied the three classification models trained with the proposed PUL approach to classify the 1,919 remaining negatives identified by clustering (not used to train the classification models; see Fig. 3). For these 1,919 drug pairs, we assessed whether the predicted probability for class 0 (the negative class) was greater than 0.5. We observed that 91.40%, 95.73% and 98.59% of them were predicted as negative drug combinations by SVM, SGD-Classifier and the DNN classifier, respectively. With the ensemble averaging technique, the corresponding accuracy is 98.44%, which is higher than that of the SVM and the SGD-Classifier. These observations support the reliability of the negatives used and, at the same time, indicate the high accuracy of the prediction models based on the proposed PUL approach. Further, they indicate the significance of the ensemble learning methodology.

TABLE 2
PERFORMANCE ASSESSMENT OF ENSEMBLE LEARNING

Metric       Random    PUL
Accuracy     0.6950    0.8807
Precision    0.7346    0.9261
Recall       0.6283    0.8339
F1-score     0.6741    0.8764

V. DISCUSSIONS

Most real-world data exist as positive and unlabelled samples, and the same holds in the pharmaceutical domain. Several drug combination repositioning studies have used binary classification based approaches to build novel drug repositioning models. Since there exist only labelled positives and no labelled negatives, researchers use different approaches to define their own negative samples. However, directly taking unlabelled samples as negative data might not provide accurate results, since unlabelled data may contain unidentified positive samples. This can cause the model to make wrong predictions. The problem of lacking a reliable method for identifying the most probable set of negative samples from unlabelled drug combination data has not yet been investigated. This study addresses that gap.

We used balanced samples of positives and negatives for both the random and PUL approaches to train the three classification models, because a balanced sample ratio reduces the bias of the model predictions [4]. Since we observed a significant improvement when the PUL approach was used, it was employed to infer plausible drug combinations. We predicted the probability of each drug combination having a positive or a negative class label using the averaging ensemble learning technique, and the label with the highest probability was assigned to the drug combination. Further experiments are essential to validate the effectiveness of the 128 predicted drug combinations (see Supplementary File S4), so that some of them can be experimentally confirmed as repositionable drug combinations.
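The balanced sampling used for the random baseline can be sketched as below: as many unlabelled pairs as there are known positives are drawn at random and treated as negatives. The function name, seed and drug identifiers are hypothetical, and the sketch does not reproduce the study's actual sampling code.

```python
import random


def balanced_random_negatives(positives, unlabelled, seed=0):
    """Draw as many 'negatives' from the unlabelled pool as there are positives."""
    if len(unlabelled) < len(positives):
        raise ValueError("not enough unlabelled samples to balance the classes")
    rng = random.Random(seed)
    return rng.sample(unlabelled, k=len(positives))  # sampling without replacement


# Hypothetical drug pairs for illustration only.
positives = [("drug_1", "drug_2"), ("drug_3", "drug_4"), ("drug_5", "drug_6")]
unlabelled = [("drug_%d" % i, "drug_%d" % (i + 1)) for i in range(10, 30, 2)]

negatives = balanced_random_negatives(positives, unlabelled)
training_set = ([(pair, 1) for pair in positives]
                + [(pair, 0) for pair in negatives])
```

Because the draw is blind to the true status of each unlabelled pair, some sampled "negatives" may in fact be unidentified positives, which is the weakness that the PUL approach is intended to reduce.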

One limitation of our approach is that it involves only one clustering technique to cluster the drug combinations. Another limitation of this study is that we did not use a benchmark dataset against which the model performances, results and findings could have been verified. Furthermore, as a future direction, we will incorporate the side effects associated with the drugs, so that we can filter for drug combinations that are free of harmful side effects, which will further improve the reliability and accuracy of the predictions. In the current experiment, side effects associated with the drugs were not taken into consideration.

A. Literature-based evidence for predicted drug combinations

Out of the 128 predicted candidates, we found literature-based evidence that five drug combinations have already been experimentally proven as co-administered drugs. The non-steroidal anti-inflammatory drug Tenoxicam was experimentally identified by Moser et al. [32] as a treatment for chronic painful inflammatory conditions that occur with degenerative and extra-articular rheumatic diseases of the musculo-skeletal system, and it was found to be as effective as Piroxicam. Similarly, the ratio of Nortriptyline to Amitriptyline in the plasma of patients treated with Amitriptyline has been identified as useful in treating patients with depression [33]. Terazosin and Doxazosin form a drug combination that was predicted by our ensemble methodology, and they have shown experimental efficacy in the treatment of symptomatic benign prostatic hyperplasia in normotensive men [34]. Ofloxacin and Norfloxacin form a drug combination belonging to the Fluoroquinolone family that can be used as antibacterial agents; Murillo et al. [35] tested the resolution of this drug combination as a binary mixture. Diltiazem and Betaxolol form another drug combination predicted as effective in our study; Koh et al. [36] experimentally showed that Diltiazem and Betaxolol are both effective in controlling ventricular rate in chronic atrial fibrillation when combined with digoxin.

VI. CONCLUSION 

Drug combination repositioning is an emerging research focus that has gained the attention of pharmaceutical and computational researchers. Moreover, computational approaches have made a significant contribution to the development and improvement of drug repositioning. Since the number of known drug combinations is significantly low compared to the number of possible drug combinations, we proposed a Positive Unlabelled Learning based ensemble learning approach to infer reliable, plausible drug combinations as repositioning candidates. The ensemble learning approach aggregates the classification results of the SVM, SGD-Classifier and DNN classification models to minimize the variance of the final predictions. Further, we have shown the applicability of the proposed PUL approach in predicting drug repositioning candidates. The literature-based evidence shows the clinical significance of the proposed approach.

REFERENCES 

 

[1] Wouters O. J., McKee M., and Luyten J. (2020). Estimated Research and Development Investment Needed to Bring a New Medicine to Market. JAMA - Journal of the American Medical Association, 323(9), 844–853.
[2] DeVita V. T. and Schein P. S. (1973). The use of drugs in combination for the treatment of cancer: rationale and results. The New England journal of medicine, 288(19), 998–1006.
[3] Wishart D. et al. (2018). DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic acids research, 46(D1), D1074–D1082.
[4] Li J., Tong X. Y., Zhu L. D., Zhang H. Y. (2020). A Machine Learning Method for Drug Combination Prediction. Frontiers in genetics, 11, 1000.
[5] Chen L., Li B. Q., Zheng M. Y., Zhang J., Feng K. Y., Cai Y. D. (2013). Prediction of effective drug combinations by chemical interaction, protein interaction and target enrichment of KEGG pathways. BioMed research international, 2013, 723780.
[6] Sun Y., Xiong Y., Xu Q., Wei D. (2014). A Hadoop-based method to predict potential effective drug combination. BioMed research international, 2014, 196858.
[7] Liu Y., Hu B., Fu C., Chen X. (2010). DCDB: Drug combination database. Bioinformatics (Oxford, England), 26(4), 587–588.
[8] Janizek J., Celik S., Lee S. (2018). Explainable machine learning prediction of synergistic drug combinations for precision cancer medicine. bioRxiv.
[9] Huang H., Zhang P., Qu A., Sanseau P., Yang L. (2014). Systematic prediction of drug combinations based on clinical side-effects. Scientific reports, 4.
[10] Chen X., Ren B., Chen M., Wang Q., Zhang L., Yan G. (2016). NLLSS: Predicting Synergistic Drug Combinations Based on Semi-supervised Learning. PLoS computational biology, 12(7), 1–23.
[11] Loewe S. (1953). The problem of synergism and antagonism of combined drugs. Arzneimittel-Forschung, 3(6), 285–290.
[12] KalantarMotamedi Y., Eastman R. T., Guha R., Bender A. (2018). A systematic and prospectively validated approach for identifying synergistic drug combinations against malaria. Malaria Journal, 17(1), 1–15.
[13] Li P. et al. (2015). Large-scale exploration and analysis of drug combinations. Bioinformatics (Oxford, England), 31(12), 2007–2016.
[14] Shi J. Y. et al. (2018). TMFUF: A triple matrix factorization-based unified framework for predicting comprehensive drug-drug interactions of new drugs. BMC Bioinformatics, 19(14).
[15] Kuru H. I., Tastan O., Cicek E. (2021). MatchMaker: A Deep Learning Framework for Drug Synergy Prediction. IEEE/ACM transactions on computational biology and bioinformatics.
[16] Preuer K., Lewis R., Hochreiter S., Bender A., Bulusu K. C., Klambauer G. (2018). DeepSynergy: predicting anti-cancer drug synergy with Deep Learning. Bioinformatics (Oxford, England), 34(9), 1538–1546.
[17] Lee G., Park C., and Ahn J. (2019). Novel deep learning model for more accurate prediction of drug-drug interaction effects. BMC Bioinformatics, 20(1), 1–8.




[18] Li Z. et al. (2020). Identification of Drug-Disease Associations Using Information of Molecular Structures and Clinical Symptoms via Deep Convolutional Neural Network. Frontiers in Chemistry, 7.
[19] Peng B. and Ning X. (2019). Deep learning for high-order drug-drug interaction prediction. ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 197–206.
[20] Lee G., Park C., and Ahn J. (2019). Novel deep learning model for more accurate prediction of drug-drug interaction effects. BMC Bioinformatics, 20(1), 1–8.
[21] Zhang W., Chen Y., Liu F., Luo F., Tian G., and Li X. (2017). Predicting potential drug-drug interactions by integrating chemical, biological, phenotypic and network data. BMC Bioinformatics, 18(1), 1–12.
[22] Sellamanickam S., Garg P., and Selvaraj S. K. (2011). A pairwise ranking based approach to learning with positive and unlabeled examples. International Conference on Information and Knowledge Management, Proceedings, 663–672.
[23] Liu Y. et al. (2017). Computational drug discovery with dyadic positive-unlabeled learning. Proceedings of the 17th SIAM International Conference on Data Mining, SDM, 45–53.
[24] Zhao J., Liang X., Wang Y., Xu Z., and Liu Y. (2016). Protein complexes prediction via positive and unlabeled learning of the PPI networks. 13th International Conference on Service Systems and Service Management, ICSSSM.
[25] Nguyen Q. H., Ly H., Ho L. S., Al‐Ansari N., Le H. V., Tran V. Q., Prakash I., Pham B. T. (2021). Influence of Data Splitting on Performance of Machine Learning Models in Prediction of Shear Strength of Soil. Mathematical Problems in Engineering, 2021, 1–15.
[26] Pedregosa F. et al. (2011). Scikit-learn: Machine Learning in Python. The Journal of Machine Learning Research, 12, 2825–2830.
[27] Kohonen T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464–1480.
[28] Aliper A., Plis S., Artemov A., Ulloa A., Mamoshina P., Zhavoronkov A. (2016). Deep Learning Applications for Predicting Pharmacological Properties of Drugs and Drug Repurposing Using Transcriptomic Data. Molecular pharmaceutics, 13(7), 2524–2530.
[29] Kiviluoto K. (1996). Topology preservation in self-organizing maps. Proceedings of IEEE International Conference on Neural Networks (ICNN'96), 1, 294–299.
[30] Pölzlbauer G. (2004). Survey and Comparison of Quality Measures for Self-Organizing Maps. Proceedings of the Fifth Workshop on Data Analysis (WDA'04), 67–82.
[31] Cortes C., Vapnik V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
[32] Moser U., Waldburger H., Schwarz H. A., Gobelet C. A. (1989). A double-blind randomised multicentre study with tenoxicam, piroxicam and diclofenac sodium retard in the treatment of ambulant patients with osteoarthritis and extra-articular rheumatism. Scandinavian Journal of Rheumatology, 18(S80), 71–80.
[33] Jungkunz G., Kuß H. (1980). On the Relationship of Nortriptyline : Amitriptyline Ratio to Clinical Improvement of Amitriptyline Treated Depressive Patients. Pharmakopsychiatrie, Neuro-Psychopharmakologie, 13, 111–116.
[34] Kaplan S. A., Soldo K. A., Olsson C. A. (1995). Terazosin and doxazosin in normotensive men with symptomatic prostatism: A pilot study to determine the effect of dosing regimen on efficacy and safety. European Urology, 28(3), 223–228.
[35] Murillo J. A., Alañón M. A., Muñoz De La P. A., Durán M. I., Jiménez G. A. (2007). Resolution of ofloxacin-ciprofloxacin and ofloxacin-norfloxacin binary mixtures by flow-injection chemiluminescence in combination with partial least squares multivariate calibration. Journal of Fluorescence, 17(5), 481–491.
[36] Koh K. K., Song J. H., Kwon K. S., Park H. B., Baik S. H., Park Y. S., In H. H., Moon T. H., Park G. S., Cho S. K., Kim S. S. (1995). Comparative study of efficacy and safety of low-dose diltiazem or betaxolol in combination with digoxin to control ventricular rate in chronic atrial fibrillation: randomized crossover study. International Journal of Cardiology, 52(2), 167–174.