IEEE Paper Template in A4 (V1)

International Journal on Advances in ICT for Emerging Regions 2022 15 (3):

December 2022 International Journal on Advances in ICT for Emerging Regions

Improving Drug Combination Repositioning using

Positive Unlabelled Learning and Ensemble

Learning
Yashodha Ruchini Maralanda, Pathima Nusrath Hameed

Abstract— Drug repositioning is a cost-effective and time-

effective concept that enables the use of existing drugs/drug

combinations for therapeutic effects. The number of drug

combinations used for therapeutic effects is smaller than all

possible drug combinations in the present drug databases.

These databases consist of a smaller set of labelled positives and

a majority of unlabelled drug combinations. Therefore, there is

a need for determining both reliable positive and reliable

negative samples to develop binary classification models. Since,

we only have labelled positives, the unlabelled data has to be

separated into positives and negatives by a reliable technique.

This study proposes and demonstrates the significance of using

Positive Unlabelled Learning, for determining reliable positive

and negative drug combinations for drug repositioning. In the

proposed approach, the dataset with known positives and

unlabelled samples was clustered by a Deep Learning based Self

Organizing Map. Then, an ensemble learning methodology was

followed by employing three classification models. The

proposed PUL model was compared with the frequently used

approach that randomly selects negative drug pairs from

unlabelled samples. A significant improvement of 19.15%,

20.56% and 20.23% in the Precision, Recall and F-measure,

respectively, was observed for the proposed PUL-based

ensemble learning approach. Moreover, 128 drug repositioning

candidates were predicted by the proposed methodology.

Further, we found literature-based evidence to support five

drug combinations that may be able to be repositioned. These

discoveries show our proposed PUL approach as a promising

strategy that is applicable in drug combination prediction for

repositioning.

Keywords— Drug repositioning, Positive Unlabelled Learning

(PUL), Deep Learning, Self-Organizing Maps (SOM), Support

Vector Machine (SVM)

I. INTRODUCTION

ntroducing a new drug to the market is time consuming

and costly. Nearly it takes seven to fifteen years to

introduce a new drug to the market and approximately

around $700-$1000 million cost for the whole process since

it requires to undergo a massive experimental procedure

before going to the hand of patients. [1]. Therefore, most of

the pharmaceutical companies and medical research

institutes are trying to find alternatives, which can be used to

prevent and cure human diseases. As one of the most

efficient and trust worthy approaches, repurposing or the

reuse of existing drugs as treatments to some other diseases

that still do not have proper treatments is an emerging topic

Correspondence: Yashodha Ruchini Maralanda (E-mail:

yashodar95@gmail.com) Received: 24-08-2021

Revised:04-01-2023 Accepted: 11-01-2023

Yashodha Ruchini Maralanda, Pathima Nusrath Hameedis from Department

of Computer Science, University of Ruhuna, Sri Lanka
(yashodar95@gmail.com, nusrath@dcs.ruh.ac.lk ).

DOI: http://doi.org/10.4038/icter.v15i3.7232

from the last decade. This concept is known as drug

repositioning or drug repurposing.

Moreover, drug combinational treatments are identified to be

much efficient in avoiding drug resistance at treating

complex diseases like cancer [2]. Since there exist

approximately 16,000 [3] of approved drugs in the market,

an extremely large number of drug combinations can be

formed. However, only a very small number out of them are

confirmed with experimental researches. Therefore, there is

a need of an accurate and more predictive approach to infer

useful drug combinations out of those millions of possible

drug combinations, which remains yet unlabelled.

Existing drug combination repositioning approaches have

followed binary classifications [4]–[6] as well as several

other approaches such as tree based techniques [8] for

repositioning of drug combination data. In the existing

binary classification approaches, the unlabelled samples

were considered as negatives [4]–[6]. Therefore, the results

of existing studies might be unreliable, inaccurate and may

cause to the loss of valuable and repositionable drug

combinations.

In this study, a Positive Unlabelled Learning (PUL) based

approach was proposed to address this problem. It uses a

deep learning based unsupervised clustering approach

followed by binary classification enabling us to select

reliable negatives for binary classification. Unsupervised

clustering method was based on a Deep Learning model

using identified drug-drug similarities. The clusters with

least significant known drug combinations were considered

as the clusters with negatives. Our model has been compared

using the frequently used binary classification approach that

randomly selects samples as negatives from unlabelled data.

Thereby, model predictions were evaluated and the

significance of the PUL approach has been highlighted. To

the best of our knowledge, this is the first attempt focusing

on learning from positive unlabelled data for drug

combination repositioning using drug-based features.

In Section II, an overview of existing literature under the

domain, and their limitations are emphasized stating the need

and the importance of our work. In Methodology and

Materials Section, our dataset and our research workflow are

explained in detail. Then, in the Results Section, we have

illustrated our results that are relevant to the PUL-based

ensemble learning methodology and the final predictions.

Next, under Discussions Section, we have emphasized the

significance of the proposed PUL approach, future work

capabilities and literature based evidences about some of the

predicted results. Finally, Section VI provides the

concluding remarks of this study.

II. RELATED WORK

Drug repositioning via in-silico methods has become

popular and there exist many successful efforts in this

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted

use, distribution, and reproduction in any medium, provided the original author and source are credited

mailto:yashodar95@gmail.com
mailto:yashodar95@gmail.com
mailto:nusrath@dcs.ruh.ac.lk
http://doi.org/10.4038/icter.v15i3.7232
http://doi.org/10.4038/icter.v15i3.7232
https://creativecommons.org/licenses/by/4.0/legalcode
https://creativecommons.org/licenses/by/4.0/legalcode

73 Yashodha Ruchini Maralanda, Pathima Nusrath Hameed

International Journal on Advances in ICT for Emerging Regions December 2022

domain. Majority of them are single drug repositioning

approaches while a considerable number have focused on

repositioning of drug combinations as novel therapeutics to

diseases. Moreover, Machine Learning techniques such as

Support Vector Machine (SVM), Naïve Bayes, Logistic

Regression, and Random Forest as well as Deep learning

techniques including Deep Neural Networks, Convolutional

Neural Networks and Deep Feed Forward Networks were

employed.

Use of unlabelled data under binary classification,

involves different methods. One common paradigm is the

random selection of samples from unlabelled data as labelled

negatives. Li et al. [4]'s study was for repositioning of drug

combinations and it was a binary classification problem.

Their dataset was composed of a majority of unlabelled data

and a comparably smaller set of labelled positive samples.

However, they have taken all the unlabelled samples as

negative rather than selecting the plausible positives and

negatives in unlabelled data and have further checked for

overfitting by varying the positive and negative sample ratio

from 1:12. Finally, they have chosen 1:1 as the most

appropriate ratio since it has produced the best result. Even

though their study was involved with an ensemble learning

methodology, the reliability of the predictions might not be

satisfiable because of the random selection of the negative

sample.

Similarly, Chen et al. [5] has used this approach of random

negative selection from a set of drug combinations, which do

not have a proper labelling. They have carried out a binary

classification of the selected labelled positives and randomly

sampled negatives via Random Forest based on chemical

interactions between drugs, protein interactions between

drug targets and the target enrichment of KEGG pathways.

Furthermore, map reduce programming model was used

together with SVM and Naïve Bayes classifiers to identify

novel drug combinations by Sun et al. [6]. Their negative

dataset was composed of randomly paired drugs, which were

belonging to the 103 single drugs that have been selected

from DCDB [7].

TreeCombo [8] is another work, which has used a tree

based approach to predict drug combinations with the use of

physical and chemical properties of drugs together with gene

expression levels of cell lines. Use of clinical side-effects to

predict drug combinations has been tested by Huang et al.

[9]. They have applied Logistic Regression and predicted

drug combinations based on their clinical side effects. Here,

they have categorized drug combinations as safe and unsafe

by using three key side effects that were identified as more

contributing towards model performance. NLLSS [10] was

another approach that has integrated known synergistic drug

combinations, unlabelled combinations, drug-target

interactions and drug chemical structures to predict

synergistic drug combinations. Moreover, they have

followed a different method by involving Loewe Score [11]

for drug combinations and they have classified data into

principal drugs and adjuvant drugs based on a set of rules.

Kalantarmotamedi et al. [12] applied a Random Forest

approach with Transcriptional Drug Repositioning in order

to identify synergistic drug combinations against Malaria. Li

et al. [13] has implemented PEA; an algorithm to model drug

combinations using a Bayesian network which was also

integrated with a similarity algorithm.

Shi et al. [14] have used Matrix Factorization to predict

potential Drug-Drug Interactions (DDIs) between two drugs

as well as between a set of drugs by using side effect

information of known drugs. Moreover, they have

introduced the ability of predicting the interaction between

new drugs with another new drug that has no yet approved

interactions.

Apart from machine learning, recently, Deep Learning has

grabbed more interest in the domain of drug combination

repositioning. Several studies have been carried out in order

to predict novel drug candidate pairs. MatchMaker [15] is a

Supervised Learning framework implemented based on a

Deep Neural Network to predict drug synergy scores referred

to as Loewe score. Chemical structures and untreated cell

line gene expression profiles of drugs were utilized with

three separate sub-networks where two of them are parallel

executions for separate drugs in a pair and the third sub-

network is for the whole drug pair.

DeepSynergy [16] for predicting anti-cancer drug synergy

is based on a Feed Forward Neural Network which takes

three inputs including chemical descriptors from two drugs

and the genomic information of the cell line. The output from

the network was the synergy score for the given input drugs.

These synergy scores then decided whether the drug

combination is positive or negative.

Lee et al. [17] have used Deep Feed-Forward Networks to

predict DDI effects based on a set of drug features; structural

similarity profiles, gene ontology term similarity profiles and

target gene similarity profiles. In order to perform feature

reductions, they have used Autoencoders, which was proven

to have improved performances rates than Principle

Component Analysis (PCA). Reduced profile pairs were

then concatenated and fed to the network. RMSprop and

Adam were used as optimizers with the Autoencoder and

Deep Feed Forward Network respectively. Autoencoders

were trained twice in order to predict DDI types more

accurately. Li et al. [18] have presented a novel

Convolutional Neural Network based model which is

capable of predicting indications for new drugs by

identifying the relevant lead compounds using the drug

molecular structure information and disease symptom

information. Under this, they have constructed similarity

matrices out of the above two vectors of information and they

were mapped into one grey scale image. Finally, this was

used as the input to a Convolutional Neural Network model

and that was executed using MATLAB software. Here, they

have used stochastic gradient descent as an optimizer.

Peng et al. [19] has performed a prediction of drug-drug

interactions using a deep learning model. They have taken

true positive and true negative drug combinations from the

dataset under first approach and true positive and sampled

negative drug combinations in their second approach. Lee et

al. [20] has involved drug pairs but it is not an approach for

prediction of drug combinations as repositioning candidates.

They have used Deep feed-forward networks to predict drug-

drug interaction effects based on a set of drug features.

Zhang et al. [21] have implemented an ensemble model

for DDI predictions. They have followed a semi-supervised

learning approach because they wanted to identify

unobserved DDIs, which might be available among other

possible drug pairs. This is similar to identifying possible

positive samples out of an unlabelled sample.

Improving Drug Combination Repositioning using Positive Unlabelled Learning and Ensemble Learning 74

December 2022 International Journal on Advances in ICT for Emerging Regions

PUL has become an emerging topic since most of the

natural data exist as positive and unlabelled data samples

rather than having already defined positive and negative

samples. There are several researches carried out under

learning from PU data.

Sellamanickam et al. [22] have proposed a ranking based

SVM model (RSVM) where the positive samples obtain

higher scores than the unlabelled samples. A threshold

parameter was estimated to form their final classifier. Liu et

al [23] have followed a similar approach. They have

introduced a novel computational framework for drug-drug

interaction prediction with Dyadic PUL. They have

identified the lack of a reliable method for separation of their

unlabelled data into positives and negatives. Therefore, they

have introduced a scoring function and assigned a certain

score to each data pair. According to the assigned scores they

have separated data into positive and negative by making the

top scoring data pairs as positives while keeping the lower

scoring data pairs as negatives. The top scoring data pairs

were defined as the samples, which obtain a higher score

than the average score of the unlabelled data pairs.

Further, Zhao et al. [24] have proposed a method for

protein complex mining by employing SVM with the use of

PUL. They have introduced an efficient sub graph searching

method that can search complex sub graphs. First, they have

tried to express the traditional training dataset with positive

and negative samples as a non-traditional training set with

positive and unlabelled samples. Then they have tried to

identify the relationship between the two classifiers that were

trained with those two types of training samples.

Even though, there are studies that have used PUL, to the

best of our knowledge, no study was identified that has

specifically focused on learning from positive unlabelled

data for drug combination repositioning domain according to

drug based features. Since, drug combination repositioning

is one of the interesting and hot topics, and similarly PUL is

an emerging field, we can identify the need of a PUL study

with related to drug combination repositioning. The primary

objective of our study is to introduce a new reliable

computational method for PUL-based drug combination

repositioning.

III. MATERIALS AND METHODOLOGY

A. Dataset

In order to demonstrate the effectiveness of the proposed

approach, 183,315 drug combinations from 606 drugs that

was collected from Li et al [4]’s study were used. Drug

Target Similarity, Drug Indication Similarity, Drug Structure

Fig. 1 Workflow of the Random and PUL approaches

75 Yashodha Ruchini Maralanda, Pathima Nusrath Hameed

International Journal on Advances in ICT for Emerging Regions December 2022

Similarity, Drug Expression Similarity and Drug Module

Similarity of the above drug combinations were also

collected from Li et al [4]'s study. They consist of Jaccard

coefficient to represent the above similarities between drug

pairs. Using them, we constructed a drug combination

similarity matrix with the corresponding five feature

similarity scores and the file was (183,315, 5) dimensions

large. Li et al.[4]’s study was composed of 1,196 labelled

positive drug combinations for the 606 drugs that we are

interested. After separation of labelled positives, there were

182,119 drug combinations in the unlabelled dataset

(Supplementary Files S1 and S2).

B. Proposed Methodology

The concept of learning from positive and unlabelled data

is a setting where we have only that majority of unlabelled

data and a set of already labelled positive data. Even though

it is yet unlabelled, this set of unlabelled data may also

contain both positive and negative samples. With this PUL

technique, we are trying to identify them separately. The

concept of PUL has drawn the attention of researchers due to

its ability of providing reliable solutions. With the surge of

this technique, it has diminished the need of having fully

supervised data for computational model driven research

work. With this PUL concept, it has enabled the involvement

of unlabelled data for computational model driven learning

processes. Many applications and research work have

utilized this concept.

Unlabelled drug combinations may compose of plausible

negative samples as well as repositionable drug

combinations. Therefore, there is a need of a proper

mechanism to identify the most probable set of negative

samples to develop a reliable classification model. We have

introduced a novel PUL approach for drug combination

repositioning. Our proposed method enables learning from

positives and unlabelled drug combinations in order to

identify plausible negatives as well as plausible positives

within majority of unlabelled data. We proposed PUL using

a deep learning and ensemble learning methodology to

predict reliable drug combinations for repositioning.

Here, we have used two approaches, which can be used to

determine negative drug combinations from the unlabelled

dataset. Firstly, the frequently used random selection of

negatives from unlabelled data and secondly, the proposed

PUL using deep learning and ensemble learning. Fig. 1 and

Fig. 2 illustrates the complete workflow based on the two

approaches. We demonstrate a comparison of the

performance of both approaches employing Receiver

Operating Curve, Precision-Recall Curve, accuracy,

precision, recall and F-measure. Hence, the significance of

the PUL approach for drug combination based drug

repositioning is emphasized. Furthermore, we have

identified a set of plausible positive drug combinations that

can be repositioned for new/rare diseases. Repositioning of

these predicted drug combinations need further research with

laboratory experiments and other background analysis with

expertise knowledge. Therefore, it needs to be carried on as

a separate experiment which becomes the second phase of

our research.

C. Random Approach

In this approach, a randomly selected sample of

unlabelled drug combinations, which is equal in size to that

of the labelled positive sample was employed. Our labelled

Fig. 2 Workflow of the ensemble methodology for inferring drug reositioning candidates

Improving Drug Combination Repositioning using Positive Unlabelled Learning and Ensemble Learning 76

December 2022 International Journal on Advances in ICT for Emerging Regions

positive sample was composed of 1,196 drug combinations.

Hence, we have taken a random sample of 1,196 unlabelled

drug combinations as negatives. As this was a binary

classification, class labels were assigned as 1 and 0, where 1

for positive and 0 for negative classes respectively.

Classification was carried out using the three classifiers;

SVM, Stochastic Gradient Descent-based Classifier (SGD-

Classifier) and the Deep Neural Network (DNN) classifier.

According to Nguyen et al. [25], we have identified that a

train-test split of 70:30 is much effective with random

sampling. Therefore, we decided to use the same split for

both approaches. Out of the positive and negative datasets,

30% was used for model testing while the remaining 70%

was taken for training the model. Implementation was

carried out using python with scikit-learn library [26] for

SVM and SGD-Classifier and the Keras library for the deep

neural network. The accuracy, precision, recall and F1-

scores were then recorded.

D. Positive Unlabelled Learning (PUL) Approach

Labelled positive sample was the same as in random

approach, but selection of the negative sample was carried

out by learning from positive and unlabelled drug

combination data. A Self-Organizing Map (SOM) was used

to cluster the sample with positive and unlabelled data and

then the clusters were analysed to identify plausible negative

samples from unlabelled data.

For each cluster, probability of having labelled positive

samples was calculated according to the Positive Probability.

(Positive probabilities for each cluster are provided in

Supplementary File S3). We defined the Positive Probability

to be the ratio between Known drug combinations in cluster

i and Total number of combinations in cluster i where i is the

cluster ID.

Since there are 1,196 known positives, we need 1,196

reliable negatives to train the binary classifier. Therefore, we

sorted each cluster based on its calculated positive

probability value. The unlabelled drug pairs in the clusters

with the lowest positive probability are considered as reliable

negatives. Therefore, we aggregated the clusters with lower

positive probability until we observe a sample size greater

than or equal to 1,196. Accordingly, three clusters with the

least positive probabilities were combined to get the set of

least significant drug combinations. Thereby we observed

3,115 negatives by aggregating the clusters where the

positive probability is less than or equal to 0.000962. Since

we required balanced positive and negative samples, we

randomly selected 1,196 negatives from the above-identified

3,115 negatives.

After selection of a negative sample via PUL, labelled

positive and the negative sample were classified using the

SVM, SGD-Classifier and the DNN model. Since, we

needed to compare the performance of random and the PUL

approach, we kept the model parameters fixed to the ones

that were used in random approach. Similarly, 30% of data

Fig. 3 Venn diagram to denote the distribution of the data

Fig. 1.

TABLE 1

PERFORMANCE ASSESSMENT OF THE PROPOSED POSITIVE UNLABELLED LEARNING APPROACH AND RANDOM APPROACH

SVM SGD-Classifier DNN Classifier

Random PUL Random PUL Random PUL

Accuracy 0.6421 0.7925 0.7103 0.8774 0.7326 0.9721

Precision 0.6799 0.8413 0.8036 0.9564 0.7203 0.9806

Recall 0.5628 0.7454 0.5893 0.7917 0.7328 0.9646

F1 - score 0.6158 0.7904 0.6800 0.8663 0.7265 0.9725

77 Yashodha Ruchini Maralanda, Pathima Nusrath Hameed

International Journal on Advances in ICT for Emerging Regions December 2022

taken as the testing set while remaining 70% was taken for

model training. Then, accuracy, precision, recall and the F1-

score given by the model were recorded.

E. Ensemble Learning Methodology

Figure 2 illustrates the ensemble learning approach used

in this study. In order to predict drug repositioning

candidates from unlabelled drug combinations, averaging

ensemble learning technique was used. First, class

probabilities for the unlabelled combinations were predicted

using the three individual models separately. Then the

separate probabilities obtained for each drug combination to

be belonged to class 0 (negative class) or class 1 (positive

class) were averaged and predicted a novel probability for

each drug combination. The new class probabilities were the

ensemble learning based class predictions. We then

predicted the best candidate drug combinations.

F. Clustering and Classification Models

1) Self-Organizing Maps (SOM): SOM [27] is an
Artificial Neural Network, which is widely used under

unsupervised learning problems. The major difference of

SOM with compared to other neural network models is the

use of competitive learning. SOM has the capabilities of

dimensionality reduction and it has the ability to identify

similarities in data. It is evident that deep learning models

have higher performance, with compared to machine

learning approaches [28]. So, we have decided to cluster our

unlabelled dataset using a minimalistic and Numpy based

implementation of SOM known as MiniSom

(https://github.com/JustGlowing/minisom/), which is a

python library and much more adaptive with the

environment where it is being used.

A two-dimensional SOM of size 9x9 was chosen as the

optimal size with a learning rate of 0.09 which is trained for

Fig. 4 Plot of Quantization Error and Topogrphic Error with a fixed learning rate of 0.5 and fixed map size of 7x7

Fig. 5 Plot of Quantization Error and Topogrphic Error with a fixed map size of 9x9 and fixed number of iterations of 8000

https://github.com/JustGlowing/minisom/

Improving Drug Combination Repositioning using Positive Unlabelled Learning and Ensemble Learning 78

December 2022 International Journal on Advances in ICT for Emerging Regions

8000 iterations. Selection of the optimal size; learning rate

and the number of iterations were performed after

calculating the quantization and topographic errors by

varying their values appropriately [29], [30].

As the first step of optimal parameter identification, a set

of initial parameters were needed to be determined. Hence,

the learning rate of 0.5, was chosen as the initial learning rate

for our model. Since a large dataset is used in this study, a

considerably larger map size is required. Therefore, the map

size of SOM was decided gradually increasing the

dimensionality from 7x7. Hence, the initial parameters for

learning rate and map size were defined as 0.5 and 7x7

respectively.

Model training was carried out multiple times with

varying number of iterations, fixed learning rate and map

size in order to record the Topographic Error and

Quantization Error based on each experiment. Recorded

error values for number of iterations that has been used in

each experiment were plotted (see Fig. 4). According to the

elbow technique, the experiment with 8000 iterations was

chosen as the optimal value.

Since the optimal number of iterations was identified, our

next experiment was followed to identify the optimal map

size. We fixed the learning rate to 0.5 and number of

iterations to 8000 and performed training of the model

multiple times by gradually increasing the map size at each

experiment. At a map size of 9x9, we could observe a clear

deduction in Topographic and Quantization Error, which

then again shows an increase in error values (see Fig. 5).

Therefore, we determined 9x9 as the optimal map size.

After that, we used the above identified map size and the

number of iterations to determine the optimal learning rate.

We set the number of iterations and map size to 8000 and

9x9, respectively. The training process was performed

multiple times for different learning rates. Finally, an

experiment of the error values corresponding to a learning

rate of 0.09 was determined as the optimal learning rate in

our problem.

2) Support Vector Machine (SVM): SVM [31] is an
algorithm, which always finds a hyperplane in an n-

dimensional space where the number of dimensions is equal

to the number of features used in the dataset. This can be

applied for both binary classification as well as multi-class

classification problems. Since this is an algorithm that has

been widely used because of its higher prediction capabilities,

we have decided to use it as a binary classifier in our work.

The employed SVM model was followed by a sigmoid

kernel, since sigmoid kernel is the most appropriate for

binary classification problems.

Fig. 6 Plot of Quantization Error and Topogrphic Error with a fixed learning rate of 0.5 and fixed number of iterations of 8000

Fig. 7 Receiver Operating Curve and Precision Recall Curve demonstrating the performance of Deep Neural Network classifier for Random

and PUL approache

79 Yashodha Ruchini Maralanda, Pathima Nusrath Hameed

International Journal on Advances in ICT for Emerging Regions December 2022

3) Stochastic Gradient Descent based Classifier
(SGD-Classifier): This is a linear classifier that is

emphasized in Scikit-learn [26], that has been optimized

using Stochastic Gradient Descent (SGD). It supports loss

functions and penalties that are used in classification

purposes. Further, this is capable of minimizing/maximizing

the loss function defined by the model. Here, we have used

the log los function, and with that, our model acts similar to

logistic regression (LR). However, importance of using

SGD-Classifier with log loss apart from direct LR model is

that, even if LR is not capable of directly calculating the

minimum value of its loss function, with the use of SGD-

Classifier we can easily perform it. Therefore, the

performance is comparably better and so that we have used

SGD-Classifier for classification purposes in our work. Even

though both log loss and modified_huber loss for the loss

parameter in SGD-Classifier enables to predict class

probabilities for data, log loss has given the best performance

in our case. Therefore, we employed a SGD-Classifier model

followed by a log loss function.

4) Deep Neural Network (DNN) Classifier: The DNN
model that was implemented using Keras library

(http://github.com/keras-team/keras) was composed of a

fully connected network with three layers. Since, ReLu

activation function shows better performance when referring

to a majority of current researches, it was used in the first

two layers and sigmoid activation function was used in the

output layer since this is a binary classification problem. The

dimensions of the layers were selected as 5, 12, 5 and 1 for

the input layer, two hidden layers and the output layer

respectively such that it gives a better model for the

classification of our dataset. We have set the loss parameter

as binary_crossentrphy as it is specifically designed for

binary classification problems in Keras. Further, we have

employed the Adam optimizer as it is well suited for the

instances where there are large datasets. Since our prediction

dataset is large, we have involved Adam optimizer to

improve the accuracy of predictions.

G. Evaluation Metrics

We have divided our dataset into training and testing sets

in order to validate the implemented model performances. 70%

of the dataset was used for training and 30% was used for

testing. Common validation measures including accuracy,

precision, recall and F1-scores from the random and PUL

approaches were calculated using below equations where,

TP – True Positive, FP – False Positive, TN – True Negative

and FN – False Negative.

Accuracy = (TP + TN) / (TP + TN + FP + FN) (1)

Precision = TP / (TP + FP) (2)

Recall = TP / (TP + FN) (3)

F1 Score = 2 * Precision * Recall / (Precision + Recall) (4)

Furthermore, Receiver Operating Curve (ROC) is an

important measure at binary classification problems, which

Fig. 8 Receiver Operating Curve and Precision Recall Curve demonstrating the performance of Support Vector Machine classifier for Random
and PUL approaches

Fig. 9 Receiver Operating Curve and Precision Recall Curve demonstrating the performance of Stochastic Gradient Discent-based classifier for

Random and PUL approaches

http://github.com/keras-team/keras

Improving Drug Combination Repositioning using Positive Unlabelled Learning and Ensemble Learning 80

December 2022 International Journal on Advances in ICT for Emerging Regions

plots false positive rate versus true positive rate. Precision-

Recall (PR) Curve provides more information by plotting the

precision and recall for different thresholds. Therefore, we

have observed the ROC and PR curves for our two

approaches.

IV. RESULTS

In comparison to the random approach for negative

sample selection, our proposed PUL approach demonstrates

a significant improvement in the performance. (See Table 1).

The accuracy, precision, recall, and F1-score for the PUL

approach based on the three classifiers SVM, SGD-Classifier

and the DNN classifier shows higher accuracies than the

values recorded with random approach. For instance, F1-

score has improved by 17.46%, 18.63% and 24.60% for

SVM, SGD-Classifier and the DNN classifier respectively

when the PUL approach is used.

When comparing the performance of three classifiers

based on accuracy, precision, recall and F1-score, DNN

classifier shows relatively higher performance for both

random as well as the PUL approach. (See Table 1) SGD-

Classifier shows the second-best performance while SVM

has relatively lower performance with compared to the other

two classifiers.

A comparison of the ROC and PR curves for random and

the PUL approaches based on the three models also

emphasize the higher skill of the model that was trained

under the PUL approach (See Fig. 7, Fig. 8, and Fig. 9).

The ROC and PR curves are drawn in blue and orange

colours for PUL, random approaches respectively. The x-

axis represents false positive rate. If this rate is closer to zero,

our model predicts only a few false positives. Similarly, the

y-axis shows true positive rate. If this rate is closer to one,

the model predicts a majority of the true positives. Therefore,

an ROC curve that has bowed much towards the (0, 1)

coordinate of the plot is considered to have higher skill

compared to others. The blue coloured ROC plot based on

each classifier has bowed towards the (0, 1) coordinate of the

plot more than the orange coloured plot of random approach.

Hence, the ROC curves emphasize the higher skill of the

models that are trained using PUL approach.

The x-axis of PR curve represents recall. If recall gives a

value that is closer to one, our model predicts only a few false

negatives. Similarly, the y-axis shows precision. If precision

is closer to one, the model predicts only a few false positives.

Therefore, a PR curve that has bowed much towards the (1,

1) coordinate of the plot is considered to have higher skill

compared to others.

TABLE 2

PERFORMANCE ASSESSMENT OF ENSEMBLE LEARNING

The blue coloured plots of PUL approach have bowed

towards (1, 1) coordinate more than the orange coloured

plots of the random approach. This further emphasizes the

higher skill of the models that are trained using proposed

PUL approach.

A further comparison between the three ROC curves

emphasize that DNN classifier gives the highest skilled

model out of the three classifiers. The reason is that the ROC

curve of DNN classifier is bowed the most towards the (0, 1)

coordinate of the plot. The PR curve of DNN classifier is

bowed the most towards (1, 1) coordinate showing the least

number of false negatives and false positives. This further

proves the higher skill of the DNN classifier.

We built the classifiers using SVM, SGD-Classifier and

DNN and then we combined their individual predictions to

obtain the final prediction. This may reduce the variance of

the final outputs. Table 2 summarizes the performance

assessment of the ensemble learning approach where the

performance measures of the three classifiers are averaged.

The evaluation metrics derived by the ensemble learning

method has shown an improvement of 20.23% in the F1 –

score for the PUL approach over the random approach.

Hence, the proposed PUL approach outperforms the

frequently used random approach and it enables predicting

reliable repositioning candidates.

It should be noted that since we have identified 1,916

known positives [3] and 3,115 negatives by clustering, there

are 179,004 remaining unlabelled drug combinations for

predictions (See Fig. 3). We employed the proposed PUL-

based three classification models as base predictors of the

ensemble learning methodology to classify the unlabelled

samples. Averaging ensemble learning technique was

employed. Thereby we could infer 128 drug combinations

with the highest posterior probabilities greater than 0.99. We

infer this set of 128 drug combinations as potential

candidates for drug repositioning. (See Supplementary File

S4)

Furthermore, we have employed the proposed PUL

approach using the three classification models to classify the

1,919 remaining negatives identified by clustering (not used

to train the classification models; see Fig. 3). We assessed

the predicted probabilities greater than 0.5 for class 0

(negative class) for those 1919 drug pairs. We observed

91.40%, 95.73%, and 98.59% accuracy of being predicted as

a negative drug combination using SVM, SGD-Classifier,

and DNN classifier, respectively. Similarly, we have

observed that accuracy is 98.44% when the ensemble

averaging technique is applied. Moreover, it is relatively

higher than that of the SVM and SGD-classifiers. These

observations confirm the accuracy of the used negatives, and

on the other hand, it depicts the high accuracy of the

prediction models based on the proposed PUL approach.

Further, it clearly depicts the significance of the ensemble

learning methodology.

V. DISCUSSIONS

Most of the real world data exist as positive and unlabelled

samples. It is the same in pharmaceutical domain. Several

drug combination repositioning studies have used binary

classification based approaches to build novel drug

repositioning models. Since there exist only labelled

positives and no labelled negatives, researchers use different

approaches to define their own negative samples. However,

Random PUL

Accuracy 0.6950 0.8807

Precision 0.7346 0.9261

Recall 0.6283 0.8339

F1 - score 0.6741 0.8764

81 Yashodha Ruchini Maralanda, Pathima Nusrath Hameed

International Journal on Advances in ICT for Emerging Regions December 2022

directly taking unlabelled samples as negative data might not

provide accurate results since unlabelled data may contain

unidentified positive samples within it. This will cause the

model to provide wrong predictions. The problem of not

having an exact method for identifying the most probable set

of negative samples from drug combination related

unlabelled data is yet not experimented. So, in this study, that

gap is being addressed.

We have used balanced samples of positives and negatives

for both random and PUL approaches to train the three

classification models because a balanced sample ratio

reduces the bias of the model predictions [4]. Since we

observed a significant improvement when the PUL approach

is used, it is employed to infer plausible drug combinations.

We have predicted the probability of each drug combination

to have a positive or a negative class label by using the

averaging ensemble learning technique and thereby the label

of the highest probability was assigned to the drug

combination. Carrying out further experiments is essential to

validate the effectiveness of the predicted 128 drug

combinations (see Supplementary File S4) so that some drug

combinations out of the above prediction can be

experimentally proved as repositionable drug combinations.

One limitation involved with our approach is that, it only

involves one clustering technique to cluster the drug

combinations. Another limitation with this study is that we

haven’t kept any bench mark dataset to verify the model

performances so that, we would have verified our results and

findings. Furthermore, as a future directive, we will involve

side effects associated with the drugs, so that we can filter

out the drug combinations, which are free of harmful side

effects and it will further improve the reliability and accuracy

of the predictions. However, in the current experiment, we

did not take side effects associated with the drugs into

consideration.

A. Literature-based evidence for predicted drug
combinations

Out of the 128 predicted candidates, we found literature-

based evidence to support that five drug combinations as

already experimentally proven as co-administered drugs.

The non-steroidal anti-inflammatory drug, Tenoxicam was

experimentally identified by Moser et al. [32] as a treatment

for chronic painful inflammatory conditions that occur with

degenerative and extra-articular rheumatic diseases of

musculo-skeletal system. This was identified to be as

effective as Piroxicam. Similarly, the ratio of the compounds,

Nortriptyline to Amitriptyline in the plasma of patients who

were treated with Amitriptyline is identified to be useful in

treating patients with depression [33]. Terazosin and

Doxazosin is a drug combination that was predicted by our

ensemble methodology and they have shown experimental

efficacy in treatment to symptomatic benign prostatic

hyperplasia in normotensive men [34]. Ofloxacin and

Norfloxacin is a drug combination that is belonging to

Fluoroquinolones family and able to be used as antibacterial

agents. Murillo et al. [35] has tested the resolution of this

drug combination as a binary mixture. Diltiazem and

Betaxolol is another drug combination that has been

predicted as effective in our study. Koh et al. [36] has

experimentally proven that Diltiazem and Betaxolol both are

effective in controlling ventricular rate in chronic atrial

fibrillation when combined with digoxin.

VI. CONCLUSION

Drug combination repositioning is an emerging research

focus that gained attention of pharmaceutical and

computational researchers. Moreover, computational-based

approaches have showed a significant contribution for the

development and improvement of drug repositioning. Since

the number of known drug combinations is significantly low

with compared to the number of possible drug combinations,

we proposed a Positive Unlabelled Learning based ensemble

learning approach to infer reliable plausible drug

combinations as repositioning candidates. The ensemble

learning approach enables aggregating the classification

results of SVM, SGD-Classifier and DNN classification

model to minimize the variance of the final predictions.

Further, we have shown the applicability of proposed PUL

approach in predicting drug repositioning candidates. The

literature-based evidence shows the clinical significance of

the proposed approach.

REFERENCES

[1] Wouters O. J., McKee M., and Luyten J. (2020). Estimated Research
and Development Investment Needed to Bring a New Medicine to

Market, JAMA - Journal of the American Medical Association,

323(9), 844–853
[2] DeVita V. T. & Schein, P. S. (1973). The use of drugs in combination

for the treatment of cancer: rationale and results. The New England

journal of medicine, 288(19), 998–1006.
[3] Wishart D. et al. (2018). DrugBank 5.0: a major update to the

DrugBank database for 2018. Nucleic acids research, 46(D1),

D1074–D1082.
[4] Li J., Tong X. Y., Zhu L. D., Zhang H. Y. (2020). A Machine

Learning Method for Drug Combination Prediction. Frontiers in
genetics, 11, 1000.

[5] Chen L., Li B. Q., Zheng M. Y., Zhang J., Feng K. Y., Cai Y. D.
(2013). Prediction of effective drug combinations by chemical
interaction, protein interaction and target enrichment of KEGG

pathways. BioMed research international, 2013, 723780.

[6] Sun Y., Xiong Y., Xu Q., Wei D. (2014). A Hadoop-based method to
predict potential effective drug combination. BioMed research

international, 2014, 196858.

[7] Liu Y., Hu B., Fu C., Chen X. (2010). DCDB: Drug combination
database, Bioinformatics (Oxford, England), 26(4), 587–588.

[8] Janizek J., Celik S., Lee S. (2018). Explainable machine learning
prediction of synergistic drug combinations for precision cancer
medicine. bioRxiv.

[9] Huang H., Zhang P., Qu A., Sanseau, P., Yang, L. (2014). Systematic
prediction of drug combinations based on clinical side-effects.
Scientific reports, 4.

[10] Chen X., Ren B., Chen M., Wang Q., Zhang L., Yan, G. (2016).
NLLSS: Predicting Synergistic Drug Combinations Based on Semi-
supervised Learning. PLoS computational biology, 12(7), 1-23.

[11] LOEWE S. (1953). The problem of synergism and antagonism of
combined drugs. Arzneimittel-Forschung, 3(6), 285–290.

[12] KalantarMotamedi Y., Eastman R.T., Guha R., Bender A. (2018). A
systematic and prospectively validated approach for identifying

synergistic drug combinations against malaria. Malaria Journal,
17(1), 1-15.

[13] Li P et al. (2015). Large-scale exploration and analysis of drug
combinations. Bioinformatics (Oxford, England), 31(12), 2007–2016.

[14] Shi J. Y. et al. (2018). TMFUF: A triple matrix factorization-based
unified framework for predicting comprehensive drug-drug

interactions of new drugs. BMC Bioinformatics, 19 (14)
[15] Kuru H. I., Tastan O., Cicek E. (2021). MatchMaker: A Deep

Learning Framework for Drug Synergy Prediction. IEEE/ACM

transactions on computational biology and bioinformatics.
[16] Preuer K., Lewis R., Hochreiter S., Bender A., Bulusu K. C.,

Klambauer G. (2018). DeepSynergy: predicting anti-cancer drug

synergy with Deep Learning. Bioinformatics (Oxford,
England), 34(9), 1538–1546.

[17] Lee G., Park C., and Ahn J. (2019). Novel deep learning model for
more accurate prediction of drug-drug interaction effects. BMC
Bioinformatics, 20 (1), 1–8.

Improving Drug Combination Repositioning using Positive Unlabelled Learning and Ensemble Learning 82

December 2022 International Journal on Advances in ICT for Emerging Regions

[18] Li Z. et al. (2020). Identification of Drug-Disease Associations Using
Information of Molecular Structures and Clinical Symptoms via
Deep Convolutional Neural Network. Frontiers in Chemistry, 7.

[19] Peng B. and Ning X. (2019). Deep learning for high-order drug-drug
interaction prediction. ACM-BCB 2019 - Proceedings of the 10th
ACM International Conference on Bioinformatics, Computational

Biology and Health Informatics, 197–206.

[20] Lee G., Park C., and Ahn J. (2019). Novel deep learning model for
more accurate prediction of drug-drug interaction effects. BMC

Bioinformatics, 20, (1), 1–8.

[21] Zhang W., Chen Y., Liu F., Luo F., Tian G., and Li X. (2017).
Predicting potential drug-drug interactions by integrating chemical,

biological, phenotypic and network data, BMC Bioinformatics, 18 (1),

1–12.
[22] Sellamanickam S., Garg P., and Selvaraj S. K. (2011). A pairwise

ranking based approach to learning with positive and unlabeled

examples. International Conference on Information and Knowledge
Management, Proceedings, 663–672.

[23] Liu Y. et al. (2017). Computational drug discovery with dyadic
positive-unlabeled learning. Proceedings of the 17th SIAM
International Conference on Data Mining, SDM, 45–53.

[24] Zhao J., Liang X., Wang Y., Xu Z., and Liu Y. (2016). Protein
complexes prediction via positive and unlabeled learning of the PPI
networks, 13th International Conference on Service Systems and

Service Management, ICSSSM.

[25] Nguyen, Q.H., Ly, H., Ho, L.S., Al‐Ansari, N., Le, H.V., Tran, V.Q.,
Prakash, I., & Pham, B.T. (2021). Influence of Data Splitting on

Performance of Machine Learning Models in Prediction of Shear

Strength of Soil. Mathematical Problems in Engineering, 2021, 1-15.
[26] Pedregosa F. et al. (2011). Scikit-learn: Machine Learning in Python.

The Journal of Machine Learning Research. 12: 2825–2830.

[27] Kohonen T. (1990). The self-organizing map. Proceedings of the
IEEE. 78(9), 1464-1480.

[28] Aliper A., Plis S., Artemov A., Ulloa A., Mamoshina P.,
Zhavoronkov A. (2016). Deep Learning Applications for Predicting
Pharmacological Properties of Drugs and Drug Repurposing Using

Transcriptomic Data. Molecular pharmaceutics, 13(7), 2524–2530.

[29] Kiviluoto K. (1996). Topology preservation in self-organizing maps.
Proceedings of IEEE International Conference on Neural Networks
(ICNN'96). 1, 294-299.

[30] Pölzlbauer G. (2004). Survey and Comparison of Quality Measures
for Self-Organizing Maps. Proceedings of the Fifth Workshop on
Data Analysis (WDA'04), 67—82.

[31] Cortes C., Vapnik V. (1995). Support-vector networks. Machine
Learning. 20, 273–297.

[32] Moser, U., Waldburger, H., Schwarz, H. A., & Gobelet, C. A. (1989).
A double-blind randomised multicentre study with tenoxicam,

piroxicam and diclofenac sodium retard in the treatment of ambulant
patients with osteoarthritis and extra-articular rheumatism.

Scandinavian Journal of Rheumatology, 18(S80), 71–80.

[33] Jungkunz, G., Kuß, H., & Nortriptylin-arnitriptylin-, Z. (1980). On
the Relationship of Nortriptyline : Amitriptyline Ratio to Clinical

Improvement of Amitriptyline Treated Depressive Patients.

Pharmakopsychiatrie, Neuro-Psychopharmakologie. 13, 111–116.
[34] Kaplan, S. A., Soldo, K. A., & Olsson, C. A. (1995). Terazosin and

doxazosin in normotensive men with symptomatic prostatism: A pilot

study to determine the effect of dosing regimen on efficacy and safety.
European Urology. 28(3), 223–228.

[35] Murillo J. A., Alañón M. A., Muñoz De La P. A., Durán M.I., &
Jiménez G. A. (2007). Resolution of ofloxacin-ciprofloxacin and
ofloxacin-norfloxacin binary mixtures by flow-injection

chemiluminescence in combination with partial least squares

multivariate calibration. Journal of Fluorescence. 17(5), 481–491.
[36] Koh, K. K., Song, J. H., Kwon, K. S., Park, H. B., Baik, S. H., Park,

Y. S., In, H. H., Moon, T. H., Park, G. S., Cho, S. K., & Kim, S. S.

(1995). Comparative study of efficacy and safety of low-dose
diltiazem or betaxolol in combination with digoxin to control

ventricular rate in chronic atrial fibrillation: randomized crossover

study. International Journal of Cardiology. 52(2), 167–174.