Engineering, Technology & Applied Science Research Vol. 7, No. 5, 2017, 2073-2082 2073 www.etasr.com Armaki et al.: A Hybrid Meta-Learner Technique for Credit Scoring of Banks’ Customers

A Hybrid Meta-Learner Technique for Credit Scoring of Banks’ Customers

Ali Ghasemy Armaki
Department of Management, Qazvin Branch, Islamic Azad University, Qazvin, Iran
alghasemy@yahoo.com

Mir Feiz Fallah
Tehran Central Branch, Islamic Azad University, Tehran, Iran
fallahshams@gmail.com

Mahmoud Alborzi
Information Technology Management Department, Science and Research Branch, Islamic Azad University, Tehran, Iran
Mahmood_alborzi@yahoo.com

Amir Mohammadzadeh
Department of Management, Qazvin Branch, Islamic Azad University, Qazvin, Iran
amn_1378@yahoo.com

Abstract—Financial institutions are exposed to credit risk when they issue consumer loans, so developing reliable credit scoring systems is crucial for them. Because machine learning techniques have demonstrated their applicability and merit, they have been used extensively in the credit scoring literature. Recent studies that build hybrid models by merging several machine learning algorithms have reported compelling results. Hybridization methods fall into two types: traditional and ensemble methods. This study combines both and proposes a hybrid meta-learner model. The model follows the traditional hybrid structure of ‘classification + clustering’, with the stacking ensemble method employed in the classification part. The paper also compares several versions of the proposed hybrid model built from different combinations of classification and clustering algorithms, which identifies the hybrid model that performs best for credit scoring. On four real-life credit datasets, the experimental results show that the (KNN-NN-SVMPSO)-(DL)-(DBSCAN) model delivers the highest prediction accuracy and the lowest error rates.
Keywords-credit scoring; hybrid machine learning models; stacking; deep learning

I. INTRODUCTION

Owing to the recent global financial crisis and the European sovereign debt crisis, credit risk assessment has become an increasingly vital issue for banks and credit institutions throughout the world. In addition, sharp competition in the financial sector has caused a large decline in banking profits, which pushes banks toward issuing more consumer loans to earn higher interest income. However, the expected profitability depends on the quality of the consumer loans the banks issue, which requires a vigilant credit scoring process. Even a 1% improvement in the accuracy of a credit scoring system can significantly increase the profit of banks and other financial institutions [1]. Traditionally, credit decisions were made by human experts based on past experience, historical performance, and guidelines, especially the classic five C’s of credit: character, capacity, capital, collateral, and conditions [2]. This approach suffers from drawbacks including inconsistent decisions, repeated incorrect decisions, and high training costs. Therefore, with the rapid development of the credit industry, various credit scoring techniques are now used for credit evaluation. Credit scoring models have developed quickly to distinguish bad credit applicants from good ones, using associated features such as gender, age, education, income, job, and marital status, or using the applicants’ historical credit performance. The advantages of credit scoring models include a lower cost of credit analysis, faster credit decisions, higher rates of credit collection, efficient monitoring of model performance, and mitigation of potential risks; moreover, changes in economic conditions or policies can easily be integrated into the model [3-5].
Even a minor improvement in the accuracy of credit scoring models can reduce credit risk substantially and generate noteworthy future savings. Owing to both the impact of the financial crisis and a soaring risk appetite, the number of non-performing loans has increased sharply as banks have extended more credit to applicants without sufficient assessment. Thus, efficient credit scoring models have become indispensable for banks and other credit institutions. The approaches financial institutions have used over the past decades to model credit risk fall mainly into two groups: statistical and Artificial Intelligence (AI) techniques. The statistical methods generally include Logistic Regression (LR) and Linear Discriminant Analysis (LDA). AI approaches, on the other hand, mainly comprise machine learning techniques such as Support Vector Machines (SVM), Artificial Neural Networks (ANN), Decision Trees (DT), and many other classification and clustering algorithms. Each of these methods has pros and cons. For instance, LDA assumes normally distributed variables and a linear relationship among the explanatory variables, yet it cannot verify that these assumptions are fulfilled [4, 5]. LR is used for prediction on datasets with binary outcomes; although LR does not require the normality assumption, a linear relationship among variables is a basic assumption of both models. Therefore, some researchers [4-7] have questioned the predictive performance of these models for credit scoring. In contrast, AI techniques have recently attracted many scholars working on credit scoring problems. These techniques are best known for their higher predictive accuracy compared with statistical models, and they usually do not require the above-mentioned assumptions. For example, ANN, which simulates the mechanism of the human brain on a computer, does not need such assumptions, and in the field of credit scoring it performs considerably better than its classical rivals LR and LDA [8-13]. More generally, several studies report that AI methods outperform traditional ones [14-16].

In recent years, many researchers have focused on developing machine learning techniques for credit scoring applications. One way they improve the performance of machine learning algorithms is hybridization. These researchers argue that credit scoring models built by combining classification (supervised learning) and clustering (unsupervised learning) techniques can outperform single machine learning methods [12, 17-20]. In this study, a new hybrid method for credit scoring is introduced that combines the traditional hybrid and stacking ensemble methods. The idea comes from the traditional hybrid model of classification plus clustering: clustering is an unsupervised learning method and cannot separate data as precisely as supervised methods. Accordingly, a classifier or set of classifiers can be trained first, and its output then used as the input to the clustering method to improve the clustering outcome [21]. In this model, instead of using a single classification algorithm in the first part of the hybrid model, we adopt a stacking ensemble method, and in the second part several clustering algorithms are used interchangeably. The model also uses a deep learning algorithm as the meta-learner classifier, on the expectation that the strong learning capacity of deep learning can improve the predictive accuracy of the new hybrid credit scoring model.
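The "classification + clustering" idea can be illustrated with a minimal sketch in which a classifier's output becomes the input of a clusterer. It assumes a hypothetical two-feature toy dataset (label 1 = bad applicant), uses a nearest-centroid classifier as a stand-in for the stacked classifier stage, and uses a one-dimensional two-means clusterer as a stand-in for the clustering algorithms of the study; none of these choices is the paper's method.

```python
import math


def centroid(rows):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_nearest_centroid(X, y):
    """Stand-in classifier: one centroid per class (0 = good, 1 = bad)."""
    return {c: centroid([x for x, t in zip(X, y) if t == c]) for c in set(y)}


def bad_score(model, x):
    """Score in [0, 1]; values near 1 mean x lies closer to the 'bad' centroid."""
    d_good = math.dist(x, model[0])
    d_bad = math.dist(x, model[1])
    total = d_good + d_bad
    return d_good / total if total > 0 else 0.5


def two_means_1d(values, iters=20):
    """Stand-in clusterer: split scalar scores into a low (0) and a high (1) cluster."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low = [v for v, c in zip(values, labels) if c == 0]
        high = [v for v, c in zip(values, labels) if c == 1]
        if low:
            lo = sum(low) / len(low)
        if high:
            hi = sum(high) / len(high)
    return labels


# Hypothetical toy data: two features per applicant, label 1 = bad.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
y = [0, 0, 0, 1, 1, 1]

# Stage 1: train the classifier; its scores are the input of stage 2.
model = fit_nearest_centroid(X, y)
scores = [bad_score(model, x) for x in X]
# Stage 2: cluster the classifier outputs instead of the raw features.
clusters = two_means_1d(scores)
print(clusters)  # [0, 0, 0, 1, 1, 1]
```

Because the clusterer sees only the classifier's score, its two groups line up with the predicted good/bad classes; in the paper the first stage is a stacked ensemble with a DL meta-learner rather than a single toy classifier.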
This study selects various types of classifiers and clusterers for use in the hybrid model. In the relevant literature, many studies have developed hybrid credit rating models using only single learning algorithms as the baselines (traditional hybrid models), whereas this study adopts the stacking ensemble method as the baseline of the hybrid model. Moreover, this paper compares several versions of the proposed hybrid model built from different combinations of classification and clustering algorithms, which identifies the hybrid model that achieves the best prediction accuracy for credit scoring.

The rest of the paper is organized as follows: Section II reviews the literature on hybrid and ensemble credit scoring models. Section III explains the data and methodology of the study, and Section IV presents the experimental results and analysis. Finally, Section V concludes the study and discusses opportunities for future work.

II. LITERATURE REVIEW

This section reviews the literature on machine learning in credit scoring. Before granting credit to customers, banks evaluate their creditworthiness. With a good credit scoring system, banks can classify customers by risk (probability of default) and offer them risk-adjusted loans with different interest rates and collateral conditions, so optimal credit decisions can be based on the outputs of credit scoring models. Since the emergence of AI systems such as neural networks, genetic algorithms, and expert systems, these methods have been increasingly used in financial research and implemented by many financial institutions, especially banks. The authors of [22] adopt four types of traditional hybrid machine learning techniques to identify which one achieves the best predictive results.
They combine classification and clustering algorithms such as Naïve Bayesian, Decision Trees, Logistic Regression, Neural Network, K-means, and Expectation Maximization, and apply the resulting hybrid models to a real credit dataset from Taiwan. The comparative results show that the “classification + classification” hybrid model outperforms the other hybrid models; this model uses Logistic Regression and Neural Network as the first and second classifiers (LR + NN), respectively. The authors state that such hybrid credit scoring models can help financial institutions make more accurate, high-confidence decisions when issuing consumer loans. The authors of [23] study how different machine learning methods behave on imbalanced credit scoring datasets. Data imbalance occurs when the number of defaulting customers in a dataset is much lower than the number of non-defaulting ones, which is typical of credit data. They tested various models on five real-world credit datasets and showed that, on imbalanced datasets, methods such as decision trees, KNN, and linear discriminant analysis (LDA) do not perform well, whereas gradient boosting and random forests have much better predictive performance. Although decision trees (DT) are among the most popular algorithms in machine learning and credit scoring, they suffer from two drawbacks: 1) they are very sensitive to noise, and 2) redundant features may distort the learning process. Hence, the authors of [24] propose two ensemble methods, Bagging-RS DT and RS-Bagging DT, to deal with these problems. These models combine the Random Subspace (RS) and Bootstrap Aggregating (Bagging) strategies. Tested on the Australian and German credit datasets, both models perform better than the other base models. The author of [25] introduces a new solution for credit scoring problems based on a modified version of SVM.
He notes that because most real credit datasets are large, conventional nonlinear SVMs, even with high accuracy, are computationally suboptimal. He therefore proposes a clustered support vector machine (CSVM) to cope with this problem and concludes that CSVM achieves similar prediction performance while remaining relatively cheap computationally. In another attempt to build an optimal credit scoring model, the authors of [26] proposed the Ensemble Classification based on Supervised Clustering (ECSC) method. The main idea behind this model is that data samples from the same class may have dissimilar characteristics or patterns. Supervised clustering groups samples with similar characteristics or patterns into the same cluster. Hence, training subsets formed by mixing clusters from different classes can express the various patterns of the samples, which helps increase the diversity and accuracy of the base classifiers. In that paper, the base classifiers include logistic regression, decision trees, and SVM, and K-means is used for supervised clustering. The authors applied this model, along with random subspace bagging (RS-Bagging), Bagging-RS, and dynamic classifier ensemble using classification confidence (DCE-CC), to the German and Australian credit datasets. The results show that ECSC is relatively more accurate than the other models. One of the heuristic methods used for credit scoring is the fuzzy SVM based on Support Vector Data Description (SVDD), introduced in [27]. SVDD builds on the SVM classifier and looks for a spherical boundary around a dataset to identify outliers or unusual data.
This approach uses SVDD to mitigate the impact of outliers and noisy data in order to improve the learning performance of the fuzzy SVM. The authors tested this model against ordinary linear and nonlinear fuzzy SVMs on the Australian and German credit datasets. Although SVDD-FSVM obtained the best results, its advantage was negligible. In a recent study, the authors of [28] introduced a model that combines hybrid and ensemble methods. They argue that merging filtering and feature selection methods can act as an effective pre-processor for machine learning models. For this reason, they combined Multivariate Adaptive Regression Splines (MARS) and Gabriel Neighborhood Graph editing (GNG) in the hybrid modeling stage. As base classifiers, they selected decision trees, ANN, random forests, Bayesian networks, and SVM. They applied these models to seven real-world credit datasets, and the results show that their proposed model improves predictive performance relative to the base learners.

III. METHODOLOGY

This section describes how the credit scoring system introduced by this study is developed. In general, a hybrid machine learning model can be built in two ways: with traditional methods or with ensemble methods. Traditional hybridization offers four ways to combine machine learning algorithms: (1) merging two classification algorithms, (2) merging one classification algorithm with one clustering algorithm, (3) merging one clustering algorithm with one classification algorithm, and (4) merging two clustering algorithms [22, 29]. Ensemble methods, on the other hand, offer more sophisticated ways of hybridizing machine learning techniques. Ensembles are beneficial because they can overcome three problems of base learning algorithms, namely the statistical, computational, and representational problems [30].
When a dataset is too small relative to the space of possible hypotheses, a learning algorithm may find many hypotheses with equal accuracy on the training data and must choose one of them; the statistical problem arises if the chosen hypothesis fails to predict new data well. The computational problem arises when a learning algorithm becomes trapped in a poor local minimum instead of finding the best hypothesis within the hypothesis space. Lastly, the representational problem occurs when no hypothesis in the hypothesis space is a good approximation of the true function [31]. Ensembles take several forms, including bagging, boosting, and stacking, and these techniques are frequently used in the machine learning and credit scoring literature. Findings suggest that ensemble methods usually achieve better predictive performance than single algorithms or traditional hybrid models [26, 28, 32-36]. Unlike bagging and boosting, which are used in many papers, stacking has been employed by only a few researchers. Stacking (stacked generalization) is designed to improve predictive performance by combining the predictions of several machine learning algorithms [37]. It trains a combiner algorithm to merge the predictions of various learning algorithms. First, an ensemble of base classifiers (Tier 1 classifiers) is trained on the available data via bootstrap sampling. Then the outputs of the base classifiers are used as inputs to train a meta-classifier (Tier 2 classifier) [36]. In other words, stacking trains a set of classifiers in parallel, and a meta-learner then learns from their outputs. The author of [38] emphasizes that classifiers working collaboratively can significantly outperform those working separately, which underlines the value of such a model.
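The two-tier procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's implementation: the Tier 1 base learners are hypothetical one-feature threshold rules, each trained on its own bootstrap sample, and the Tier 2 meta-learner is reduced to weights proportional to each base learner's accuracy on held-out data (the paper uses a DL meta-learner instead).

```python
import random


def train_stump(X, y, feature):
    """Hypothetical base learner: threshold one feature midway between the class means."""
    m0 = sum(x[feature] for x, t in zip(X, y) if t == 0) / y.count(0)
    m1 = sum(x[feature] for x, t in zip(X, y) if t == 1) / y.count(1)
    thr, sign = (m0 + m1) / 2, (1 if m1 >= m0 else -1)
    return lambda x: 1 if sign * (x[feature] - thr) > 0 else 0


def bootstrap(X, y, rng):
    """Bootstrap sample (with replacement), redrawn until both classes are present."""
    n = len(X)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 in yb and 1 in yb:
            return [X[i] for i in idx], yb


def fit_stacking(X_train, y_train, X_meta, y_meta, features, seed=0):
    """Tier 1: one stump per feature, each on its own bootstrap sample.
    Tier 2 (simplified): weights proportional to each stump's held-out accuracy."""
    rng = random.Random(seed)
    bases = []
    for f in features:
        Xb, yb = bootstrap(X_train, y_train, rng)
        bases.append(train_stump(Xb, yb, f))
    accs = [sum(h(x) == t for x, t in zip(X_meta, y_meta)) / len(y_meta) for h in bases]
    total = sum(accs)
    weights = [a / total for a in accs] if total > 0 else [1 / len(accs)] * len(accs)
    return bases, weights


def predict_stacking(bases, weights, x):
    """Weighted vote of the base learners' 0/1 outputs (1 = bad)."""
    vote = sum(w * h(x) for h, w in zip(bases, weights))
    return 1 if vote > 0.5 else 0


# Hypothetical data: feature 0 separates the classes, feature 1 is noisy.
X_train = [[1, 5], [2, 1], [3, 4], [7, 2], [8, 6], [9, 3]]
y_train = [0, 0, 0, 1, 1, 1]
X_meta = [[1.5, 4.5], [2.5, 2.0], [7.5, 2.5], [8.5, 5.5]]
y_meta = [0, 0, 1, 1]

bases, weights = fit_stacking(X_train, y_train, X_meta, y_meta, features=[0, 1])
preds = [predict_stacking(bases, weights, x) for x in X_meta]
```

Holding out `X_meta` for the Tier 2 step keeps the meta-learner from simply rewarding base learners that overfit their own training data; a real stacking setup would usually use cross-validated (out-of-fold) predictions for the same reason.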
The meta-learner (meta-classifier) in the stacking algorithm generates a weight vector by assigning each base classifier a weight proportional to its performance [31]. Stacking can be considered a fully customizable hybrid machine learning system because it can host various types of base classifiers and meta-classifiers, and it has been employed successfully in both supervised and unsupervised learning tasks [39-41]. A recent study [32] showed that a hybrid ensemble machine learning system with stacking is superior to other types of ensemble methods. The proposed hybrid meta-learner model combines the traditional hybrid and ensemble approaches to credit scoring. Its foundation is the traditional hybrid model of “classification+clustering”, which uses a classification technique as a pre-processor for the clustering algorithm. The only difference is that this paper uses the stacking ensemble method in the first part instead of a single classification method. Several classification and clustering techniques are used in this study; they are briefly described in the following subsections.

A. Classification Techniques

Classification (supervised learning) methods map input vectors to one of several predefined output classes by learning from examples. A classifier is learned by approximating the mapping between the input and output instances of the training set so that it labels the training outputs correctly. This procedure is called the model generation stage. Once the model is generated, the resulting classifier can classify a previously unseen example based on the classes learned from the training set.
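K-nearest Neighbors (KNN), one of the base classifiers used in this study, makes the model generation and classification stages easy to see: generating the model amounts to storing the labeled examples, and an unseen example is labeled by a majority vote of its k closest stored examples. The sketch below is a minimal illustration assuming Euclidean distance, k = 3, and hypothetical toy data; it is not the KNN configuration used in the experiments.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training examples (Euclidean distance)."""
    neighbours = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# "Model generation" for KNN is simply storing the labelled training examples.
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = [0, 0, 0, 1, 1, 1]  # hypothetical labels: 0 = good, 1 = bad

print(knn_predict(X_train, y_train, [1.5, 1.5]))  # 0
print(knn_predict(X_train, y_train, [8.5, 8.5]))  # 1
```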
The classification techniques employed in this paper are Artificial Neural Network (NN), Automated Multilayer Perceptron (AMLP), Decision Tree (DT), K-nearest Neighbors (KNN), Logistic Regression (LR), Naïve Bayesian (NB), Support Vector Machines (SVM), and Support Vector Machines optimized by Particle Swarm Optimization (SVM-PSO).

B. Clustering Techniques

Clustering (unsupervised learning) methods group similar examples into clusters. Unlike classification, clustering has no labeled examples available. The main aim of clustering is to maximize the similarity among members of the same cluster while maximizing the dissimilarity between different clusters [42]. Clustering algorithms fall into two categories, partitional and hierarchical, of which the former is much more popular [43]. Partitional clustering has been widely applied to credit scoring problems; K-means and expectation maximization are two well-known partitional clustering algorithms. In contrast, hierarchical clustering builds clusters as a hierarchy. In its agglomerative form, each example starts as its own cluster, and clusters are merged one pair at a time until a stopping rule is met, producing a series of nested (branching) partitions. This study uses five clustering algorithms: Expectation Maximization (EM), K-means (KM), Fuzzy C-means (FCM), Density-based spatial clustering of applications with noise (DBSCAN), and Self-organizing Maps (SOM).

C. The Hybrid Meta-Learner Model

This study introduces a new hybrid method for credit scoring that mixes the traditional hybrid and stacking ensemble methods.
The idea comes from the traditional hybrid model of “classification plus clustering”: because clustering is an unsupervised learning method, it cannot distinguish data as accurately as supervised approaches. Therefore, a classifier or set of classifiers can be trained first, and its output then used as the input to the clustering technique to improve the clustering results. In this process, a stacking ensemble method replaces the single classification algorithm in the first part of the hybrid model. This stacking model uses three different base classifiers (level-0 generalizers) to train the meta-classifier (level-1 generalizer). In the second part, several clustering techniques are used interchangeably to find the combination of algorithms that yields the best results. One advantage of this hybrid system is that a deep learning (DL) algorithm sits at the heart of the proposed model as the meta-learner. Owing to the strong learning ability of DL, the predictive performance of the model is expected to improve significantly. The concept of DL was first proposed in 2006, in the framework of deep belief networks (DBN), and DL has since driven a considerable amount of research in several fields [45-47]. As a feature extraction technique, DL tries to obtain high-level feature abstractions by learning various feature structures during training. Each layer is learned as an unsupervised feature extractor, and combining these layers creates a deep supervised predictor [48]. DL has various theoretical frameworks; this study uses the H2O implementation, which is based on a feedforward architecture. As shown in Figure 1, the main building block of the DL model is the neuron, which is inspired by the human neural system.
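The feedforward computation built from such neurons can be sketched as follows. Each neuron forms the weighted sum of its inputs plus a bias, α = Σ_i w_i x_i + b, and emits f(α); a layer applies several neurons to the same inputs, and a network chains layers. This is a minimal pure-Python illustration, not H2O's implementation: the 2-2-1 layer sizes, the tanh and sigmoid activations, and the weights are arbitrary assumptions, and training (minimizing the loss over the weights) is omitted.

```python
import math


def neuron(x, w, b, f=math.tanh):
    """One neuron: alpha = sum_i w_i * x_i + b, output f(alpha)."""
    alpha = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(alpha)


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, layers):
    """Feedforward pass: each layer is (weight rows, biases, activation)."""
    for W, B, f in layers:
        x = [neuron(x, w, b, f) for w, b in zip(W, B)]
    return x


# Hypothetical 2-2-1 network with arbitrary, untrained weights.
layers = [
    ([[0.4, -0.6], [0.3, 0.8]], [0.0, -0.1], math.tanh),  # hidden layer
    ([[1.2, -0.7]], [0.05], sigmoid),                     # output: score in (0, 1)
]
print(forward([0.5, 1.0], layers))
```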
In this model, the neuron combines its input signals x_i with weights w_i into α = Σ_i w_i x_i + b and then passes the output signal f(α) to the connected neurons, where f denotes the nonlinear activation function and b the neuron’s activation threshold (bias) [49].

Fig. 1. The neuron architecture in the deep learning model

The weights connecting the neurons, together with the biases, determine the output of the whole network. Learning consists of adjusting the weights to minimize the error on the labeled training data; specifically, the aim is to minimize the loss function L(W; B | j) for every training example j [49]. As the meta-learner in the stacking algorithm, DL produces a weight vector that assigns each base classifier a weight proportional to its performance. Because it accepts various types of base classifiers and meta-classifiers, stacking can be seen as a fully customizable hybrid machine learning technique. This study uses different types of classifiers and clusterers in the hybrid model. Whereas many hybrid credit rating models in the literature use only single learning algorithms as the baselines, this study adopts the stacking ensemble method as the baseline of the hybrid model. The aim is to compare several versions of the proposed hybrid model, each built from a different combination of classification and clustering algorithms, in order to find the best model. Figure 2 illustrates the framework of the proposed hybrid meta-learner model.

Fig. 2. The hybrid meta-learner model

After the desired datasets are collected, pre-processing is applied to the data.
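Two steps of this procedure, the distance-based outlier filter (flagging the n points that lie farthest from their k nearest neighbors) and stratified 10-fold cross-validation, can be sketched as follows. The outlier score used here, the distance to the k-th nearest neighbor, is one common variant and an assumption; the exact scoring rule of [50] and the RapidMiner settings are not given in this section. The data and labels are hypothetical.

```python
import math
import random


def knn_outliers(X, k=3, n_outliers=1):
    """Indices of the n points farthest from their k-th nearest neighbour."""
    scores = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        scores.append((dists[k - 1], i))
    scores.sort(reverse=True)
    return sorted(i for _, i in scores[:n_outliers])


def stratified_folds(y, n_folds=10, seed=0):
    """Split indices into folds that each keep roughly the overall class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)  # deal each class round-robin
    return folds


def cv_splits(folds):
    """Yield (train, test) index lists: each fold is the test set exactly once."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Hypothetical data: four clustered points and one distant point.
X = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [10.0, 10.0]]
print(knn_outliers(X, k=2, n_outliers=1))  # [4]

y = [0] * 20 + [1] * 10  # hypothetical labels with a 2:1 class ratio
folds = stratified_folds(y, n_folds=10)
splits = list(cv_splits(folds))  # 10 rounds of 27 training and 3 test indices
```

Dealing each class round-robin across the folds is what keeps the 2:1 ratio in every fold, so each of the 10 test sets resembles the full dataset.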
In this stage, data points regarded as outliers or anomalies are removed from the datasets. To this end, a distance-based outlier detection algorithm identifies the n outliers in a dataset based on their distance to their k nearest neighbors [50]. Examples with missing feature values are also removed. Next, three classifiers are selected from the pool of classifiers as the base learners, with different combinations tried interchangeably. As mentioned before, the stacking model uses the DL algorithm as the meta-learner: the results of the base learners (level-0 generalizers) are used to train the meta-learner (level-1 generalizer). In the next step, the outputs of DL are fed into the clustering unit as processed data, which is expected to improve the prediction accuracy of the model significantly. The models are compared using 10-fold cross-validation with stratified sampling during training and testing. Each dataset is divided into 10 disjoint subsets (folds) that preserve the class proportions; in each round, 9 folds are used for training and the remaining fold for testing, so each model is trained and tested 10 times.

D. Data

Financial institutions need a system for evaluating the credit risk of their customers when granting loans, but first they must make sure that the accuracy of that system is acceptable. Because the performance of a credit scoring system is difficult to assess on private datasets, it should be measured on benchmark datasets. We therefore use four real-life datasets to evaluate the predictive power of the proposed hybrid meta-learner model. Table I displays the characteristics of the datasets. The first three datasets are widely used benchmarks in the literature.
The Australian, German, and Japanese datasets concern consumer credit card loans and were obtained from the UCI Machine Learning Repository. The last dataset concerns consumer loans in Iran and was obtained from Mellat Bank.

TABLE I. CHARACTERISTICS OF THE DATA SETS

Data set         | #Attributes | #Good | #Bad | Total | Source
Australian (AUS) | 14          | 307   | 383  | 690   | UCI Machine Learning Database Repository
German (GER)     | 24          | 700   | 300  | 1000  | UCI Machine Learning Database Repository
Japanese (JPN)   | 15          | 296   | 357  | 653   | UCI Machine Learning Database Repository
Iranian (IRA)    | 12          | 9351  | 649  | 10000 | Mellat Bank

E. Evaluation Strategies

To assess the predictive power of the developed models, prediction accuracy and the F-measure, the harmonic mean of precision and recall, are considered. With “bad” as the positive class (Table II), precision is the share of applicants predicted as bad that are actually bad, and recall is the share of actual bad applicants that the model identifies. In addition to these measures, type I and type II errors are reported for the best model on each dataset. These measures are computed from the confusion matrix in Table II.

TABLE II. CONFUSION MATRIX FOR A CREDIT SCORING PROBLEM

                  | Good applicant | Bad applicant
Predicted as Good | TN             | FN
Predicted as Bad  | FP             | TP

Note: Positive class is Bad.

IV. EMPIRICAL RESULTS

A. The Single Baseline Classifiers

Table III shows the prediction accuracy, F-measure, and average rank of the single baseline classifiers. To rank the models, we use the average rank method presented in [23].

TABLE III.
ACCURACY, F-MEASURE AND AVERAGE RANK OF THE SINGLE BASELINE MODELS

       | Accuracy                      | F-measure                     | Average Rank
Model  | AUS   | GER   | JPN   | IRA   | AUS   | GER   | JPN   | IRA   | Accuracy | F-measure
SVMPSO | 97.54 | 93.90 | 98.47 | 98.92 | 97.15 | 88.68 | 98.62 | 91.54 | 1.00     | 1.00
NN     | 87.54 | 81.20 | 87.29 | 98.66 | 86.65 | 66.19 | 87.70 | 89.25 | 3.00     | 3.00
AMLP   | 87.10 | 77.40 | 87.90 | 98.57 | 85.53 | 65.65 | 88.47 | 88.62 | 3.75     | 3.75
LR     | 87.68 | 73.40 | 88.82 | 96.21 | 86.82 | 64.53 | 89.62 | 76.65 | 4.00     | 3.25
DT     | 86.81 | 68.00 | 86.68 | 98.89 | 86.32 | 46.67 | 86.80 | 90.65 | 5.00     | 4.50
SVM    | 85.51 | 77.90 | 86.52 | 94.75 | 85.03 | 55.71 | 86.67 | 48.58 | 5.50     | 6.50
KNN    | 73.33 | 74.80 | 74.89 | 95.48 | 63.35 | 37.31 | 77.66 | 54.34 | 6.75     | 7.50
NB     | 80.72 | 74.40 | 80.40 | 93.29 | 75.77 | 60.25 | 83.55 | 52.11 | 7.00     | 6.50

Note: All accuracy and F-measure values are percentages. The analyses were carried out in RapidMiner 7.2.

The average rank method allows the performance of the models to be compared and the top performers identified. SVMPSO has the best predictive accuracy and F-measure and ranks first among the baseline classifiers. SVMs were first introduced in [51] as a form of linear classifier. SVMs can be used for binary classification, with the aim of finding the optimal hyperplane (line) that separates the input data into two classes (bad or good credit) [52]. Particle swarm optimization (PSO), a population-based stochastic optimization approach, works by simulating the behavior of birds in a flock; it was introduced in [53] and [54]. PSO can be used to improve the accuracy of SVMs by searching for the hyperplane that best separates the two classes. The results show that SVM-PSO is a highly accurate heuristic classifier, yet only a few studies have applied it. The second- and third-best performers are NN and AMLP.
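The evaluation measures of Section E and the average-rank method can be sketched together. The metric formulas below are the standard definitions, written with “bad” as the positive class as in Table II (the paper's own equations are not reproduced in this section, and type I/II errors are omitted because their convention is not stated here). The ranking function assumes that each model is ranked per dataset, with rank 1 for the highest score and tied models sharing the mean of their ranks, and that these ranks are then averaged across datasets; applied to the accuracy columns of Table III, it reproduces the published accuracy ranks.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-measure with 'bad' as the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f_measure


def average_ranks(scores):
    """scores: {model: [score per dataset]}; rank 1 = highest, ties share the mean rank."""
    models = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = dict.fromkeys(models, 0.0)
    for d in range(n_datasets):
        ordered = sorted(models, key=lambda m: scores[m][d], reverse=True)
        pos = 0
        while pos < len(ordered):
            end = pos
            while end + 1 < len(ordered) and scores[ordered[end + 1]][d] == scores[ordered[pos]][d]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
            for m in ordered[pos:end + 1]:
                totals[m] += shared
            pos = end + 1
    return {m: totals[m] / n_datasets for m in models}


# Accuracy columns of Table III (AUS, GER, JPN, IRA).
accuracy = {
    "SVMPSO": [97.54, 93.90, 98.47, 98.92],
    "NN": [87.54, 81.20, 87.29, 98.66],
    "AMLP": [87.10, 77.40, 87.90, 98.57],
    "LR": [87.68, 73.40, 88.82, 96.21],
    "DT": [86.81, 68.00, 86.68, 98.89],
    "SVM": [85.51, 77.90, 86.52, 94.75],
    "KNN": [73.33, 74.80, 74.89, 95.48],
    "NB": [80.72, 74.40, 80.40, 93.29],
}
ranks = average_ranks(accuracy)
# ranks == {'SVMPSO': 1.0, 'NN': 3.0, 'AMLP': 3.75, 'LR': 4.0,
#           'DT': 5.0, 'SVM': 5.5, 'KNN': 6.75, 'NB': 7.0}

# Hypothetical confusion-matrix counts, for illustration only.
acc, prec, rec, f1 = metrics(tp=40, tn=50, fp=5, fn=5)
```

Averaging ranks rather than raw percentages keeps a dataset with a very wide spread of scores (GER, for example) from dominating the comparison.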
Conversely, NB has the worst predictive performance on average. Here, the F-measure summarizes overall model performance by combining precision and recall into their harmonic mean.
B. The Hybrid Meta-Learner Models
Table IV presents the prediction accuracy, F-measure and average rank of the best hybrid meta-learner models. As the table shows, the (KNN-NN-SVMPSO)-(DL)-(DBSCAN) model achieves the best average rank for both accuracy and F-measure across the four credit datasets, although its SOM and EM variants score slightly higher on the Australian dataset. The five top-ranked models all share the same stacking combination, (KNN-NN-SVMPSO)-(DL), and differ only in their clustering algorithm. This suggests that the optimized version of the support vector machine, SVMPSO, plays a significant role in improving accuracy: with the standard SVM in its place, (KNN-NN-SVM)-(DL)-(DBSCAN) drops to an average accuracy rank of 16.50. Moreover, among the clustering algorithms combined with this stacking configuration, DBSCAN performs best, followed by SOM and FCM.
DBSCAN, introduced in [55], is a density-based clustering algorithm: it groups data points that are closely packed together and marks points lying in low-density regions as noise. DBSCAN is widely regarded as one of the best-performing clustering algorithms in the literature. Its advantages include the following:
• Unlike K-means, DBSCAN does not require the number of clusters in the dataset to be specified a priori.
• It can detect arbitrarily shaped clusters. It can even discover a cluster entirely surrounded by (but not connected to) a different cluster.
• It is robust to outliers, so it can deal successfully with noise in datasets.
TABLE IV. ACCURACY, F-MEASURE AND AVERAGE RANK OF THE BEST HYBRID MODELS
Acc. = accuracy (%); F = F-measure (%); Avg. rank = average rank over the four datasets.
Model | Acc. AUS | Acc. GER | Acc. JPN | Acc. IRA | F AUS | F GER | F JPN | F IRA | Avg. rank (Acc.) | Avg. rank (F)
(KNN-NN-SVMPSO)-(DL)-(DBSCAN) | 99.71 | 99.80 | 99.85 | 99.90 | 99.67 | 99.67 | 99.86 | 99.23 | 2.00 | 2.00
(KNN-NN-SVMPSO)-(DL)-(SOM) | 99.86 | 99.70 | 99.85 | 99.80 | 99.84 | 99.50 | 99.86 | 98.45 | 2.13 | 2.13
(KNN-NN-SVMPSO)-(DL)-(FCM) | 99.71 | 99.70 | 99.85 | 99.74 | 99.67 | 99.50 | 99.86 | 97.99 | 3.25 | 3.25
(KNN-NN-SVMPSO)-(DL)-(EM) | 99.86 | 99.60 | 99.69 | 99.78 | 99.84 | 99.33 | 99.72 | 98.29 | 3.50 | 3.50
(KNN-NN-SVMPSO)-(DL)-(KM) | 99.71 | 99.70 | 99.69 | 99.72 | 99.67 | 99.50 | 99.72 | 97.82 | 4.13 | 4.13
(DT-KNN-NN)-(DL)-(FCM) | 95.65 | 97.70 | 94.33 | 99.48 | 95.07 | 96.19 | 94.71 | 95.95 | 11.88 | 11.38
(DT-KNN-NN)-(DL)-(KM) | 95.94 | 97.90 | 93.88 | 99.48 | 95.39 | 96.45 | 94.30 | 95.91 | 12.13 | 13.50
(KNN-NN-SVM)-(DL)-(FCM) | 95.22 | 98.10 | 94.33 | 99.46 | 94.60 | 96.82 | 94.78 | 95.78 | 15.00 | 14.63
(NB-KNN-NN)-(DL)-(FCM) | 95.65 | 97.80 | 94.03 | 99.43 | 95.03 | 96.33 | 94.42 | 95.61 | 16.00 | 16.75
(KNN-NN-SVM)-(DL)-(DBSCAN) | 95.80 | 97.70 | 94.03 | 99.42 | 95.25 | 96.10 | 94.48 | 95.53 | 16.50 | 15.38
(KNN-NN-LR)-(DL)-(SOM) | 95.65 | 97.60 | 94.18 | 99.42 | 95.07 | 95.99 | 94.54 | 95.44 | 17.25 | 17.63
(DT-KNN-NN)-(DL)-(EM) | 95.80 | 97.10 | 93.87 | 99.55 | 95.27 | 95.09 | 94.40 | 96.45 | 17.38 | 16.75
(DT-KNN-NN)-(DL)-(DBSCAN) | 95.36 | 97.80 | 93.72 | 99.55 | 94.61 | 96.28 | 94.12 | 96.51 | 17.75 | 18.50
(NB-KNN-NN)-(DL)-(EM) | 95.36 | 97.40 | 94.49 | 99.45 | 94.70 | 95.65 | 94.87 | 95.74 | 17.88 | 18.00
(KNN-NN-AMLP)-(DL)-(EM) | 96.52 | 95.90 | 94.03 | 99.46 | 96.03 | 92.79 | 94.45 | 95.73 | 17.88 | 18.25
(KNN-NN-AMLP)-(DL)-(FCM) | 95.51 | 95.70 | 94.49 | 99.47 | 94.99 | 92.77 | 94.99 | 95.80 | 17.88 | 17.75
(KNN-NN-AMLP)-(DL)-(DBSCAN) | 93.91 | 98.20 | 94.49 | 99.41 | 93.02 | 97.01 | 94.89 | 95.28 | 18.87 | 19.50
(KNN-NN-AMLP)-(DL)-(KM) | 95.51 | 96.00 | 94.18 | 99.47 | 94.82 | 93.22 | 94.49 | 95.87 | 19.00 | 20.13
(KNN-NN-AMLP)-(DL)-(SOM) | 94.93 | 98.30 | 94.33 | 99.39 | 94.14 | 97.12 | 94.78 | 95.27 | 19.63 | 19.13
(KNN-NN-LR)-(DL)-(DBSCAN) | 95.51 | 98.10 | 94.18 | 99.37 | 94.89 | 96.85 | 94.69 | 95.17 | 19.63 | 18.00
(DT-KNN-NN)-(DL)-(SOM) | 95.07 | 97.30 | 94.03 | 99.52 | 94.48 | 95.43 | 94.42 | 96.23 | 20.13 | 20.88
(KNN-NN-LR)-(DL)-(KM) | 95.36 | 97.50 | 94.03 | 99.44 | 94.81 | 95.83 | 94.53 | 95.71 | 20.50 | 18.88
(NB-KNN-NN)-(DL)-(KM) | 95.65 | 97.50 | 93.72 | 99.43 | 95.15 | 95.81 | 94.23 | 95.48 | 21.13 | 21.00
(KNN-NN-LR)-(DL)-(FCM) | 95.80 | 97.20 | 93.87 | 99.42 | 95.19 | 95.29 | 94.32 | 95.47 | 21.25 | 21.88
(NB-KNN-NN)-(DL)-(DBSCAN) | 95.51 | 96.90 | 94.18 | 99.41 | 95.06 | 94.70 | 94.60 | 95.52 | 22.75 | 19.88
(KNN-NN-LR)-(DL)-(EM) | 95.51 | 97.00 | 93.87 | 99.43 | 95.01 | 94.92 | 94.43 | 95.52 | 23.63 | 22.38
(LR-NN-AMLP)-(DL)-(KM) | 95.65 | 95.40 | 93.11 | 99.46 | 95.03 | 91.87 | 93.58 | 95.76 | 23.88 | 24.38
(KNN-NN-SVM)-(DL)-(KM) | 95.36 | 97.70 | 93.42 | 99.41 | 94.82 | 96.10 | 94.07 | 95.39 | 24.38 | 24.13
(KNN-NN-SVM)-(DL)-(EM) | 95.65 | 97.50 | 93.87 | 99.37 | 95.02 | 95.77 | 94.32 | 95.00 | 24.38 | 25.88
(NB-KNN-NN)-(DL)-(SOM) | 95.36 | 97.20 | 93.87 | 99.41 | 94.74 | 95.38 | 94.33 | 95.39 | 25.88 | 25.63
(LR-NN-AMLP)-(DL)-(EM) | 93.04 | 97.70 | 92.50 | 99.45 | 92.23 | 96.17 | 93.17 | 95.62 | 26.50 | 26.75
(KNN-NN-SVM)-(DL)-(SOM) | 95.22 | 97.50 | 93.72 | 99.40 | 94.53 | 95.83 | 94.22 | 95.41 | 27.00 | 26.13
(LR-NN-AMLP)-(DL)-(DBSCAN) | 93.33 | 97.60 | 92.80 | 99.43 | 92.51 | 96.00 | 93.41 | 95.47 | 27.75 | 28.62
(LR-NN-AMLP)-(DL)-(SOM) | 93.91 | 97.20 | 94.03 | 99.39 | 92.91 | 95.29 | 94.44 | 95.15 | 28.00 | 28.63
(LR-NN-AMLP)-(DL)-(FCM) | 93.91 | 97.60 | 93.42 | 99.36 | 93.16 | 95.86 | 93.99 | 94.94 | 30.75 | 30.75
(DT-NB-LR)-(DL)-(KM) | 94.49 | 89.30 | 90.66 | 99.38 | 93.75 | 83.56 | 91.30 | 95.07 | 35.88 | 35.63
(DT-NB-LR)-(DL)-(FCM) | 93.62 | 88.80 | 91.27 | 99.39 | 92.59 | 82.22 | 91.80 | 95.07 | 36.25 | 37.38
(DT-NB-LR)-(DL)-(DBSCAN) | 94.20 | 88.50 | 91.27 | 99.38 | 93.38 | 80.21 | 91.82 | 95.02 | 36.25 | 36.50
(DT-NB-LR)-(DL)-(EM) | 93.33 | 89.20 | 90.81 | 99.39 | 92.65 | 80.99 | 91.71 | 95.12 | 37.00 | 37.25
(DT-NB-LR)-(DL)-(SOM) | 93.48 | 89.60 | 91.73 | 99.36 | 92.66 | 82.61 | 92.22 | 94.83 | 37.13 | 37.25
Note: See Table III.
Table V compares the best baseline and hybrid models in terms of accuracy, F-measure, and type I and II errors.
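The measures compared in Tables IV and V can all be derived from a two-class confusion matrix. The Python sketch below is an illustration, not the authors' code: the `CreditConfusion` container, the helper names, the choice of "bad" as the default positive class for precision and recall, and the counts in the usage line are all hypothetical. Since the paper fixes its own definition of type I and type II errors in an earlier section, the sketch names the two misclassification rates descriptively rather than guess which is which. It also recomputes the "Improvement" row of Table V as the relative (percentage) change between the four-dataset averages of the best hybrid and best baseline models.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CreditConfusion:
    """Hypothetical container for a two-class credit scoring confusion matrix."""
    good_accepted: int  # good applicants predicted good
    good_rejected: int  # good applicants predicted bad
    bad_accepted: int   # bad applicants predicted good
    bad_rejected: int   # bad applicants predicted bad


def _ratio(num: float, den: float) -> float:
    # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def scoring_metrics(c: CreditConfusion, positive: str = "bad") -> dict:
    """Accuracy, precision, recall, F-measure and the two misclassification rates.

    `positive` selects the class treated as positive for precision and recall;
    defaulting to "bad" is an assumption of this sketch.
    """
    if positive == "bad":
        tp, fp, fn = c.bad_rejected, c.good_rejected, c.bad_accepted
    elif positive == "good":
        tp, fp, fn = c.good_accepted, c.bad_accepted, c.good_rejected
    else:
        raise ValueError("positive must be 'bad' or 'good'")
    total = c.good_accepted + c.good_rejected + c.bad_accepted + c.bad_rejected
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(c.good_accepted + c.bad_rejected, total),
        "precision": precision,
        "recall": recall,
        # F-measure (F1): harmonic mean of precision and recall.
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        # Which of these the paper calls type I vs. type II follows its own definition.
        "bad_accepted_rate": _ratio(c.bad_accepted, c.bad_accepted + c.bad_rejected),
        "good_rejected_rate": _ratio(c.good_rejected, c.good_accepted + c.good_rejected),
    }


def relative_change(new: float, old: float) -> float:
    """Percentage change from old to new; negative values are reductions."""
    return 100.0 * (new - old) / old


# Per-dataset values (AUS, GER, JPN, IRA) transcribed from Table V.
BASELINE_SVMPSO = {
    "accuracy": [97.54, 93.90, 98.47, 98.92],
    "f_measure": [97.15, 88.68, 98.62, 91.54],
    "error_I": [0.00, 0.00, 3.38, 0.46],
    "error_II": [5.54, 20.33, 0.00, 10.02],
}
BEST_HYBRID = {
    "accuracy": [99.71, 99.80, 99.85, 99.90],
    "f_measure": [99.67, 99.67, 99.86, 99.23],
    "error_I": [0.26, 0.14, 0.34, 0.03],
    "error_II": [0.33, 0.33, 0.00, 1.08],
}

# Relative change of the four-dataset averages, as in the "Improvement" row.
improvement = {
    key: round(relative_change(mean(BEST_HYBRID[key]), mean(BASELINE_SVMPSO[key])), 2)
    for key in BASELINE_SVMPSO
}

print(improvement)
# {'accuracy': 2.68, 'f_measure': 5.97, 'error_I': -79.95, 'error_II': -95.15}

# Hypothetical test set: 50 good and 50 bad applicants.
print(scoring_metrics(CreditConfusion(45, 5, 10, 40)))
```

Recomputed from the rounded per-dataset values, the accuracy and F-measure improvements match Table V exactly. The error reductions come out at about -79.95% and -95.15% rather than the published -79.84% and -95.16%, a small gap that is plausibly due to rounding of the published per-dataset error rates.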
Relative to the best baseline model, the best hybrid model improves the accuracy rate and F-measure by 2.68% and 5.97% on average, respectively. In addition, it reduces type I and II errors by 79.84% and 95.16% on average, respectively. Furthermore, because this study includes three public real-world credit datasets, the predictive performance of our hybrid models can be compared directly with other studies in the literature. Table VI summarizes the credit scoring performance of various models reported by several researchers. As the Australian and German credit datasets are the most frequently used benchmarks in the relevant literature, we collected the results of papers that used these two datasets. As shown, different authors applied different models and obtained widely varying predictive performance. The best-performing hybrid model of this study ranks first on prediction accuracy, reaching 99.71% and 99.80% accuracy on the Australian and German datasets, respectively.
TABLE V. AVERAGE PERFORMANCE COMPARISON OF THE BEST BASELINE AND HYBRID MODELS
Best baseline (B): SVMPSO. Best hybrid (H): (KNN-NN-SVMPSO)-(DL)-(DBSCAN). All values in %.
Dataset | B Accuracy | B F-measure | B Error I | B Error II | H Accuracy | H F-measure | H Error I | H Error II
AUS | 97.54 | 97.15 | 0.00 | 5.54 | 99.71 | 99.67 | 0.26 | 0.33
GER | 93.90 | 88.68 | 0.00 | 20.33 | 99.80 | 99.67 | 0.14 | 0.33
JPN | 98.47 | 98.62 | 3.38 | 0.00 | 99.85 | 99.86 | 0.34 | 0.00
IRA | 98.92 | 91.54 | 0.46 | 10.02 | 99.90 | 99.23 | 0.03 | 1.08
Average | 97.21 | 94.00 | 0.96 | 8.97 | 99.82 | 99.61 | 0.19 | 0.43
Improvement | - | - | - | - | 2.68 | 5.97 | -79.84 | -95.16
Note: “Improvement” shows the percentage growth rate for accuracy and F-measure and the percentage reduction rate for type I and II errors.
TABLE VI.
PERFORMANCE COMPARISON OF VARIOUS CREDIT SCORING MODELS IN THE LITERATURE
Values are prediction accuracies (%); "-" = not reported.
Model | Australian | German | Rank | Author(s) | Year
(KNN-NN-SVMPSO)-(DL)-(DBSCAN) | 99.71 | 99.80 | 1 | (This study) | 2017
MC-LR (Intersection) | 99.11 | 99.18 | 2 | Tsai, Hsu [56] | 2013
Hybrid SOM-KM-NN | 97.98 | 98.46 | 3 | Hsieh [17] | 2005
MLP + FS | - | 97.20 | 4 | Analide [57] | 2011
LibSVM | 86.38 | 94.00 | 5 | Peng, Kou, Shi, Chen [58] | 2008
Hybrid NN | 91.61 | 87.45 | 6 | Tsai, Hung [59] | 2014
AMMLP | 92.75 | 84.67 | 7 | Marcano-Cedeno, Marin-De-La-Barcena, Jiménez-Trillo, Pinuela, Andina [60] | 2011
Gaussian classifier | 92.60 | 83.80 | 8 | Somol, Baesens, Pudil, Vanthienen [61] | 2005
ANN | 97.32 | 78.97 | 9 | Tsai, Wu [62] | 2008
VBDTM | 91.97 | 81.64 | 10 | Zhang, Zhou, Leung, Zheng [63] | 2010
DeepSVM | 88.98 | 83.70 | 11 | Qi, Wang, Tian, Zhang [64] | 2016
PSO-SVM | 91.03 | 81.62 | 12 | Lin, Ying, Chen, Lee [65] | 2008
SVM | 85.70 | - | 13 | Martens, Baesens, Van Gestel, Vanthienen [66] | 2007
CLC | 86.52 | 84.80 | 14 | Luo, Cheng, Hsieh [67] | 2009
MLP | 90.20 | 79.11 | 15 | Tsai [68] | 2008
2SGP | 89.17 | 79.49 | 16 | Huang, Tzeng, Ong [69] | 2006
GNG + MARS | 88.10 | 79.00 | 17 | Ala'raj, Abbod [28] | 2016
RS-Bagging DT | 88.17 | 78.36 | 18 | Wang, Ma, Huang, Xu [24] | 2012
Genetic programming | 88.27 | 77.34 | 19 | Ong, Huang, Tzeng [16] | 2005
Parallel Random Forest | 89.40 | 76.20 | 20 | Van Sang, Nam, Nhan [70] | 2016
LS-SVM | 90.40 | 74.60 | 21 | Baesens, Van Gestel, Viaene, Stepanova, Suykens, Vanthienen [71] | 2003
FA-MLP | 86.08 | 78.76 | 22 | Tsai [72] | 2009
SVM + GA | 86.90 | 77.92 | 23 | Huang, Chen, Wang [73] | 2007
SVDD-FSVM | 87.25 | 77.30 | 24 | Shi, Xu [27] | 2016
RBF-SVM | 87.52 | 76.60 | 25 | Ping, Yongheng [74] | 2011
Genetic Fuzzy classifier | 88.60 | 75.00 | 26 | Lahsasna, Ainon, Wah [11] | 2010
Mixture-of-experts network | 87.25 | 76.30 | 27 | West [5] | 2000
LDA + SVM | 86.52 | 76.70 | 28 | Chen, Li [75] | 2010
Bayes | 86.70 | 76.00 | 29 | Hoffmann, Baesens, Mues, Van Gestel, Vanthienen [76] | 2007
GR-GA-SVM | 86.84 | 75.75 | 30 | Liu, Fu, Lin [77] | 2010
RS-LMNC | 87.05 | 74.67 | 31 | Nanni, Lumini [78] | 2009
Adopted CBA | 86.96 | 74.40 | 32 | Lan, Janssens, Chen, Wets [79] | 2006
HGA-NN | - | 78.90 | 33 | Oreski, Oreski, Oreski [80] | 2012
ECSC | 86.86 | 70.60 | 34 | Xiao, Xiao, Wang [26] | 2016
Note: The above models are ranked by their average performance on the Australian and German credit datasets.
V. CONCLUSION
As financial institutions are exposed to credit risk when issuing consumer loans, developing reliable credit scoring systems is crucial for them. Since machine learning methods have demonstrated their applicability and merit, this study develops and compares several hybrid machine learning approaches for the credit scoring problem. The paper proposes a novel hybrid meta-learning framework to improve the predictive performance of credit scoring models. On the selected datasets, the hybrid meta-learner model (KNN-NN-SVMPSO)-(DL)-(DBSCAN) outperforms all the baseline classifiers in terms of accuracy rate and, on average, type I and II errors. It also achieves higher accuracy on the Australian and German datasets than the best models reported in the relevant literature. The findings indicate which types of hybrid machine learning techniques can achieve higher accuracy and lower error rates in credit scoring. The results also suggest that the optimized support vector machine, SVMPSO, and deep learning play significant roles in enhancing the predictive power of the proposed models. Using the best credit scoring model identified in this study may therefore help financial institutions make more accurate and confident credit decisions.
Several issues remain for future study. One is feature dimensionality: preprocessing the selected datasets with dimensionality reduction or feature selection may further improve prediction precision [72].
Although this paper employs a wide range of machine learning algorithms, other techniques, especially heuristically optimized algorithms, could be applied for further comparison. Lastly, since this paper concentrates specifically on consumer credit scoring, future studies could examine other problem areas such as corporate loans and housing and car loans, to identify which hybrid method performs best there and whether the empirical outcomes differ from those reported here.
REFERENCES
[1] D. J. Hand, W. E. Henley, “Statistical classification methods in consumer credit scoring: a review”, Journal of the Royal Statistical Society: Series A (Statistics in Society), Vol. 160, No. 3, pp. 523-541, 1997
[2] C. R. Abrahams, M. Zhang, Fair lending compliance: Intelligence and implications for credit risk management, John Wiley & Sons, 2008
[3] E. Rosenberg, A. Gleit, “Quantitative methods in credit management: a survey”, Operations Research, Vol. 42, No. 4, pp. 589-613, 1994
[4] L. C. Thomas, D. B. Edelman, J. N. Crook, Credit scoring and its applications, SIAM, 2002
[5] D. West, “Neural network credit scoring models”, Computers & Operations Research, Vol. 27, No. 11-12, pp. 1131-1152, 2000
[6] R. C. Lacher, P. K. Coats, S. C. Sharma, L. F. Fant, “A neural network for classifying the financial health of a firm”, European Journal of Operational Research, Vol. 85, No. 1, pp. 53-65, 1995
[7] T.-S. Lee, I.-F. Chen, “A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines”, Expert Systems with Applications, Vol. 28, No. 4, pp. 743-752, 2005
[8] H. Abdou, J. Pointon, A. El-Masry, “Neural nets versus conventional techniques in credit scoring in Egyptian banking”, Expert Systems with Applications, Vol. 35, No. 3, pp. 1275-1292, 2008
[9] M.-C. Chen, S.-H. Huang, “Credit scoring and rejected instances reassigning through evolutionary computation techniques”, Expert Systems with Applications, Vol. 24, No. 4, pp. 433-441, 2003
[10] V. S. Desai, J. N. Crook, G. A. Overstreet, “A comparison of neural networks and linear scoring models in the credit union environment”, European Journal of Operational Research, Vol. 95, No. 1, pp. 24-37, 1996
[11] A. Lahsasna, R. N. Ainon, T. Y. Wah, “Enhancement of transparency and accuracy of credit scoring models through genetic fuzzy classifier”, Maejo International Journal of Science and Technology, Vol. 4, No. 1, pp. 136-158, 2010
[12] T.-S. Lee, C.-C. Chiu, C.-J. Lu, I.-F. Chen, “Credit scoring using the hybrid neural discriminant technique”, Expert Systems with Applications, Vol. 23, No. 3, pp. 245-254, 2002
[13] M.-C. Tsai, S.-P. Lin, C.-C. Cheng, Y.-P. Lin, “The consumer loan default predicting model–An application of DEA–DA and neural network”, Expert Systems with Applications, Vol. 36, No. 9, pp. 11682-11690, 2009
[14] J. N. Crook, D. B. Edelman, L. C. Thomas, “Recent developments in consumer credit risk assessment”, European Journal of Operational Research, Vol. 183, No. 3, pp. 1447-1465, 2007
[15] Z. Huang, H. Chen, C.-J. Hsu, W.-H. Chen, S. Wu, “Credit rating analysis with support vector machines and neural networks: a market comparative study”, Decision Support Systems, Vol. 37, No. 4, pp. 543-558, 2004
[16] C.-S. Ong, J.-J. Huang, G.-H. Tzeng, “Building credit scoring models using genetic programming”, Expert Systems with Applications, Vol. 29, No. 1, pp. 41-47, 2005
[17] N.-C. Hsieh, “Hybrid mining approach in the design of credit scoring models”, Expert Systems with Applications, Vol. 28, No. 4, pp. 655-665, 2005
[18] A. Jain, A. M. Kumar, “Hybrid neural network models for hydrologic time series forecasting”, Applied Soft Computing, Vol. 7, No. 2, pp. 585-592, 2007
[19] H. Kim, K. Shin, “A hybrid approach based on neural networks and genetic algorithms for detecting temporal patterns in stock markets”, Applied Soft Computing, Vol. 7, No. 2, pp. 569-576, 2007
[20] R. Malhotra, D. K. Malhotra, “Differentiating between good credits and bad credits using neuro-fuzzy systems”, European Journal of Operational Research, Vol. 136, No. 1, pp. 190-211, 2002
[21] J. Huysmans, B. Baesens, J. Vanthienen, T. Van Gestel, “Failure prediction with self organizing maps”, Expert Systems with Applications, Vol. 30, No. 3, pp. 479-487, 2006
[22] C. Tsai, M. Chen, “Credit rating by hybrid machine learning techniques”, Applied Soft Computing, Vol. 10, No. 2, pp. 374-380, 2010
[23] I. Brown, C. Mues, “An experimental comparison of classification algorithms for imbalanced credit scoring data sets”, Expert Systems with Applications, Vol. 39, No. 3, pp. 3446-3453, 2012
[24] G. Wang, J. Ma, L. Huang, K. Xu, “Two credit scoring models based on dual strategy ensemble trees”, Knowledge-Based Systems, Vol. 26, pp. 61-68, 2012
[25] T. Harris, “Credit scoring using the clustered support vector machine”, Expert Systems with Applications, Vol. 42, No. 2, pp. 741-750, 2015
[26] H. Xiao, Z. Xiao, Y. Wang, “Ensemble classification based on supervised clustering for credit scoring”, Applied Soft Computing, Vol. 43, pp. 73-86, 2016
[27] J. Shi, B. Xu, “Credit Scoring by Fuzzy Support Vector Machines with a Novel Membership Function”, Journal of Risk and Financial Management, Vol. 9, No. 4, pp. 13, 2016
[28] M. Ala'raj, M. F. Abbod, “A new hybrid ensemble credit scoring model based on classifiers consensus system approach”, Expert Systems with Applications, Vol. 64, pp. 36-55, 2016
[29] M. J. Lenard, G. R. Madey, P. Alam, “The design and validation of a hybrid information system for the auditor’s going concern decision”, Journal of Management Information Systems, Vol. 14, No. 4, pp. 219-237, 1998
[30] T. G. Dietterich, “Ensemble learning”, The handbook of brain theory and neural networks, Vol. 2, pp. 110-125, 2002
[31] M. Tavana, K. Puranam, Handbook of Research on Organizational Transformations through Big Data Analytics, IGI Global, 2014
[32] F. Anifowose, J. Labadin, A. Abdulraheem, “Improving the prediction of petroleum reservoir characterization with a stacked generalization ensemble model of support vector machines”, Applied Soft Computing, Vol. 26, pp. 483-496, 2015
[33] J. Kittler, M. Hatef, R. P. Duin, J. Matas, “On combining classifiers”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3, pp. 226-239, 1998
[34] D. Opitz, R. Maclin, “Popular ensemble methods: An empirical study”, Journal of Artificial Intelligence Research, Vol. 11, pp. 169-198, 1999
[35] M. Pal, “Ensemble learning with decision tree for remote sensing classification”, World Academy of Science, Engineering and Technology, Vol. 1, No. 12, pp. 3839-3841, 2007
[36] D. H. Wolpert, “Stacked generalization”, Neural Networks, Vol. 5, No. 2, pp. 241-259, 1992
[37] J. Sill, G. Takacs, L. Mackey, D. Lin, “Feature-weighted linear stacking”, arXiv:0911.0460, 2009
[38] M. Tan, “Multi-agent reinforcement learning: Independent vs. cooperative agents”, Tenth International Conference on Machine Learning, pp. 330-337, 1993
[39] L. Breiman, “Stacked regressions”, Machine Learning, Vol. 24, No. 1, pp. 49-64, 1996
[40] M. Ozay, F. T. Y. Vural, “A new fuzzy stacked generalization technique and analysis of its performance”, arXiv:1204.0171, 2012
[41] P. Smyth, D. Wolpert, “Linearly combining density estimators via stacking”, Machine Learning, Vol. 36, No. 1-2, pp. 59-83, 1999
[42] T. M. Mitchell, Machine learning, McGraw Hill, Burr Ridge, IL, 1997
[43] A. K. Jain, M. N. Murty, P. J. Flynn, “Data clustering: a review”, ACM Computing Surveys (CSUR), Vol. 31, No. 3, pp. 264-323, 1999
[44] G. E. Hinton, S. Osindero, Y.-W. Teh, “A fast learning algorithm for deep belief nets”, Neural Computation, Vol. 18, No. 7, pp. 1527-1554, 2006
[45] I. Goodfellow, Q. V. Le, A. M. Saxe, H. Lee, A. Y. Ng, “Measuring invariances in deep networks”, 23rd Annual Conference on Neural Information Processing Systems, pp. 646-654, 2009
[46] D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber, “Deep, big, simple neural nets for handwritten digit recognition”, Neural Computation, Vol. 22, No. 12, pp. 3207-3220, 2010
[47] Y. Bengio, “Practical recommendations for gradient-based training of deep architectures”, arXiv:1206.5533, 2012
[48] Y. Bengio, A. Courville, P. Vincent, “Representation learning: A review and new perspectives”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 8, pp. 1798-1828, 2013
[49] A. Candel, V. Parmar, E. LeDell, A. Arora, Deep Learning with H2O, H2O Inc., 2016
[50] S. Ramaswamy, R. Rastogi, K. Shim, “Efficient algorithms for mining outliers from large data sets”, ACM SIGMOD International Conference on Management of Data, pp. 427-438, 2000
[51] C. Cortes, V. Vapnik, “Support-vector networks”, Machine Learning, Vol. 20, No. 3, pp. 273-297, 1995
[52] S. Li, W. Shiue, M. Huang, “The evaluation of consumer loans using support vector machines”, Expert Systems with Applications, Vol. 30, No. 4, pp. 772-782, 2006
[53] R. Eberhart, J. Kennedy, “A new optimizer using particle swarm theory”, Sixth International Symposium on Micro Machine and Human Science, pp. 39-43, 1995
[54] J. Kennedy, R. C. Eberhart, Y. Shi, Swarm intelligence, Morgan Kaufmann, 2001
[55] M. Ester, H. Kriegel, J. Sander, X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise”, Second International Conference on Knowledge Discovery and Data Mining, pp. 226-231, 1996
[56] C. F. Tsai, Y. F. Hsu, “A Meta-learning Framework for Bankruptcy Prediction”, Journal of Forecasting, Vol. 32, No. 2, pp. 167-179, 2013
[57] F. S. C. Analide, “Information asset analysis: credit scoring and credit suggestion”, International Journal of Electronic Business, Vol. 9, No. 3, pp. 203-218, 2011
[58] Y. Peng, G. Kou, Y. Shi, Z. Chen, “A multi-criteria convex quadratic programming model for credit data analysis”, Decision Support Systems, Vol. 44, No. 4, pp. 1016-1030, 2008
[59] C. Tsai, C. Hung, “Modeling credit scoring using neural network ensembles”, Kybernetes, Vol. 43, No. 7, pp. 1114-1123, 2014
[60] A. Marcano-Cedeno, A. Marin-De-La-Barcena, J. Jimenez-Trillo, J. Pinuela, D. Andina, “Artificial metaplasticity neural network applied to credit scoring”, International Journal of Neural Systems, Vol. 21, No. 4, pp. 311-317, 2011
[61] P. Somol, B. Baesens, P. Pudil, J. Vanthienen, “Filter- versus wrapper-based feature selection for credit scoring”, International Journal of Intelligent Systems, Vol. 20, No. 10, pp. 985-999, 2005
[62] C. Tsai, J. Wu, “Using neural network ensembles for bankruptcy prediction and credit scoring”, Expert Systems with Applications, Vol. 34, No. 4, pp. 2639-2649, 2008
[63] D. Zhang, X. Zhou, S. C. Leung, J. Zheng, “Vertical bagging decision trees model for credit scoring”, Expert Systems with Applications, Vol. 37, No. 12, pp. 7838-7843, 2010
[64] Z. Qi, B. Wang, Y. Tian, P. Zhang, “When Ensemble Learning Meets Deep Learning: a New Deep Support Vector Machine for Classification”, Knowledge-Based Systems, Vol. 107, pp. 54-60, 2016
[65] S. Lin, K. Ying, S. Chen, Z. Lee, “Particle swarm optimization for parameter determination and feature selection of support vector machines”, Expert Systems with Applications, Vol. 35, No. 4, pp. 1817-1824, 2008
[66] D. Martens, B. Baesens, T. Van Gestel, J. Vanthienen, “Comprehensible credit scoring models using rule extraction from support vector machines”, European Journal of Operational Research, Vol. 183, No. 3, pp. 1466-1476, 2007
[67] S. Luo, B. Cheng, C. Hsieh, “Prediction model building with clustering-launched classification and support vector machines in credit scoring”, Expert Systems with Applications, Vol. 36, No. 4, pp. 7562-7566, 2009
[68] C. F. Tsai, “Financial decision support using neural networks and support vector machines”, Expert Systems, Vol. 25, No. 4, pp. 380-393, 2008
[69] J. Huang, G. Tzeng, C. Ong, “Two-stage genetic programming (2SGP) for the credit scoring model”, Applied Mathematics and Computation, Vol. 174, No. 2, pp. 1039-1053, 2006
[70] H. Van Sang, N. H. Nam, N. D. Nhan, “A novel credit scoring prediction model based on Feature Selection approach and parallel random forest”, Indian Journal of Science and Technology, Vol. 9, No. 20, 2016
[71] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens, J. Vanthienen, “Benchmarking state-of-the-art classification algorithms for credit scoring”, Journal of the Operational Research Society, Vol. 54, No. 6, pp. 627-635, 2003
[72] C. Tsai, “Feature selection in bankruptcy prediction”, Knowledge-Based Systems, Vol. 22, No. 2, pp. 120-127, 2009
[73] C. Huang, M. Chen, C. Wang, “Credit scoring with a data mining approach based on support vector machines”, Expert Systems with Applications, Vol. 33, No. 4, pp. 847-856, 2007
[74] Y. Ping, L. Yongheng, “Neighborhood rough set and SVM based hybrid credit scoring classifier”, Expert Systems with Applications, Vol. 38, No. 9, pp. 11300-11304, 2011
[75] F. Chen, F. Li, “Combination of feature selection approaches with SVM in credit scoring”, Expert Systems with Applications, Vol. 37, No. 7, pp. 4902-4909, 2010
[76] F. Hoffmann, B. Baesens, C. Mues, T. Van Gestel, J. Vanthienen, “Inferring descriptive and approximate fuzzy rules for credit scoring using evolutionary algorithms”, European Journal of Operational Research, Vol. 177, No. 1, pp. 540-555, 2007
[77] X. Liu, H. Fu, W. Lin, “A modified support vector machine model for credit scoring”, International Journal of Computational Intelligence Systems, Vol. 3, No. 6, pp. 797-804, 2010
[78] L. Nanni, A. Lumini, “An experimental comparison of ensemble of classifiers for bankruptcy prediction and credit scoring”, Expert Systems with Applications, Vol. 36, No. 2, pp. 3028-3033, 2009
[79] Y. Lan, D. Janssens, G. Chen, G. Wets, “Improving associative classification by incorporating novel interestingness measures”, Expert Systems with Applications, Vol. 31, No. 1, pp. 184-192, 2006
[80] S. Oreski, D. Oreski, G. Oreski, “Hybrid system with genetic algorithm and artificial neural networks and its application to retail credit risk assessment”, Expert Systems with Applications, Vol. 39, No. 16, pp. 12605-12617, 2012