Microsoft Word - Volume 12, Issue 3-1 Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 https://jracr.com/ ISSN Print: 2210-8491 ISSN Online: 2210-8505 DOI: https://doi.org/10.54560/jracr.v12i3.332 110 Article CUS-RF-Based Credit Card Fraud Detection with Imbalanced Data Wei Li 1, Cheng-shu Wu 1 and Su-mei Ruan 1,* 1 School of Finance, Anhui University of Finance and Economics, Bengbu (233030), Anhui, China * Correspondence: ruansumei0116@163.com Received: July 2, 2022; Accepted: September 4, 2022; Published: September 30, 2022 Abstract: With the continuous expansion of the banks' credit card businesses, credit card fraud has become a serious threat to banking financial institutions. So, the automatic and real-time credit card fraud detection is the meaningful research work. Because machine learning has the characteristics of non-linearity, automation, and intelligence, so that credit card fraud detection can improve the detection efficiency and accuracy. In view of this, this paper proposes a credit card fraud detection model based on heterogeneous ensemble, namely CUS-RF (cluster-based under-sampling boosting and random forest), based on clustering under-sampling and random forest algorithm. CUS-RF- based credit card fraud detection model has the following advantages. Firstly, the CUS-RF model can better overcome the issue of data imbalance. Secondly, based on the idea of heterogeneous ensemble learning, the clustering under-sampling method and random forest model are fused to achieve a better performance for credit card fraud detection. Finally, through the verification of real credit card fraud dataset, the CUS-RF model proposed in this paper has achieved better performance in credit card fraud detection compared with the benchmark model. Keywords: Credit Card Fraud Detection; Random Forest; Imbalanced Data; Heterogeneous Ensemble; Fintech 1. Introduction Credit card fraud has caused immense financial loss to both card-issuing banks and financial institutions. According to the statistical data from China Banking Association, by the end of 2018, credit card-based transaction in China has attained 38,200 billion Yuan RMB at a growth rate of 24.9%; 73.2% of credit cards remain activated, and unpaid balance is 6,850 billion Yuan RMB (on a year-on- year growth of 23.2%). It is especially noteworthy that credit card loss rate is 1.27%, slightly higher than 1.17% in previous year. The global credit card fraud-related loss climbs from 7.6 billion dollars in 2010 to 21.81 billion dollars in 2015, with a growth of 300% within 5 years. It is expected to reach 31.67 billion dollars by 2020. A conventional credit card fraud detection model is usually constructed using rules provided by experts. Nevertheless, the aforesaid fraud detection models often demand manual parameter tuning and supervision from experts, which makes it impossible for financial institutions to discover fraudulent behaviors in time. Moreover, it is a heavy task to check all the transactions one by one. To overcome the shortcomings in such work, financial institutions have employed machine learning algorithm and data mining methods in setting up an artificial intelligence (AI)-based credit card fraud detection model that is different from the traditional detection models. The machine learning algorithm can help financial institutions in constructing automated detection models to significantly Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 111 improve the fraud detection efficiency and speed. Driven by machine learning, the novel credit card fraud detection model is trained and parameter-tuned in order to gain the expected effect. However, the ability of fraud detection will be greatly impaired should severe data imbalance exist amid credit card fraud data, so that the model may fail to exert its due performance. Data and their features are the most critical factors affecting the operation of a fraud detection model. That said the category of imbalance perplexing data centralization is associated and demands analysis. Furthermore, in the real world where financial institutions apply strict censorship, fraudulent credit card swiping is far less frequent than the normal operation. Despite this, once such fraudulent transaction occurs, it will be hard for the financial institution involved to get back the money lost. Therefore, when studying credit card fraud, it is urgent to considered the data imbalance issue. Since automated credit card fraud detection system is investigated on the basis of balanced datasets when being established with machine learning, the model is hard to give a full play to its own strengths. The problem of credit card fraud detection for imbalanced data has been studied by scholars in different countries with different ideas. After a systematic study of category imbalance processing strategies, Singh et al. (2021) compared the effectiveness and efficiency of different category imbalance processing methods and state-of-the-art classification methods, evaluating metrics in terms of Precision, Recall, K-fold Cross-validation, AUC-ROC curve and execution time, and found that oversampling and under-sampling methods performed better for integrated classification models such as AdaBoost, XGBoost and Random Forest performed better [1]. El-Naby et al. (2022) addressed the fraud data imbalanced problem by using mixed sampling and oversampling preprocessing techniques, specifically, in oversampling, SMOTE, bounded SMOTE and ADASYN were selected. In mixed sampling, SMOTEEN and SMOTETomek to eliminate the data set imbalance problem, thereby improving credit card fraud detection accuracy [2]. As for the imbalanced classification problem for credit card fraud detection, Makki et al. (2019) found LR, C5.0 decision tree algorithm, SVM and ANN to be the best methods after comparing imbalanced classification methods and based on three performance metrics: accuracy, sensitivity and average precision (AUPRC). Although the above methods improve the performance of the classifier, when the data is extremely unbalanced, these methods may create a problem of false positives and major credit fraudulent cases may remain undetected [3]. As can be seen, the data imbalance problem interferes with credit card fraud detection and even false positives, and scholars in various countries are working to improve credit card fraud prediction performance and accuracy with imbalanced data. Therefore, in creating automated credit card fraud detection model, attention should be paid to the data imbalance issue to evade possible impaired predicting accuracy of the machine learning- based prediction model. In addition, most of sorting algorithms would be undermined in performance in such case. In view of the data imbalance issue, SMOTE (synthetic minority over- sampling technique) technology has been extensively applied to financial distress prediction, bankruptcy forecasting, and credit card fraud fields in recent years [4–8]. It exhibits better sorting performance when compared with traditional RUS (random under-sampling) and ROS (random over-sampling) technologies. In this paper, a brand-new sorting technique based on clustering under- sampling that targets imbalanced data is introduced into credit card fraud field. Related references indicate this sorting technique has demonstrated outstanding performance in dealing with financial distress prediction and so on [9, 10]. As revealed by the findings of our experiment here, the technique excels SMOTE in sorting imbalanced credit card data containing thousands of samples. It has proven application prospect in the financial field. Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 112 This paper establishes a heterogeneous ensemble model by innovatively ensemble CUS (cluster- based under-sampling) with RF (random forest) [11]. To further verify the excellency of the proposed model, CUS is blended with five classifiers respectively to form five heterogeneous ensemble models. Since each classifier has distinct theoretical backgrounds, the paper combines same imbalanced data processing technique with different base learner methods in the hope to find out which theoretical design stands out in sorting the imbalanced data concerning credit card fraud. Besides, in order to explore the performances of CUS and SMTOTE, the paper also combines SMOTE with the foresaid base learner and adopt the widely applied evaluation index system in financial field to find out which imbalanced data sorting technique is more suitable for automatic classification of credit card data. The goal of doing so is to propose the best solution for imbalanced data processing and base learner choice. Apart from that, another innovative aspect of this paper lies in its embedding of imbalance processing into base learner to form automatic sorting system. The resulting system could process imbalanced data and automatically sort base learners, which greatly cut down the time cost of learning task when compared with existing research achievements. This paper is arranged as follows. Section 2 offers a literature review on data imbalance issue and credit card fraud detection model. Section 3 describes the CUS background model and then proposes an improved model that could deal with imbalanced data. Section 4, the proposed model is experimentally testing, and the experimental findings are analyzed. And conclusions are drawn in Section 5. 2. Literature Review As fintech develops by leaps and bounds in recent years, a great number of fintech banks emerges [12, 13]. Consumer finance is influencing people’s life in a novel way. This also gives rise to fraud issue. Detecting credit card fraud has become one of the much-concerned topics in financial industry. However, public models available for use remain rather limited. One of the major underlying reasons is that credit card transaction data are exclusively kept by the card-issuing agencies. As data owner, card-issue agencies must protect the data security and avoid leaking users’ privacy, so they are not going to disclose the datasets and related models they are using [14]. There are two common types of credit card fraud: application fraud and behavioral fraud. The latter type consists of card stealing, card forgery and non-existing card [15, 16]. Fraudulent swiping is the most common type of credit card fraud [17]. In usual cases, after stealing the credit card or obtaining a temporary card, fraudster would use the card for consumption as much as possible. When committing fraud with a stolen card, the criminal usually transacts with the card at high frequency. Should the fraudster forge a fake card with the information he has collected, forgery fraud will occur. While the victim still holds his own card for legal transaction, the fraudster transacts also with the fake card. The fake card will be used only for a few times before being abandoned and realized by the victim. The third type of behavioral fraud is card not present fraud which occurs in case of remote transaction. In such case, the transaction is made on the basis of card information only such as cad number, holder name, and valid term [18, 19]. The distinction between fake card fraud and card not present fraud lies in use of solid card in the former case and use of card information only in the latter [20]. There are two primary means of data mining used in credit card fraud detection system creation, namely supervised learning and unsupervised learning [21]. Supervised learning aims to training dichotomic model, depending on the detection model trained with datasets marked as “normal” and “fraudulent” to tell fraudulent samples from normal Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 113 ones [22]. This is a most common way of fraud detection. Recently, supervise learning algorithm has been applied to the establishment of some fraud detection systems. For example, Soemers et al. proposed a dynamic model combining decision-making tree and context-based multi-arm gambler which demonstrated proven effect on identifying credit card fraud [23]. In the work by Zliobaite, an adaptive algorithm was put forward, which was able to update fraud detection model with time- dependent data flow in order to better adapt to the shifts in fraudulent transaction patterns [24]. Blending recursive feature elimination, hyper-parameter optimization and SMOTE technology, Naoufal Rtayli developed a mixed credit card fraud detection model [5]. Other supervised learning methods, including Bayes, artificial neural network [25] and support vector machine (SVM), are also frequently used in fraud detection [16, 26, 27]. Compared with semi-supervised and unsupervised fraud detection systems, supervised systems stand out with sufficient data training time that supports establishment of well-performing models [28]. The output from detection system trained with supervised learning technique has explicit meaning and can be directly applied to mode distinguishment. In unsupervised learning, the dataset samples for constructing fraud detection model carry no tag. Instead, the unsupervised machine learning sets out to analyze data from different dimensions and resist fraud by finding out the association or difference between data [21]. For example, the GAN model could learn normal data distribution and determine whether unknown test data are normal or fraudulent samples with the proposed abnormality scoring plan. In case of insufficient label data and severely imbalanced data, unsupervised learning model will be a better choice. In addition, unsupervised learning could update model with online unlabeled data from banks or financial institutions, thus rendering it possible to detect use of fraudulent credit card. For instance, an unsupervised learning model, called Self Organizing Map (SMO), is came up with for forming an unsupervised credit card fraud detection model [29, 30]. As SMO model requires no priori information, the automated system proposed may use newly added transaction data to keep updating the model. Besides, K-means clustering algorithm sorts of transaction data according to the similarity concerning credit card fraud features and thus gets used to fraud detection model creation [31]. We have analyzed the application of ensemble to credit card fraud and related fields in recent years. A few comprehensive algorithms have become popular base learners in ensemble algorithms, among which the most noteworthy ones are Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (K-NN) and Gradient Boosting Decision Tree (GBDT). Therefore, we have built some models based on those base learners for comparison purpose. Besides, the ensemble models proposed in previous studies are limited to SMOTE and Random Oversampling when imbalanced data processing is involved. They rarely employ new techniques. This paper has applied CUS to the processing of credit card-related imbalanced data for the first time. 3. Research Design 3.1. Data The dataset used in credit card fraud detection is provided by machine learning group (http://mlg.ulb.ac.be), which could be downloaded from Kaggle (https://www.kaggle.com/mlg- ulb/creditcardfraud). In the dataset there are data concerning credit card transactions completed by Europeans in September 2013. According to it, among 284,807 transactions within two days, 492 are fraudulent. The dataset appears quite imbalanced, as positive examples (fraudulent swiping) accounts for 0.172% of total transactions. Due to privacy protection, we cannot acquire the original Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 114 functions and more background information about related data. Features V1, V2, ..., V28 are the major constituents acquired by PCA (principal component analysis), while the features undergoing no PCA conversion are “Class” and “Amount”. The feature “Class” is a response variable, which is “1” when card fraud occurs and “0” otherwise. The goal of the task is to sort out normal transaction data from abnormal ones in the dataset and predict about the test data. Table 1. Credit Card Fraud Detection Dataset. Instance number Fraud samples Normal samples Feature number 284807 492 284315 30 As shown in Table 1, there are altogether 284,807 transaction samples in the dataset. Among them, only 492 are fraud samples, accounting for 0.17% of total dataset. the proportion of normal samples to fraud ones is as high as 578:1. In other words, this dataset features extremely imbalanced positive and negative samples. If no pre-treatment is made to improve the data imbalance here and data at primitive proportion are directly put into the classifier for training, the classifier is more like to view normal samples as white noise. This would impair the performance of whole combined fraud detection system. Our data reveals data imbalance stays as a primary issue in fraud detection process. As a matter of fact, in an imbalanced dataset, we could find the training examples for one class variable are far less than that for the other one. Accordingly, the first one is called minority set while the second as majority set. When sorting imbalanced fraud detection transaction dataset, most models perform well in identifying the majority set but much less accurate in the minority one, suggesting they are not good at detecting the minority samples. In order to effectively cope with the class imbalance issue in credit card fraud data, this paper introduces CUS and combines it with RF to form machine learning-based heterogeneous ensemble. It succeeds in effectively sorting the credit card fraud data. In addition to that, we use also SMOTE technology for comparison and compare data sorting performance by controlling the base learner. SMOTE is a way of oversampling that generates random examples instead of achieving oversampling by repetition or replacement alone. Furthermore, the technology can also progressively increase the learning process of fraud detection algorithm [32]. 3.2. CUSBoost CUSBoost is a combination of CUS and AdaBoost algorithm. Like RUSBoost and SMOTE-Boost, it contains key difference in sampling technique. SMOTE-Boost employs SMOTE to sample minority examples, whereas RUSBoost chooses random under-sampling over the majority ones. Based on comparison, the CUSBoost proposed by us selects the sampling from majority class based on clustering. CUSBoost separates first the majority and minority examples from the dataset first, and then applies k-means clustering algorithm to cluster majority examples to k clusters. Here, parameter k is determined through hyper-parameter optimization. Then, 50% of examples are randomized (or tuned as per field issue or dataset) with the rest being eliminated. Random under-sampling is executed to each cluster. Since clustering is applied in prior to sampling, theoretically speaking, the algorithm is expected to perform best when dataset is highly clustered. Next, those representative samples are combined with the minority ones to form a well-balanced dataset. The strength of our algorithm is displayed on the inclusion of all the subspace examples in considering the majority class, as k-means clustering contains each example in certain cluster. Other similar methods usually fail to proper represent the majority class. In Fig. 1, the CUS proposed is used to choose the majority Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 115 examples, in which red spots indicate the examples sorted out from the majority class while the black and red spots represent all the majority examples. Figure 1. Cluster-based under-sampling (CUS) approach. 3.3. Random Forest Model Random forest (RF) is an expansion variant of Bagging, which further introduces random attribution selection into decision-making tree training on the basis of base learner [33]. More specifically, traditional decision-making tree is to choose an optimal attribute from the attribute set (suppose there are d attributes) of current node when deciding to divide the attributes. In contrast, RF randomly selects one subset containing k attributes from the attribute set of each node on the base decision-making tree, and then selects one optimal attribute from the resulting subset for division purpose. Here the parameter k decides how much randomness should be introduced. If 𝑘 = 𝑑, the base decision-making tree constructed will be nothing different from the traditional one; and if 𝑘 = 1, one attribute will be randomly selected for division. In general cases, 𝑘 = log 𝑑 is recommended. Being easily and readily realizable at low computation cost, RF has exhibited strong performance in many practical tasks. Though it is adapted from Bagging by executing minor changes, its base learner is different from that in Bagging, as the diversity of its base learner comes from both sample perturbation and attribute perturbation while that in Bagging comes from sample perturbation (of initial training set sampling) alone. Therefore, it is made possible to improve the post-ensemble generalization performance by further differentiating individual learners. RF has similar convergence as Bagging. Its initial performance could be far from being satisfactory, especially when there is only one base learner in the ensemble. However, as the number of individual learners keeps growing, RF usually will converge to lower generalization error. Notably, the training efficiency of RF is more than usual superior to that of Bagging because Bagging employs “certain” tree when constructing individual decision-making tree and has to examine all the attributes of the node in attribute classification, whereas RF adopts “random” decision-making tree Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 116 and need examine one attribute subset only [31]. In view of the strengths displayed by RF model, it is widely applied to a variety of fields including credit card fraud prediction [35–38]. 3.4. Model Ensemble In consideration of the high imbalance perplexing the dataset used here, this paper attempts to construct a model with good predicting performance on a highly imbalanced dataset. Based on CUSBoost, we replace the target of improvement for CUS-AdaBoost with RF, and thus propose a novel CUS employing Boosting (RF) and name it as CUS-RF. Different from AdaBoost, RF keeps fitting new models during the learning to generate more accurate estimation about the response variable. When constructing the decision-making tree, RF algorithm trains the tree as per the residual error of previous tree in each iteration. Finally, the output is the accumulation of all the tree classifications. Based on Boosting thought, RF serially builds several decision-making trees to predict the data. In other words, it performs gradient boosting in the space where loss function is. In details, it views to-be-resolved decision-making tree model as parameter and fits the negative gradient of loss function in current model upon each iteration to renew the parameter to minimize the loss function. RF could be considered as an extension of AdaBoost. The latter identifies problem on the basis of mis-classified data point and improves the model by adjusting the weight of such mis-classified data point, whereas RF finds out problem by negative gradient and improves the model through negative gradient computation. In fact, examples containing higher absolute value of negative gradient is going to gain high attention in subsequent training, because the resulting loss is likely to account for a large portion in the final loss function. Therefore, it is more depended on to diminish the loss. This is something shred by RF and AdaBoost. Compared with AdaBoost, RF could invoke more types of loss function and render more problems resolved [39–41]. Serving as the basis of CUS-RF, CUSBoost algorithm is based on AdaBoost algorithm itself. By introducing CUS, CUSBoost does some improvement towards the AdaBoost to better balance the class distribution, and AdaBoost improves the classifier performance in virtue of those balanced data. CUS-RF does achieve the same goal, but what is improved here is RF instead of AdaBoost. Results indicate this algorithm features quick model training and satisfactory performance. Embedding CUS algorithm into RF algorithm is both implementable and effective. By following the thought of CUSBoost in algorithm improvement, we bring CUS process into RF. 3.5. Evaluation Metrics Unfortunately, some common classifier evaluation indexes turn out to be inapplicable to imbalanced dataset regardless of their performance in dealing with balanced dataset. This happens with accuracy rate also in credit card fraud detection, a most frequently used index. It does not consider sample distribution which stays as the key issue of imbalanced dataset. Besides, accuracy rate may lead to misleading conclusion. For instance, in an imbalanced dataset, 99% of observed values are normal (meaning there is no fraud) while 1% are negative (meaning there is fraud). If taking the positive class (or majority class) prediction as the standard, this model will be deemed as having 99% of accuracy rate, because this method opts to choose the majority class and thus yield the better outcome. The ratio indicates the classifier is accurate, however it ignores the prediction of minority class (or negative class) which should be attached with highest importance in case of dataset imbalance. Thus, we need to adjust the model performance evaluation and rely on the evaluation indexes insensitive to sample distribution. We choose four evaluation indexes that are widely used Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 117 in imbalanced datasets, namely Area Under the Receiver Operating Characteristic Curve (AUC), TPR, FPR, Specificity and maximum KS value. Here we are going to sequence the samples by classifier- derived prediction outcome and predict all the samples one by one as positive examples. Each computation brings in values of two important measures which are used as horizontal and vertical coordinates for mapping. Finally, a Receiver Operating Characteristic Curve (ROC Curve) is generated, which has its vertical shaft as True Positive Rate (TPR) and horizontal one as “False Positive Rate (FPR)”. Definitions of TPR and FPR are offered below: 𝑇𝑃𝑅 = (1) 𝐹𝑃𝑅 = (2) When classifiers are compared in term of performance, if one classifier’s ROC curve gets completely covered by that of another, it can be assumed that the latter outperforms the former. If two classifiers have their ROC curves overlap each other, it is hard to tell which one is better. In such case, we employ AUC (Area Under ROC Curve) to determine the inter-model difference in performance. AUC is the sum of all the areas under ROC. Suppose ROC is formed with points with coordinates {(𝑥 ,𝑦 ),(𝑥 ,𝑦 ),∙∙∙,(𝑥 ,𝑦 )} that are connected in order, then 𝐴𝑈𝐶 = ∑ (𝑥 − 𝑥 ) ∙ (𝑦 +𝑦 ) (3) Specificity is the ratio by which no fraud is predicted as non-fraud. It can be figured out according to following formula: 𝑆𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦 = 𝑇𝑁 (𝑇𝑁 + 𝐹𝑃)⁄ (4) At last, after all the models gain the optimal hyper-parameters after parameter tuning, we re- train the models with such hyper-parameters on training and development sets before testing again the retrained models on test set. Afterwards, five indexes are invoked to evaluate the model performance respectively, as this is an effective index for imbalanced dataset. 4. Experimental Results and Discussion 4.1. Feature Selection After finishing data preprocessing, we need to input meaningful features into the machine learning algorithm and model for training purpose. In general sense, features should be weighed from two aspects, whether the feature diverges and whether feature remains related to the goal. If a feature does not diverge, such as having a variance nearing 0, it means the samples don’t differ from each other in this feature, and the feature is of little use to sample distinguishment. As for the second aspect which is more apparent, the feature highly concerned with the goal should be prioritized. Except for variance method, all other methods introduced in this paper set out from relevance. The feature selection methods could be divided into three types by feature selection form, namely Filter, Wrapper and Embedded. Filter method scores every feature as per its divergence or relevance and sets threshold or number of threshold candidates to screen the features. Wrapper method selects a few features or eliminate some upon each time by referring to the objective function (or prediction effect score in usual cases). For Embedded method, features are trained first with certain machine learning algorithms and models to figure out their weight coefficients, and then features are selected based on the resulting coefficients in a descending order. Though having something common with Filter, the Embedded determines whether a feature is good or not through training. We select features based on XGBoost-based approach in the Feature Selection library of Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 118 Sklearn and then apply Filter method to generate a database that contains less samples but is more related to the sample type. 4.2. Parameter Tuning To train models with better performance, we perform further optimization with parameters from such models as CUS-GBDT, SMOTE-GBDT, CUS-RF and SMOTE-RF in order to exert their best predicting performance. The post-optimization parameters are listed in Table 2. Table 2. Optimized parameters of CUS-GBDT and XGBoost. Classifier Parameter Description CUS-GBDT SMOTE-GBDT max_depth Maximum depth of the individual regression estimators. n_estimators The number of boosting stages to perform. subsample The fraction of samples to be used for fitting the individual base learners. loss Loss function to be optimized. CUS-RF SMOTE-RF n_estimators The number of boosting stages to perform. max_depth Maximum depth of the individual regression estimators. scale_pos_weight The weight of positive samples. 4.3. Benchmark Models Table 3. A list of the proposed method and benchmark methods. No. Method Description 1 CUS-GBDT Based on CUS-GBDT, with CUS for preprocessing class imbalance data and GBDT for sample classification. 2 CUS-KNN Based on CUS-KNN, with CUS for preprocessing class imbalance data and KNN for sample classification. 3 CUS-LR Based on CUS-LR, with CUS for preprocessing class imbalance data and LR for sample classification. 4 CUS-RF Based on CUS-RF, with CUS for preprocessing class imbalance data and RF for sample classification. 5 CUS-SVM Based on CUS-GBDT, with CUS for preprocessing class imbalance data and GBDT for sample classification. 6 SMOTE-GBDT Based on SMOTE-GBDT, with SMOTE for preprocessing class imbalance data and GBDT for sample classification. 7 SMOTE-KNN Based on SMOTE-KNN, with SMOTE for preprocessing class imbalance data and KNN for sample classification. 8 SMOTE-LR Based on SMOTE-LR, with SMOTE for preprocessing class imbalance data and LR for sample classification. 9 SMOTE-RF Based on SMOTE-RF, with SMOTE for preprocessing class imbalance data and RF for sample classification. 10 SMOTE-SVM Based on SMOTE-SVM, with SMOTE for preprocessing class imbalance data and SVM for sample classification. Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 119 Table 3 briefly lists the proposed method and benchmark methods. In the experiment, we adopt GBDT, KNN, LR, RF and SVM as base learners to ensemble with CUS and SMOTE respectively to build heterogeneous ensemble models to determine which imbalanced data processing method or classifier performs well in dealing with imbalanced dataset containing credit card fraud data. 4.4. Results and Discussion The data screened with feature selection method are brought into the heterogeneous ensemble models composed of imbalanced data processing and classifier. The models are appraised in terms of five indexes, namely AUC, TPR, FPR, specificity and precision. The experimental findings are illustrated in Fig. 2 and 3 and Table 4. In this paper, some common classifiers in credit card fraud field are selected as component classifiers for the heterogeneous ensemble models and then combined with two ways of imbalanced data processing CUS and SMOTE in the hope to determine which processing method outperforms and which classifier is better for classifying the credit card fraud. Table 4. Performance between fraud detection models and benchmark models. Models AUC TPR FPR Specificity Precision CUS-GBDT 0.964477 0.938776 0.107574 0.892426 0.892426 CUS-KNN 0.9999 1 0.001565 0.998435 0.998435 CUS-LR 0.860443 1 1 0 0 CUS-RF 0.999912 1 0.001565 0.998435 0.998435 CUS-SVM 0.943549 0.785714 0.001811 0.998189 0.998189 SMOTE-GBDT 0.960775 0.816327 0.030758 0.969242 0.969242 SMOTE-KNN 0.999804 1 0.002 0.998048 0.998048 SMOTE-LR 0.85719 1 1 0 0 SMOTE-RF 0.999881 1 0.001864 0.998136 0.998136 SMOTE-SVM 0.580702 0.555556 0.389474 0.610526 0.610526 Figure 2. The column graph of the values of performance measures (CUS). In the evaluation index system established by this paper, AUC is used to measure model’s comprehensive performance in identifying credit card fraud samples, TPR could figure out the accuracy rate of model in determining sample company’s financial risk, FPR acts to assess the probability for a listed company sample free from financial risk to be wrongfully identified by the model, specificity measures the accuracy rate of a model in determining whether the sample company 0 0.2 0.4 0.6 0.8 1 1.2 CUS-gbdt CUS-KNN CUS-LR CUS-RF CUS-SVM AUC TPR FPR precision specificity Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 120 contains financial risk, and precision indicates the ratio of correctly classified positive samples in the positive samples identified by classifier. Among those five indexes, only FPR score is negatively related to the model’s performance in identifying list company samples free from financial risk, while the scores of rests six methods are all positively related to the model’s performance in the aspect involved. Judged from the experimental findings, among five CUS-based heterogeneous ensemble models, four models (incl. CUS-KNN, CUS-LR, CUS-RF and CUS-SVM) are either better than or equal to the SMOTE-based heterogeneous ensemble models in terms of five evaluation indexes, suggesting CUS enjoys great strengths in dealing with imbalanced credit card fraud data when compared with SMOTE. In the meanwhile, the findings reveal the CUS-RF model proposed in this paper harvests the best scores in five indexes, indicating the improved model has great potential for application to credit card fraud data classification. Figure 3. The column graph of the values of performance measures (SMOTE). 5. Conclusions It is still quite tough and challenging to identify the credit card fraud samples. Through the work here, we aim to introduce novel imbalanced data processing techniques to improving the machine learning model’s predicting performance during credit card fraud detection. Based on the heterogeneous ensemble principle, we have introduced CUS and RF to prediction model creation. To verify the experimental findings, we have also embedded the frequently used classifiers in this field into CUS to generate comparative models. In addition, by keeping classifier unchanged, we also create comparative models with SMOTE to prove the superiority of CUS over SMOTE in classifying imbalanced credit card fraud data. In the future, it is planned to further testify the reliability of the model with more complicated data from the real world and develop a self-adaptive credit card fraud detection system. Acknowledgments: We would like to thank Xu-dong Du, a doctoral candidate at the School of Management, Hefei University of Technology, for providing experimental support for this paper. Funding: This research was funded by the Philosophy and Social Science Planning Project of Anhui Province (National Social Science Fund Incubation Project), grant number AHSKF2021D07. 0 0.2 0.4 0.6 0.8 1 1.2 SMOTE-gbdt SMOTE-KNN SMOTE-LR SMOTE-RF SMOTE-SVM AUC TPR FPR precision specificity Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 121 Conflicts of Interest: The authors declare that they have no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. References [1] Singh A, Ranjan RK, Tiwari A. Credit Card Fraud Detection under Extreme Imbalanced Data: A Comparative Study of Data-level Algorithms[J]. Journal of Experimental and Theoretical Artificial Intelligence, 2021,34:571-598. DOI: https://doi.org/10.1080/0952813X.2021.1907795. [2] El-Naby AA, Hemdan EED, El-Sayed A. An efficient fraud detection framework with credit card imbalanced data in financial services[J]. Multimedia Tools and Applications, 2022. DOI: https://doi.org/10.1007/s11042-022-13434-6. [3] Makki S, Assaghir Z, Taher Y, Haque R, Hacid M, Zeineddine H. An Experimental Study with Imbalanced Classification Approaches for Credit Card Fraud Detection[J]. IEEE Access, 2019,7:93010-93022. DOI: https://doi.org/10.1109/ACCESS.2019.2927266. [4] Sun J, Lang J, Fujita H, Li H. Imbalanced enterprise credit evaluation with DTE-SBD: Decision tree ensemble based on SMOTE and bagging with differentiated sampling rates[J]. Information Sciences, 2018,425:76–91. DOI: https://doi.org/10.1016/j.ins.2017.10.017. [5] Rtayli N, Enneya N. Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization[J]. Journal of Information Security and Applications, 2020,55:102596. DOI: https://doi.org/10.1016/j.jisa.2020.102596. [6] Sun J, Fujita H, Zheng Y, Ai W. Multi-class financial distress prediction based on support vector machines integrated with the decomposition and fusion methods[J]. Information Sciences, 2021,559:153–170. DOI: https://doi.org/10.1016/j.ins.2021.01.059. [7] Shen F, Liu Y, Wang R, Zhou W. A dynamic financial distress forecast model with multiple forecast results under unbalanced data environment[J]. Knowledge-Based Systems, 2020,192:105365. DOI: https://doi.org/10.1016/j.knosys.2019.105365. [8] Sun J, Li H, Fujita H, et al. Class-imbalanced dynamic financial distress prediction based on Adaboost-SVM ensemble combined with SMOTE and time weighting[J]. Information Fusion, 2020,54:128–144. DOI: https://doi.org/10.1016/j.inffus.2019.07.006. [9] Du X, Li W, Ruan S, Li L. CUS-heterogeneous ensemble-based financial distress prediction for imbalanced dataset with ensemble feature selection[J]. Applied Soft Computing Journal, 2020,97. DOI: https://doi.org/10.1016/j.asoc.2020.106758. [10] Li W, Ding S, Chen Y, et al. Transfer learning-based default prediction model for consumer credit in China[J]. Journal of Supercomputing, 2019,75:862–884. DOI: https://doi.org/10.1007/s11227-018-2619-8. [11] Khan A, Rehman HU, Habib U, Ijaz U. Detecting N6-methyladenosine sites from RNA transcriptomes using random forest[J]. Journal of Computational Science, 2020,47:101238. DOI: https://doi.org/10.1016/j.jocs.2020.101238. [12] Laidroo L, Koroleva E, Kliber A, et al. Business models of FinTechs – Difference in similarity[J]. Electronic Commerce Research and Applications, 2021,46:101034. DOI: https://doi.org/10.1016/j.elerap.2021.101034. [13] Bollaert H, Lopez-de-Silanes F, Schwienbacher A. Fintech and access to finance[J]. Journal of Corporate Finance, 2021,68:101941. DOI: https://doi.org/10.1016/j.jcorpfin.2021.101941. [14] West J, Bhattacharya M. Intelligent financial fraud detection: A comprehensive review[J]. Computers and Security, 2016,57:47–66. DOI: https://doi.org/10.1016/j.cose.2015.09.005. [15] Azevedo C da S, Gonçalves RF, Gava VL, Spinola M de M. A Benford’s Law based methodology for fraud detection in social welfare programs: Bolsa Familia analysis[J]. Physica A: Statistical Mechanics and its Applications, 2021,567:125626. DOI: https://doi.org/10.1016/j.physa.2020.125626. [16] Forough J, Momtazi S. Ensemble of deep sequential models for credit card fraud detection[J]. Applied Soft Computing, 2021,99:106883. DOI: https://doi.org/10.1016/j.asoc.2020.106883. [17] Wang D, Chen B, Chen J. Credit card fraud detection strategies with consumer incentives[J]. Omega, 2019,88:179–195. DOI: https://doi.org/10.1016/j.omega.2018.07.001. [18] Soltani Halvaiee N, Akbari MK. A novel model for credit card fraud detection using Artificial Immune Systems[J]. Applied Soft Computing Journal, 2014,24:40–49. DOI: https://doi.org/10.1016/j.asoc.2014.06.042. Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 122 [19] Correa Bahnsen A, Aouada D, Stojanovic A, Ottersten B. Feature engineering strategies for credit card fraud detection[J]. Expert Systems with Applications, 2016,51:134–142. DOI: https://doi.org/10.1016/j.eswa.2015.12.030. [20] Zhang X, Han Y, Xu W, Wang Q. HOBA: A novel feature engineering methodology for credit card fraud detection with a deep learning architecture[J]. Information Sciences, 2021,557:302–316. DOI: https://doi.org/10.1016/j.ins.2019.05.023. [21] Carcillo F, le Borgne YA, Caelen O, et al. Combining unsupervised and supervised learning in credit card fraud detection[J]. Information Sciences, 2021,557:317–331. DOI: https://doi.org/10.1016/j.ins.2019.05.042. [22] Błaszczyński J, de Almeida Filho AT, Matuszyk A, et al. Auto loan fraud detection using dominance-based rough set approach versus machine learning methods[J]. Expert Systems with Applications, 2021,163. DOI: https://doi.org/10.1016/j.eswa.2020.113740. [23] Soemers DJNJ, Brys T, Driessens K, et al. Adapting to concept drift in credit card transaction data streams using contextual bandits and decision trees[C]. Proceedings of the 30th Innovative Applications of Artificial Intelligence Conference, New Orleans Louisiana, USA, February 2 – 7,2018, IAAI 2018 7831–7836. [24] Wang Z, Jiang C, Zhao H, Ding Y. Mining Semantic Soft Factors for Credit Risk Evaluation in Peer-to-Peer Lending[J]. Journal of Management Information Systems, 2020,37:282–308. DOI: https://doi.org/10.1080/07421222.2019.1705513. [25] Pang X, Zhou Y, Wang P, et al. An innovative neural network approach for stock market prediction[J]. Journal of Supercomputing, 2020,76:2098–2118. DOI: https://doi.org/10.1007/s11227-017-2228-y. [26] Craja P, Kim A, Lessmann S. Deep learning for detecting financial statement fraud[J]. Decision Support Systems, 2020,139:113421. DOI: https://doi.org/10.1016/j.dss.2020.113421. [27] Chen YJ, Wu CH, Chen YM, et al. Enhancement of fraud detection for narratives in annual reports[J]. International Journal of Accounting Information Systems, 2017,26:32–45. DOI: https://doi.org/10.1016/j.accinf.2017.06.004. [28] Baesens B, Höppner S, Verdonck T. Data engineering for fraud detection[J]. Decision Support Systems, 2021,150. DOI: https://doi.org/10.1016/j.dss.2021.113492. [29] Olszewski D. Fraud detection using self-organizing map visualizing the user profiles[J]. Knowledge-Based Systems, 2014,70:324–334. DOI: https://doi.org/10.1016/j.knosys.2014.07.008. [30] Zaslavsky V, Strizhak A. Credit Card Fraud Detection Using Self-Organizing Maps[J]. Information & Security: An International Journal, 2006,18:48–63. DOI: https://doi.org/10.11610/isij.1803. [31] Srivastava A, Kundu A, Sural S, Majumdar AK. Credit card fraud detection using Hidden Markov Model[J]. IEEE Transactions on Dependable and Secure Computing, 2008,5:37–48. DOI: https://doi.org/10.1109/TDSC.2007.70228. [32] Gianini G, Ghemmogne Fossi L, Mio C, et al. Managing a pool of rules for credit card fraud detection by a Game Theory based approach[J]. Future Generation Computer Systems, 2020,102:549–561. DOI: https://doi.org/10.1016/j.future.2019.08.028. [33] Akila S, Srinivasulu Reddy U. Cost-sensitive Risk Induced Bayesian Inference Bagging (RIBIB) for credit card fraud detection[J]. Journal of Computational Science, 2018,27:247–254. DOI: https://doi.org/10.1016/j.jocs.2018.06.009. [34] Zhou J, Li W, Wang J, et al. Default prediction in P2P lending from high-dimensional data based on machine learning[J]. Physica A: Statistical Mechanics and its Applications, 2019,534:122370. DOI: https://doi.org/10.1016/j.physa.2019.122370. [35] Jurgovsky J, Granitzer M, Ziegler K, et al. Sequence classification for credit-card fraud detection[J]. Expert Systems with Applications, 2018,100:234–245. DOI: https://doi.org/10.1016/j.eswa.2018.01.037. [36] Huang YP, Yen MF. A new perspective of performance comparison among machine learning algorithms for financial distress prediction[J]. Applied Soft Computing Journal, 2019,83:105663. DOI: https://doi.org/10.1016/j.asoc.2019.105663. [37] Ashraf S, Félix EGS, Serrasqueiro Z. Development and testing of an augmented distress prediction model: A comparative study on a developed and an emerging market[J]. Journal of Multinational Financial Management, 2020,57–58,100659. DOI: https://doi.org/10.1016/j.mulfin.2020.100659. [38] Petropoulos A, Siakoulis V, Stavroulakis E, Vlachogiannakis NE. Predicting bank insolvencies using machine learning techniques[J]. International Journal of Forecasting, 2020,36:1092–1113. DOI: https://doi.org/10.1016/j.ijforecast.2019.11.005. Wei Li, Cheng-shu Wu and Su-mei Ruan / Journal of Risk Analysis and Crisis Response, 2022, 12(3), 110-123 DOI: https://doi.org/10.54560/jracr.v12i3.332 123 [39] Pradeepkumar D, Ravi V. Forecasting financial time series volatility using Particle Swarm Optimization trained Quantile Regression Neural Network[J]. Applied Soft Computing Journal, 2017,58:35–52. DOI: https://doi.org/10.1016/j.asoc.2017.04.014. [40] Jones S, Johnstone D, Wilson R. An empirical evaluation of the performance of binary classifiers in the prediction of credit ratings changes[J]. Journal of Banking and Finance, 2015,56:72–85. DOI: https://doi.org/10.1016/j.jbankfin.2015.02.006. [41] He Y, Zhang W. Probability density forecasting of wind power based on multi-core parallel quantile regression neural network[J]. Knowledge-Based Systems, 2020,209:106431. DOI: https://doi.org/10.1016/j.knosys.2020.10643. Copyright © 2022 by the authors. This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).