Microsoft Word - 3-V8N2(2023)-AITI#8364(111-120).docx Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 Short-Term Rainfall Prediction Using Supervised Machine Learning Nusrat Jahan Prottasha1,*, Anik Tahabilder2, Md Kowsher3, Md Shanon Mia1, Khadiza Tul Kobra1 1Department of Computer Science, Daffodil International University, Dhaka, Bangladesh 2Department of Computer Science, Wayne State University, Detroit, Michigan, USA 3Department of Computer Science, Stevens Institute of Technology, Hoboken, New Jersey, USA Received 30 August 2021; received in revised form 16 May 2022; accepted 02 June 2022 DOI: https://doi.org/10.46604/aiti.2023.8364 Abstract Floods and rain significantly impact the economy of many agricultural countries in the world. Early prediction of rain and floods can dramatically help prevent natural disaster damage. This paper presents a machine learning and data-driven method that can accurately predict short-term rainfall. Various machine learning classification algorithms have been implemented on an Australian weather dataset to train and develop an accurate and reliable model. To choose the best suitable prediction model, diverse machine learning algorithms have been applied for classification as well. Eventually, the performance of the models has been compared based on standard performance measurement metrics. The finding shows that the hist gradient boosting classifier has given the highest accuracy of 91%, with a good F1 value and receiver operating characteristic, the area under the curve score. Keywords: rain prediction, machine learning, supervised classification, agriculture resource, crops yield 1. Introduction Agriculture plays a vital role in the development of many developing countries [1]. IoT-based smart agriculture model is being implemented worldwide to increase crop yields. The use of intelligent tools in farming can increase the production of crops and also minimize the damage due to disasters. The economy of South Asian countries, including Bangladesh, India, China, and Pakistan, depends more on agriculture. But there are always some natural disasters, including rain and floods, that create huge demolition of crops and property. Therefore, a good rain prediction model is necessary to forecast the rain to reduce the risk to life and also to maintain the agriculture farms in a better way. In addition, a rain prediction model helps farmers take early flood measurements and properly manage water resources. Observing the significance of rain prediction, researchers have developed a lot of devices to predict rainfall, but none of them is worth noting in terms of short-term rain prediction. Hence, it has not been adopted eventually by the end-level user to forecast the rain situation. However, machine learning techniques can make a more accurate prediction because of their underlying technology. Researchers have implemented neural networks (NN) in rainfall prediction and showed that the NN-based model usually exceeds the performance of the numerical weather prediction model. This study aims to develop a short-term rain prediction model that can effectively and accurately predict rainfall. In this proposed work, several relevant machine learning models have been used to predict rainfall, and finally, a performance comparison has been made to determine the best suitable model. * Corresponding author. E-mail address: jahannusratprotta@gmail.com Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 112 In this project, the twenty-nine most optimistic classifiers have been used from eleven different categories. All these models have been trained and tested with a relevant rainfall dataset to implement this prediction model. The data was collected from a popular and recognized public repository and split into training, validation, and testing data. Since the raw data came from natural weather resources, it has been preprocessed before going to the training phase. A few preprocessing techniques have been implemented to prepare the raw data, such as missing value check, feature selection, features scaling, dimension reduction, etc. After analyzing all the models and comparing them, it was found that the hist gradient boosting classifier (HGBC) has shown the highest accuracy of 91%. A lot of other models have shown the second-highest accuracy of 90%. The contribution of this paper can be summarized below: (1) A pipeline for estimating rain prediction has been developed. (2) Diverse types of classifiers have been used to ensure the best model that suits diverse types of data. (3) A comparison among all trained models has been described to measure the comparative performance. The rest of the sections of this paper is organized as follows. Section 2 explains the related work of various classification techniques for rainfall prediction. Section 3 describes the major technology components used, including the dataset, preprocessing, and algorithm. Section 4 describes the methodology that has been used to solve the proposed problem. Later on, Section 5 contains the experiments and the results. This article is wrapped up in section 6 by discussing the conclusion and future works. 2. Related Work A country’s agriculture largely depends on rain, and there is a lot of research on forecasting rain. All the earlier methods of rain forecasting are mainly statistical and numerical analysis based [1]. Also, some methods predict the rain by analyzing radar images. Denœux and Rizand [2] have proposed a model that performs deep learning-based analysis on radar images to predict rainfall. However, with the recent advancement of machine learning, many new machine learning-based models have been proposed for rainfall prediction. Researchers like Shah et al. [3] have developed a simple polynomial regression-based model to predict the rain to benefit agricultural products. Asha et al. [4] have proposed a hybrid machine learning classification model for predicting rainfall, and it has shown better performance than the ordinary ml-based model. Sakthivel and Thailambal [5] have also demonstrated such a hybrid approach for rain prediction, predicting continuous long period rainfall. Naidu et al. [6] presented the changes in rainfall patterns in numerous agro-climatic zones using machine learning approaches. Besides, Dinh et al. [7] have used a support vector machine (SVM)-based method to measure the rain forecast and the soil erosion due to the rain. On the other hand, Abdel-Kader et al. [8] showed a vigorous hybrid technique by particle swarm optimization (PSO) and multi-layer perceptron (MLP) for the prediction of rainfall. Also, Samsiahsani et al. [9] evaluated many machine learning classifiers based on Malaysian data for rainfall prediction. Similar models have been developed to predict the flood forecast due to heavy rainfall and ocean waves. Luk et al. [10] mentioned data scarcity as a limitation in modeling such a predictive model. Abbot and Marohasy [11] have shown the application of NN in rainfall prediction based on a dataset from Queensland, Australia. Among the short-term prediction models, a model by Shah et al. [12], is a good invention for predicting very short-term rainfall. On the other hand, some researchers focused on heavy rainfall only. Research by Sangiorgioet al. [13] has made improvements in determining heavy rainfall based on water vapor measurement using a NN-based model. Han et al. [14] have mentioned the limitations of such a predictive model for forecasting rainfall and flood by determining the major uncertainties. Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 113 Unlike those works, in this project, several rain-forecasting models for upcoming rain prediction have been developed using twenty-nine different machine learning classifiers. Additionally, their performances have also been compared, and hence the best machine learning model for rain prediction has been determined. 3. Dataset and Algorithm Description The structural dataset and algorithms are a machine learning model’s two most essential parts. This section will provide a brief description of the dataset and algorithm. The features of the dataset will be discussed in more details form. Then the data preprocessing steps will also be explained in the subsequent section. Finally, all the algorithms that have been used to make the models will be described in brief. 3.1. Dataset To implement the proposed model, the rain prediction database from Kaggle has been used. The dataset [15] comprises the precipitation estimation from the years 1901 to 2015 for each state of Australia. Each observation contains 19 qualities (person-months, annual, and combinations of 3 continuous months) for subdivisions. The rain prediction unit mentioned in the dataset is measured in millimeters (mm). Table 1 summarizes the dataset and its features. The feature name and the description of the feature are shown in the left column and the right-side column, respectively. The dataset is robust and has an adequate number of observations and features, which ensures the quality of the data and guarantees a model with good accuracy if modeled properly. Table 1 The description of the dataset Feature Details description of the feature Location This is the name of the location MaxTemp Max temperature recorded in degrees centigrade MinTemp Min temperature recorded in degrees centigrade WindGustSpeed The speed of wind gust in km per hour WindGustDir The direction of the strongest wind gust WindSpeed9am Wind speed averaged over 10 minutes before 9 a.m. WindDir9am The direction of the wind gust at 9 a.m. WindSpeed3pm Averaged wind speed over 10 min before 3 p.m. WindDir3pm The direction of the wind gust at 3 p.m. Humidity9am Relative humidity at 9 a.m. measured in percentage Humidity3pm Relative humidity at 3 p.m. measured in percentage Temp9am The temperature at 9 a.m. measured in degrees Celsius Temp3pm The temperature at 3 p.m. measured in degrees Celsius Pressure9am Atmospheric pressure at 9 a.m. measured in hPa Pressure3pm Atmospheric pressure at 3 p.m. measured in hPa Rainfall The rainfall recorded for the day (mm) RainToday If precipitation in the 24 h to 9 a.m. exceeds 1 mm it will be 1, otherwise 0 (mm) Rain tomorrow If it will rain or not on the next day, it is output as binary target variable 3.2. Pre-processing Data preprocessing is a crucial step that helps improve the quality of data to advance the extraction of important bits of knowledge from the data. It refers to preparing the raw data in a format that will be neat, clean, and understandable to the machine learning model. Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 114 The flowchart of data preprocessing steps is illustrated in Fig. 1. in the subsequent section. After the data is collected, it goes through a cleaning process, and then the missing values are handled logically. Then the data encoding is performed, and important features get selected. Eventually feature scaling process is used to bring all the features on a similar scale, and 10-fold cross-validation (CV) is implemented. Each stage of data preprocessing is described elaborately below. Data cleaning is the method of preparing data for analysis by removing or adjusting incorrect, fragmented, unimportant, duplicated, or improperly organized data. It includes further tuning of data by fixing spelling, and syntax errors, and adjusting mistakes, such as dealing with empty data, invalid values, and recognizing duplicate data points. There were some missing, some duplicates, some invalid, and some incomplete values in this dataset. The following actions have been taken to correct those data. (1) A part of the data points is repeated in row and column segments. Hence, all the duplicate data was removed, and only a single instance was kept. (2) A few rows and columns were almost empty. That corresponding row or column has been removed from the dataset. (3) Some instances were shown to have some invalid values. That instance was also deleted to make the dataset solid and readable. Fig. 1 The entire data preprocessing procedure Missing values are a common occurrence in the dataset. Therefore, they need to be handled to prepare the dataset properly. If the data cannot pass the statistical test, then it is removed from the dataset. Besides, most predictive algorithms can't handle any missing value. Hence, this issue must be solved before the data is fed into the model. In most machine learning work, people utilize various techniques such as mean, median, and mode methods to handle missing values. But the most appropriate method for managing missing data is to remove the full row for categorical features and replace the missing data with the nearest neighbors for numerical data. The k-nearest neighbors (KNN)-based imputation method has been implemented in this work for a more accurate missing value imputation. The not-a-number (NaN) data has been replaced by getting the nearest value by considering the three neighbors. Categorical data could be a subjective feature whose values are taken by label encoding or one-hot representation. It implies that categorical data must be encoded into numbers before it is fed to the model. In the dataset, there are six categorical factors “Location,” “WindGustDir,” “WindDir9am,” “WindDir3pm,” “RainToday,” and “Rain tomorrow.” One hot encoding is one of the most popular methods for encoding categorical variables, which has been used in this project. It is one of the widespread approaches, and it works well unless the categorical variables count is too high. It makes a new binary column for each category, demonstrating the presence of each possible value from the categorical data. Feature selection is the strategy to figure out highly related input variables when creating a predictive model. Reducing the number of input variables is desirable to reduce the computational cost in modeling and increase the model’s performance. The dataset contains 21 components, and the p-value has been tested to check the probability of the null hypothesis. Features Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 115 with a p-value less than 0.05 have been discarded. After checking multicollinearity, a vital separation is maintained from those components which appear redundant and don’t back the p-value assumption. Besides, to handle the numerical feature, the Pearson correlation coefficient has been used, which is characterized within Condition-1, and for categorical features, ANOVA-F has been used, which is described as: ( ) ( ) ( ) ( ) 1 2 2 1 1 n i i i n n i i i i x x y y r x x y y = = = − − = − −    (1) After performing the feature selection, eighteen features were kept, including Location, MinTemp, MaxTemp, Rainfall, WindGustDir, WindGustSpeed, WindDir9am, WindSpeed9am, WindSpeed3pm, Humidity9am, Humidity3pm, Pressure9am, Pressure3pm, Temp9am, Temp3pm, RainToday, and Rain tomorrow. The numerical data are mostly skewed or nonstandard deviation in data analysis due to outliers, multi distributions, exceptionally exponential distributions, and more. This issue was solved by changing the numeric value into a categorical feature. To implement this method, a discretization process that changes over the numerical value into different distribution work has been applied as shown below: ( ) ( ) ( ) ( ) 2 1 2 1 / 1 / n K G i n K G i n x x K F x x N K = = − − = − −   (2) Feature scaling is one of the significant procedures required to standardize the independent features of the working dataset. There are different strategies for feature scaling such as min-max scaling, variance scaling, standardization, mean normalization, unit vectors, etc. In this work, min-max scaling has been used as a feature scaling method, and the exchange range has been set between 0 and 1. Mathematical formulla of the min-max scaling has been described as: ( ) ( ) ( ) min max min x x x x x − ′ = − (3) CV is a resampling method to train, test, and validate the model using different data in each iteration. There are lots of CV methods to perform this operation. This paper uses a 10-fold CV technique where the whole dataset is separated into ten folds. Each section is used either for training, validation, or test set. After data preprocessing, there were 142193 rows as samples and 17 columns as features. 3.3. Algorithms A few of the most suitable machine learning and deep learning-based models have been implemented to build the best model. Some of them are neighbor relationship-based, some are naive Bayes (NB) theory-based, some are based on SVM, and so on. Table 2 shows a summary of all the algorithms that have been used in this research. For the KNN, five neighbors have been considered to find the similarity. For MLP, the learning rate is 0.001 with two hidden layers of 56 units. For radius learning, the radius value was set to 0.1. The Gini index has been used to split decisions between the decision tree and the random forest tree. Table 2 The summary of the algorithm used Methodology Algorithms Methodology Algorithms Neighbors classifier k-nearest neighbors Discriminant analysis classifier Linear discriminant analysis Radius neighbors classifier Quadratic discriminant analysis Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 116 Table 2 The summary of the algorithm used (continued) Methodology Algorithms Methodology Algorithms Neighbors classifier Nearest centroid Support vector machine classifiers Linear SVC Ensemble classifiers AdaBoost classifier Linear SVC Bagging classifier Nu SVC Gradient boosting classifier Linear model Stochastic gradient descent classifier Hist gradient boosting classifier Ridge classifier Random forest classifier Ridge classifier CV Naive Bayes classifiers Bernoulli NB Passive aggressive classifier Multinomial NB Logistic regression CV Categorical NB Logistic regression Complement NB Perceptron Gaussian NB Impact learning [16] Semi-supervised classifiers Label propagation Gaussian process classifiers Radial basis function Label spreading Neural network MLP classifier 4. Methodology To complete the whole workflow, a total of four steps have been executed: data collection and preprocessing, training model using supervised learning methods, testing, and performance analysis. A popular and acceptable dataset from the Kaggle platform [11] has been adopted for this project. This dataset was split into three parts: the training part, the validation part, and the testing part. After gathering all raw data, the dataset goes through data preprocessing steps, which have been used to make the dataset outliers-free and more solid. These data preprocessing steps also help in increasing the performance of the models [17]. As a result, diverse preprocessing methods such as cleaning data, missing value checks, handling the categorical data, feature selection, and feature scaling have been applied. The machine learning models are made by training with the data preprocessed earlier. The training and testing methods of all those models are different. From all the training methods, 29 classifiers have been used so that the performance can be compared and the best suitable model can be selected. Most of the models listed in the table showed good performance, but some didn’t fit very well. The complete methodology of this proposed model is shown in Fig. 2. Fig. 2 The overview of the methodology of the proposed work 5. Experiments and Results The model was built and then trained with the preprocessed dataset. In this section, the performances of all the algorithms have been compared. Besides, various experimental parameter has been tuned for performance analysis and evaluation. In addition, the experimental setup to accomplish the entire task has also been described. For this model, 11 statistical performance metrics have been considered for performance analysis and comparison. Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 117 The whole work has been completed in Google colab, and python has been provided as a simulation environment by Google. A machine learning framework named sci-kit learns and deep learning framework Keras have also been used to implement the classification algorithm. In addition, the Matplotlib library for data visualization, graphical representation, and data analysis has been used in this project. Table 3 The summary of the algorithms that have been used in this research Model Accuracy F1 score Rs PS FBS HL JS MC AUC BAC CKS Neighbors classifier KNC 0.868 0.752 0.672 0.749 0.809 0.202 0.604 0.468 0.749 0.689 0.411 NC 0.813 0.719 0.663 0.661 0.749 0.257 0.561 0.378 0.74 0.68 0.338 RNC 0.803 0.708 0.654 0.648 0.736 0.267 0.55 0.356 0.731 0.671 0.316 Ensemble classifiers ADC 0.893 0.802 0.72 0.791 0.856 0.177 0.656 0.559 0.797 0.737 0.507 BC 0.894 0.802 0.72 0.793 0.857 0.176 0.657 0.561 0.797 0.737 0.508 GBC 0.904 0.818 0.734 0.812 0.875 0.166 0.675 0.594 0.811 0.751 0.54 HGBC 0.91 0.833 0.751 0.817 0.885 0.16 0.692 0.619 0.828 0.768 0.569 RFC 0.907 0.823 0.737 0.819 0.881 0.163 0.68 0.604 0.814 0.754 0.549 Naive Bayes classifiers MNB 0.811 0.718 0.662 0.659 0.747 0.259 0.56 0.376 0.739 0.679 0.335 CoNB 0.733 0.686 0.679 0.63 0.712 0.337 0.512 0.36 0.756 0.696 0.297 CNB 0.731 0.686 0.683 0.632 0.713 0.339 0.512 0.365 0.76 0.7 0.299 GNB 0.695 0.66 0.669 0.618 0.692 0.375 0.482 0.336 0.746 0.686 0.263 CC 0.899 0.813 0.731 0.799 0.866 0.171 0.668 0.58 0.808 0.748 0.529 Semi-supervised classifiers LP 0.884 0.787 0.712 0.758 0.832 0.186 0.64 0.521 0.789 0.729 0.476 LS 0.902 0.816 0.733 0.808 0.872 0.168 0.673 0.589 0.81 0.75 0.536 Discriminant analysis classifier LDA 0.901 0.817 0.736 0.802 0.869 0.169 0.673 0.588 0.813 0.753 0.537 QDA 0.708 0.656 0.641 0.603 0.684 0.362 0.483 0.294 0.718 0.658 0.237 SVM classifiers LSVC 0.902 0.812 0.727 0.811 0.872 0.168 0.669 0.586 0.804 0.744 0.529 NuSVC 0.899 0.809 0.726 0.802 0.865 0.171 0.665 0.576 0.803 0.743 0.522 SGDC 0.894 0.795 0.709 0.8 0.857 0.176 0.65 0.555 0.786 0.726 0.496 RdC 0.9 0.802 0.714 0.813 0.867 0.17 0.658 0.572 0.791 0.731 0.511 RdCV 0.9 0.802 0.714 0.813 0.867 0.17 0.659 0.572 0.791 0.731 0.511 PAC 0.85 0.623 0.568 0.767 0.709 0.22 0.506 0.322 0.645 0.585 0.202 LRCV 0.9 0.815 0.733 0.801 0.868 0.17 0.671 0.584 0.81 0.75 0.533 LR 0.901 0.815 0.733 0.802 0.869 0.169 0.672 0.585 0.81 0.75 0.534 Pr 0.804 0.716 0.666 0.653 0.742 0.266 0.556 0.373 0.743 0.683 0.332 IL 0.902 0.813 0.727 0.811 0.872 0.168 0.669 0.586 0.804 0.744 0.53 Gaussian process classifiers GPC 0.903 0.818 0.736 0.807 0.873 0.167 0.675 0.592 0.813 0.753 0.54 Neural network classifier MLPC 0.906 0.833 0.757 0.805 0.879 0.164 0.691 0.613 0.834 0.774 0.568 Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 118 Here, the twenty-nine most suitable machine learning models have been used to predict the rainfall possibility. A total of 11 statistical measurements have been considered and listed in Table 3. The table shows that the HGBC has predicted the best accuracy of 0.91, and the F1 score is 0.833. The random forest tree classifier has obtained the best accuracy of 0.907 with an F1 score is 0.823 from the branch of ensemble classifiers. The MLPC has acquired good accuracy, which is 0.906, along with an F1 score of 0.833 from the section of the NN classifier. Moreover, from the section on the neighbor’s classifier, it can be noticed that the KNC has shown the best accuracy of 0.868, and the F1 score is 0.752. Also, from the section on NB algorithms, CC has shown the best accuracy of 0.899 and its F1 score is 0.813 among all NB classifiers. After that, LS also has placed the best accuracy of 0.902, and its F1 score is 0.816 from the branch of semi-supervised classifiers. Besides, it can also be seen from the discriminant analysis section that the linear discriminant analysis has proved the best position of accuracy, 0.901, with an F1 score of 0.817. Fig. 3 Comparison of accuracy among the models Next, LSVC also has the best accuracy of 0.902, and its F1 score is 0.812 from the branch of SVM classifiers. Finally, GPC has figured out the best accuracy result is 0.903 with an F1 score of 0.818 from the section of Gaussian process classifiers. All-inclusive, by analyzing all the sections of algorithms for the rain prediction model, it came out that the HGBC is the winner with an accuracy of 0.91, and the F1 score is 0.833. Fig. 3 shows the accuracy comparison among all the models. A ROC curve is a graph showing the performance of a classification model at all classification thresholds. Most of the algorithms show a high chance that the classifier will be able to distinguish the positive class values from the negative class values. 6. Conclusion and Future Work In this work, a machine learning model has been presented that can determine whether it will rain or not on the very next day. Real data from Australia has been adopted from the Kaggle platform and implemented in the model. This data-driven model is more accurate than any other statistical or numerical-based model. The primary purpose here is to find the best classifiers for predicting rainfall. For this reason, various machine learning classifiers have been implemented. Eventually, the most significant performance metrics have been compared, including accuracy, F1 scores, ROC, AUC, and HGBC have shown the best accuracy with a good score. All the models used here have been trained based on the dataset of a particular zone. Therefore, rain prediction accuracy may vary depending on the dataset characteristics. Training this model with a dataset that has a sample collected from diverse places can make this model more general and suitable for all locations in this world. This proposed model will be able to forecast the rain for the short-term, specifically for the next day. A new model can be built that will predict the rain for the long term in the future. Combining this two may build a complete solution for rain prediction. Moreover, this machine learning-based model may not be easily useable for general people. Therefore, mobile and Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 119 computer apps can be built so those general people can easily use them. A deep learning and NN model approach can be used to improve the result. Undoubtedly, there is a plan to evaluate the other country’s data for forecasting the rain using this model. Thus, this model is expected to be a universal and easy rain prediction tool for ordinary people. Nomenclature HGBC Hist gradient boosting classifier BC Bagging classifier ROC Receiver operating characteristic GBC Gradient boosting classifier AUC Area under the curve RFC Random forest classifier NN Neural networks MNB Multinomial naive Bayes SVM Support vector machine CoNB Complement naive Bayes PSO Particle swarm optimization CNB Categorical naive Bayes classifier MLP Multi-layer perceptron GNB Gaussian naive Bayes mm Measured in millimeters CC Calibration classifier CV Cross-validation LP Label propagation KNN K-nearest neighbors LS Label spreading NaN Not-a-number LDA Linear discriminant analysis NB Naive Bayes QDA Quadratic discriminant analysis Rs Recall score LSVC Linear support vector classifier PS Precision score NuSVC Nu support vector classification FBS F-beta score SGDC Stochastic gradient descent classifier HL Hamming loss RdC Ridge classifier JS Jaccard score RdCV Ridge classifier CV MC Matthew’s correlation PAC Passive aggressive classifier BAC Balanced accuracy LRCV Logistic regression CV CKS Cohen’s kappa LR Logistic regression KNC K neighbors’ classifier Pr Perceptron classifier NC Nearest centroid IL Impact learning RNC Radius neighbor’s classifier GPC Gaussian process classifier ADC AdaBoost classifier MLPC Multi-layer perceptron classifier Conflicts of Interest The authors declare no conflict of interest. References [1] K. T. Sohn, J. H. Lee, and S. H. Lee, “Statistical Prediction of Heavy Rain in South Korea,” Advances in Atmospheric Sciences, vol. 22, no. 5, pp. 703-710, 2005. [2] T. Denœux and P. Rizand, “Analysis of Radar Images for Rainfall Forecasting Using Neural Networks,” Neural Computing and Applications, vol. 3, no. 1, pp. 50-61, March 1995. [3] B. K. Shah, S. Thapa, R. S. Diyali, S. Hk, and S. Maharjan, “Rain Prediction Using Polynomial Regression for the Field of Agriculture Prediction for Karnatakka,” International Journal of Advances in Engineering and Management, vol. 2, no. 3, pp. 62-66, March 2020. [4] P. Asha, A. Jesudoss, S. Prince Mary, K. V. Sai Sandeep, and K. Harsha Vardha, “An Efficient Hybrid Machine Learning Classifier for Rainfall Prediction,” Journal of Physics: Conference Series, vol. 1770, no. 1, article no. 012012, March 2021. [5] S. Sakthivel and G. Thailambal, “Effective Procedure to Predict Rainfall Conditions Using Hybrid Machine Learning Strategies,” Turkish Journal of Computer and Mathematics Education, vol. 12, no. 6, pp. 209-216, April 2021. Advances in Technology Innovation, vol. 8, no. 2, 2023, pp. 111-120 120 [6] D. Naidu, B. Majhi, and S. K. Chandniha, “Development of Rainfall Prediction Models Using Machine Learning Approaches for Different Agro-Climatic Zones,” Handbook of Research on Automated Feature Engineering and Advanced Applications in Data Science, IGI Global, 2021. [7] T. V. Dinh, H. Nguyen, X. L. Tran, and N. D. Hoang, “Predicting Rainfall-Induced Soil Erosion Based on a Hybridization of Adaptive Differential Evolution and Support Vector Machine Classification,” Mathematical Problems in Engineering, vol. 2021, article no. 6647829, 2021. [8] H. Abdel-Kader, M. Abd-El Salam, and M. Mohamed, “Hybrid Machine Learning Model for Rainfall Forecasting,” Journal of Intelligent Systems and Internet of Things, vol. 1, no. 1, pp. 5-12, 2021. [9] N. Samsiahsani, I. Shlash, M. Hassan, A. Hadi, and M. Aliff, “Enhancing Malaysia Rainfall Prediction Using Classification Techniques,” Journal of Applied Environmental and Biological Sciences, vol. 7, no. 2S, pp. 20-29, April 2017. [10] K. C. Luk, J. E. Ball, and A. Sharma, “An Application of Artificial Neural Networks for Rainfall Forecasting,” Mathematical and Computer Modeling, vol. 33, no. 6-7, pp. 683-693, March 2001. [11] J. Abbot and J. Marohasy, “Application of Artificial Neural Networks to Rainfall Forecasting in Queensland, Australia,” Advances in Atmospheric Sciences, vol. 29, no. 4, pp. 717-730, June 2012. [12] C. Shah, C. Hendahewa, and R. Gonzalez-Ibanez, “Rain or Shine? Forecasting Search Process Performance in Exploratory Search Tasks,” Journal of the Association for Information Science and Technology, vol. 67, no. 7, pp. 1607-1623, July 2016. [13] M. Sangiorgio, S. Barindelli, R. Biondi, E. Solazzo, E. Realini, G. Venuti, et al., “Improved Extreme Rainfall Events Forecasting Using Neural Networks and Water Vapor Measures,” 6th International Conference on Time Series and Forecasting, pp. 820-826, September 2019. [14] D. Han, T. Kwong, and S. Li, “Uncertainties in Real-Time Flood Forecasting with Neural Networks,” Hydrological Processes: An International Journal, vol. 21, no. 2, pp. 223-228, January 2007. [15] J. Young, “Rain in Australia,” https://www.kaggle.com/jsphyg/weather-dataset-rattle-package, October 30, 2007. [16] M. Kowsher, A. Tahabilder, and S. A. Murad, “Impact-Learning: A Robust Machine Learning Algorithm,” Proceedings of the 8th International Conference on Computer and Communications Management, pp. 9-13, July 2020. [17] C. V. Z. Zelaya, “Towards Explaining the Effects of Data Preprocessing on Machine Learning,” IEEE 35th International Conference on Data Engineering, pp. 2086-2090, April 2019. Copyright© by the authors. Licensee TAETI, Taiwan. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC) license (https://creativecommons.org/licenses/by-nc/4.0/).