Operational Research in Engineering Sciences: Theory and Applications
Vol. 5, Issue 1, 2022, pp. 152-168
ISSN: 2620-1607, eISSN: 2620-1747
DOI: https://doi.org/10.31181/oresta240322136m

NIGHT TRAFFIC FLOW PREDICTION USING K-NEAREST NEIGHBORS ALGORITHM

Dušan Mladenović *, Slađana Janković, Stefan Zdravković, Snežana Mladenović, Ana Uzelac
University of Belgrade, Faculty of Transport and Traffic Engineering, Serbia

* Corresponding author. d.mladenovic@sf.bg.ac.rs (D. Mladenović), s.jankovic@sf.bg.ac.rs (S. Janković), s.zdravkovic@sf.bg.ac.rs (S. Zdravković), snezanam@sf.bg.ac.rs (S. Mladenović), ana.uzelac@sf.bg.ac.rs (A. Uzelac)

Received: 31 December 2021
Accepted: 14 February 2022
First online: 24 March 2022

Original scientific paper

Abstract: The aim of this research is to predict the total and average monthly night traffic on state roads in Serbia, using the technique of supervised machine learning. A data set of total and average monthly night traffic, obtained by counting traffic on roads in Serbia in the period from 2011 to 2020, has been used for training and testing the predictive models. Various classification and regression prediction models have been tested on the available data set using the Weka software tool, and the models based on the K-Nearest Neighbors algorithm, as well as the models based on regression trees, have shown the best results. The best model has then been chosen by comparing model performances. According to all the mentioned criteria, the model based on the K-Nearest Neighbors algorithm has shown the best results. Using this model, the total and average night traffic per month for the following year has been predicted at the selected traffic counting locations.

Keywords: machine learning, traffic flow, prediction, K-Nearest Neighbors, Weka.

1. Introduction

Accelerated urban development faces mobility challenges caused by the increased transport of passengers and goods. The development of smart cities is based on the analysis of traffic data. Such data are used in the dimensioning of road sections, connections and intersections, the dimensioning of road structures, environmental protection measures, the economic and financial evaluation of projects, and the planning of management and maintenance of road infrastructure (Public Enterprise "Roads of Serbia", 2012). Monitoring the road network is one way to collect real-time traffic data. Various sensor technologies prevail in this type of data collection, such as technologies based on inductive loop detectors, laser radar sensors, etc. (Magalhaes et al., 2021).

The monitoring of traffic flows is important both for following traffic conditions in real time and for predicting the characteristics of traffic flows in the future (Janković et al., 2020). Time determinants, such as the day of the week, the hour of the day, the dates of state and religious holidays, holiday vacations, and so on, are some of the factors that permanently influence the usual intensity of traffic flows. Other factors, such as weather conditions, road conditions, maintenance of road infrastructure (Sénquiz-Díaz, 2021), the use of alternative routes and traffic accidents, can change the characteristics of traffic flows within the observed time interval. In a situation where the flow of vehicles exceeds the capacity of the road, congestion occurs.
Traffic congestion leads to longer travel times, increased transport costs, increased emissions of harmful gases, passenger delays and delays in the delivery of goods. Therefore, the prevention of traffic congestion is one of the most important goals of predicting the characteristics of traffic flows.

Supervised machine learning is a method of predictive analysis that enables the prediction of future values of a target variable for given values of independent attributes, based on known values of the same target variable and the same attributes in the past. The collection of traffic data provides opportunities for the development of supervised machine learning models that can be used to predict the characteristics of future traffic flows (Zhang et al., 2020; Park et al., 2018; Xu et al., 2013).

The forecasting of traffic flows has been the subject of numerous studies over the last two decades. The second section of this paper contains an overview of the most significant studies related to this subject. The authors of this paper have limited their research to the detection of night traffic patterns and the prediction of night traffic (i.e. traffic in the time period from 22.00 to 06.00 hours). The purpose of this research is to examine the possibilities of short-term prediction of night traffic volume using the technique of supervised machine learning. The methodology according to which this research has been performed and the basic characteristics of the algorithm that has shown the best prediction results (K-Nearest Neighbors, K-NN) are presented in the third section of this paper. The fourth section of the paper describes a case study realized within this research, in which predictive models have been created and the total and average amounts of night traffic per month have been predicted on selected road sections in Serbia. The data collected by automatic traffic counters (ATC) have been used in training and testing the machine learning models. The most significant results of the case study and a discussion of the results are presented in the fifth section of the paper, while the last, sixth section concludes the paper.

2. Literature Review

All models developed for traffic prediction can be broadly classified into three categories: parametric, nonparametric and hybrid models. Parametric models include, for example, the historical average and time series models (Williams et al., 1998) and the Kalman filter (Guo & Williams, 2010). The seasonal autoregressive integrated moving average (ARIMA) is a classic parametric time series model used in the study (Williams & Hoel, 2003). In contrast, nonparametric models are mostly data-driven and use empirical prediction methods, including primarily neural network models (Vlahogianni et al., 2005; Yasin Çodur & Tortum, 2015), nonparametric regression (Marković et al., 2010; Cai et al., 2016) and Support Vector Machines (Zhang & Xie, 2008; Peng & Tang, 2015). In addition, the hybrid approach combines two or more models to generate predictions, e.g. the non-linear chaotic prediction model (Wang & Shi, 2013), the multi-agent prediction model (Ma et al., 2001), the modular network model (Vlahogianni et al., 2007), etc. The study by Karlaftis & Vlahogianni (2011) compares traffic forecasting models based on parametric (statistical) methods with neural network-based models.
Boukerche & Wang (2020) provide a classification and an overview of machine learning models used in traffic flow prediction. According to these authors, these models are divided into regression models, instance-based models (such as K-NN), kernel-based models (such as the Support Vector Machine - SVM and the Radial Basis Function - RBF), neural network models (such as the Feed Forward Neural Network - FFNN, the Recurrent Neural Network - RNN and the Convolutional Neural Network - CNN) and hybrid models (combinations of two or more different models). Shamshad & Sarwr (2020) developed a model for predicting traffic volume at an hourly level, using two machine learning algorithms: the Artificial Neural Network (ANN) and the SVM. Traffic data obtained with the help of road sensors, as well as data on meteorological conditions, have been used to train and later test different machine learning models. This study shows that ANN-based machine learning models give good results in long-term predictions, while SVM-based models give good results in short-term predictions. Zhang et al. (2013) have developed a nonparametric regression model based on the K-NN algorithm on the MATLAB platform. The experimental results of this study show that the prediction accuracy of the highway traffic volume, using the K-NN method, is over 90 percent. In the study (Zou et al., 2015) the authors show that, when applying K-NN methods in short-term traffic prediction, a much more accurate prediction is achieved if, in addition to temporal attributes, spatial attributes are included among the independent attributes as well. In some studies, the basic K-NN method for short-term traffic prediction has been improved in some way. For example: "Specifically, two screening layers based on shape similarity were introduced in the K-nearest Neighbor non-parametric regression method, and the forecasting results were output using the weighted averaging on the reciprocal values of the shape similarity distances and the most-similar-point distance adjustment method." (Pang et al., 2016). Zheng & Su (2014) have introduced a time limit when selecting the nearest neighbors. In the study (Liu et al., 2018), a short-term prediction of traffic volume has been performed using a hybrid model based on the ANN and K-NN algorithms. Four types of ANN have been used: the back-propagation (BP) neural network, the radial basis function (RBF) neural network, the generalized regression (GR) neural network and the Elman neural network. The K-NN method has been used to reconstruct the data set on which the artificial neural networks have been trained, by combining similar traffic flow patterns. By applying these ANNs to real traffic data, two important conclusions have been reached: BP and GR neural networks show better prediction performance than the other two types of networks, but are sensitive to changes in the scope of the training data set; on the other hand, the RBF and Elman neural networks show prediction results that remain fairly stable when the training data set grows. The study (Toan & Truong, 2020) shows that applying the K-NN method to a training data set can significantly reduce the size of this data set, thus achieving faster model training using the SVM method, without affecting prediction performance.
In the research (Filipovska & Mahmassani, 2020), different machine learning models for predicting traffic flow breakdown have been developed and tested, and their results have been compared to the results of a traditional probabilistic approach. Stojčić (2018) has given an overview of research in which the ANFIS (Adaptive Neuro-Fuzzy Inference System) model has been used in the prediction of traffic congestion. Zaki et al. (2016), as well as Shankar et al. (2012), take velocity and density as independent attributes and congestion level as the dependent variable in the prediction of congestion using the ANFIS model. Kukadapwar & Parbat (2015), among others, use the traffic volume to roadway capacity ratio as an independent variable, while the target variable in their study is the congestion index. Recent research includes the application of deep learning methods in the prediction of traffic flow intensity (Wang et al., 2018). In the study (Lv et al., 2015), the application of a deep learning approach based on stacked autoencoders (SAEs) is demonstrated on traffic data sets that have Big Data features. Alshaykha & Shaban (2021) combine the K-NN method and the Broad Learning System (KNN-BLS). "The basic structure of BLS is built on the traditional RVFLNN (Random Vector Functional-Link Neural Network), but unlike RVFLNN that directly uses the original input data to build an enhanced node, BLS first maps the input into a series of mapping nodes, and then uses the mapping node to build an enhanced node, and the mapping node and the enhanced node form joint Nodes, and finally combine the nodes and the output layer to establish a linear connection." (Alshaykha & Shaban, 2021). Mohammed & Kianfar (2018) have investigated the application of four categories of predictive methods in traffic flow prediction. The results obtained using the distributed random forest method slightly exceed the results obtained using the other methods.

3. Methodology

The machine learning process takes place in the following stages: data preparation, model training, model validation, model testing and prediction. It is an iterative process in which all of the above-mentioned phases are repeated as many times as necessary. The repetition of these phases ends when all attribute combinations, all available algorithms and all algorithm parameter values are exhausted, or when a satisfactory model performance is reached. Once model testing shows that the model is successful, the use of the model in the prediction of the selected variable can begin. Data preparation consists of cleaning the raw data of incomplete records or records with incorrect values, converting the data into the appropriate format, etc. The construction of the prediction models consists of:
1. Selection of the target variable, i.e. the attribute whose value should be predicted using a machine learning model;
2. Selection of an algorithm, in accordance with the nature of the target variable and attributes;
3. Selection of relevant attributes of the data set;
4. Preparation of the data sets for model learning and testing, according to the requirements of the selected algorithm;
5. Model adjustment, i.e. setting the values of hyperparameters specific to each type of machine learning algorithm;
6. Model learning, i.e. obtaining the model's parameters by applying the selected algorithm to the training data set.
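Purely as an illustration of steps 1-6 (the study itself performs them in the Weka tool, not in code), a scikit-learn sketch with a toy data set might look as follows; the column names mirror the attributes described later in the case study, the data values are made up and the K value is only an example, not a tuned setting.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Toy prepared data set; in the study the attributes are counter, year, month and
# the night-traffic totals (see the case study).  Values below are illustrative only.
data = pd.DataFrame({
    "counter": [1025, 1025, 1026, 1026],
    "year":    [2016, 2017, 2016, 2017],
    "month":   [7, 7, 7, 7],
    "TMNT":    [61000, 63000, 24000, 25000],
})

target = "TMNT"                                  # step 1: target variable
model = KNeighborsRegressor(n_neighbors=1)       # step 2: algorithm (K-NN regression)
features = ["counter", "month"]                  # step 3: relevant attributes
train = data[data["year"] <= 2017]               # step 4: training data set
# step 5: hyperparameter adjustment corresponds to the n_neighbors value chosen above
model.fit(train[features], train[target])        # step 6: model learning
```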
Since the target variables of the data set used in this study are continuous, machine learning models based on the most popular regression algorithms have been built: Linear Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machines for Regression (SMOreg) and Neural Network. In addition to model training and testing, model validation has been performed in order to select the best type of model among multiple candidates, determine the optimal configuration of model parameters, and avoid the problems known as overfitting and underfitting. Overfitting refers to a situation in which the model has learned to predict the instances from the training set almost perfectly, but has a very weak ability to predict instances that differ even slightly from those it has learned. Underfitting refers to a case in which the model fails to approximate the training data, so it shows poor performance even on the training data set. An approach known as cross-validation has been used to validate a model. This approach to model performance evaluation uses only training data and consists of the following phases:
1. The available data set for model training is divided into K equal parts - folds. It is usually divided into 10 subsets (10-fold cross-validation).
2. The model is trained on K-1 subsets of data (e.g. on the first K-1 subsets).
3. The model is evaluated on the only remaining (K-th) subset of data.
4. Steps 2 and 3 are repeated K times. In each iteration, a different subset of the data is set aside for model validation, while the rest (K-1 parts) is used for learning.
5. Model performances are calculated as the arithmetic mean of the performances obtained in the K iterations.

The success of the numerical prediction can be evaluated using different metrics (Witten et al., 2017). Let the predicted values of the target variable, obtained for the set of instances used for model validation, be p_1, p_2, ..., p_n, and the actual values of the target variable a_1, a_2, ..., a_n. The mean-squared error, Eq. (1), is the mean of the squared differences between the predicted and actual values:

\text{Mean-squared error} = \frac{(p_1 - a_1)^2 + \dots + (p_n - a_n)^2}{n} \qquad (1)

The mean-absolute error, Eq. (2), is the mean of the absolute values of the errors:

\text{Mean-absolute error} = \frac{|p_1 - a_1| + \dots + |p_n - a_n|}{n} \qquad (2)

The root mean-squared error, Eq. (3), is the square root of the mean-squared error:

\text{Root mean-squared error} = \sqrt{\frac{(p_1 - a_1)^2 + \dots + (p_n - a_n)^2}{n}} \qquad (3)

The relative-squared error, Eq. (4), is the total squared error normalized by the total squared error of a simple predictor that always predicts the mean \bar{a} of the actual values:

\text{Relative-squared error} = \frac{(p_1 - a_1)^2 + \dots + (p_n - a_n)^2}{(a_1 - \bar{a})^2 + \dots + (a_n - \bar{a})^2} \qquad (4)

The root relative-squared error, Eq. (5), is the square root of the relative-squared error:

\text{Root relative-squared error} = \sqrt{\frac{(p_1 - a_1)^2 + \dots + (p_n - a_n)^2}{(a_1 - \bar{a})^2 + \dots + (a_n - \bar{a})^2}} \qquad (5)

The relative-absolute error, Eq. (6), is the total absolute error, with the same kind of normalization:

\text{Relative-absolute error} = \frac{|p_1 - a_1| + \dots + |p_n - a_n|}{|a_1 - \bar{a}| + \dots + |a_n - \bar{a}|} \qquad (6)
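As a minimal sketch, the measures in Eqs. (1)-(6) can be computed directly from the predicted values p and the actual values a; the NumPy function below merely restates the formulas above and is not part of the original study (note that Weka reports the relative measures as percentages, i.e. multiplied by 100).

```python
import numpy as np

def prediction_errors(p, a):
    """Error measures from Eqs. (1)-(6) for predicted values p and actual values a."""
    p, a = np.asarray(p, dtype=float), np.asarray(a, dtype=float)
    diff = p - a
    dev = a - a.mean()                                    # deviations from the mean of actual values
    mse = np.mean(diff ** 2)                              # Eq. (1)
    mae = np.mean(np.abs(diff))                           # Eq. (2)
    rmse = np.sqrt(mse)                                   # Eq. (3)
    rse = np.sum(diff ** 2) / np.sum(dev ** 2)            # Eq. (4)
    rrse = np.sqrt(rse)                                   # Eq. (5)
    rae = np.sum(np.abs(diff)) / np.sum(np.abs(dev))      # Eq. (6)
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "RSE": rse, "RRSE": rrse, "RAE": rae}
```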
The last measure of prediction accuracy is the correlation coefficient, Eq. (7), which measures the statistical correlation between the values of a and p. The correlation coefficient takes values from 1 for results that are completely correlated, through 0 when there is no correlation, to -1 when the results are in perfect negative correlation.

\text{Correlation coefficient} = \frac{S_{PA}}{\sqrt{S_P S_A}} \qquad (7)

where S_{PA}, S_P and S_A are calculated as shown in Eq. (8):

S_{PA} = \frac{\sum_{i=1}^{n}(p_i - \bar{p})(a_i - \bar{a})}{n-1}, \quad S_P = \frac{\sum_{i=1}^{n}(p_i - \bar{p})^2}{n-1}, \quad S_A = \frac{\sum_{i=1}^{n}(a_i - \bar{a})^2}{n-1} \qquad (8)

In a great number of empirical examples, the predictive model that is the best according to one error measure is also the best according to all the other measures. In order to estimate how models will perform on unseen data, it is necessary to measure their performance on a data set that played no role in model training. This previously unseen data set is called the test data set. The next phase is comparing the performances of the models obtained on the test data set with the performances obtained on the training data set. This comparison makes it possible to detect the problem known as overfitting: if the performance of a model is good on the training data but bad on the test data, the model is overfitted. In order to predict the values of the selected target variables in the future, it is necessary to prepare an appropriate set of data and apply to it the machine learning model chosen as the best.

In this research, the best results have been shown by machine learning models based on the K-NN algorithm. The K-NN algorithm belongs to the class of supervised machine learning algorithms that learn models based on instances (Instance-Based Learning). In this class of algorithms, the classification of a new instance is done by comparing it with the most similar (the closest) instances in the training set (Aha et al., 1991). K is a parameter that indicates the number of the most similar instances in the training set with which the new instance is compared. The K-NN algorithm belongs to the group of so-called lazy methods, because the decision on classification is postponed until the moment a new instance appears. The main advantage of lazy methods is that they construct a different approximation of the objective function for each new instance that needs to be classified. Such local assessment of the objective function is suitable for complex objective functions. Because lazy methods defer all the work to the moment of prediction, and are therefore slower to query than some other classes of algorithms, the K-NN algorithm is best suited to relatively "small" data sets. This feature of the K-NN algorithm has made it a good candidate for prediction in the case study conducted as part of this research. In the Weka (Waikato Environment for Knowledge Analysis) software tool used in this study, the K-Nearest Neighbors algorithm is implemented under the name IBk. With this algorithm, the target variable (class), as well as the attributes, can be nominal, numerical, date or binary, and missing class values, as well as missing attribute values, are allowed. Thus, the K-NN algorithm is applicable both in solving classification problems and regression prediction problems. In this research, it has been applied to regression predictive analysis.
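For readers working outside Weka, a rough analogue of an IBk-style regressor, validated with 10-fold cross-validation as described above, might look like the following scikit-learn sketch; the data here are synthetic placeholders, not the ATC measurements used in this study, and K = 1 is only the IBk default, not a tuned value.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic placeholder data: each row is (counter ID, month); the target imitates
# a monthly night-traffic volume.  Real values would come from the ATC data set.
rng = np.random.default_rng(42)
counters = rng.choice([1025, 1026, 1046, 1193], size=200)
months = rng.integers(1, 13, size=200)
X_demo = np.column_stack([counters, months])
y_demo = 20_000 + 3_000 * months + rng.normal(0, 2_000, size=200)

knn = KNeighborsRegressor(n_neighbors=1)        # analogous to Weka's IBk with K = 1

# 10-fold cross-validation on the training data, as in the validation procedure above.
cv = KFold(n_splits=10, shuffle=True, random_state=42)
mae_per_fold = -cross_val_score(knn, X_demo, y_demo, cv=cv,
                                scoring="neg_mean_absolute_error")
print("Mean absolute error per fold:", mae_per_fold.round(1))

knn.fit(X_demo, y_demo)
print(knn.predict([[1025, 7]]))                 # e.g. predict for counter 1025 in July
```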
4. Case study

A total of 391 automatic traffic counters have been installed on the network of state roads of the 1st category in the Republic of Serbia. The automatic traffic counters detect and classify vehicles in real time, using inductive loops placed in the asphalt layer of the road structure. One such traffic counter is shown in Figure 1.

Figure 1. Automatic traffic counter based on inductive loops

The QLTC-10C counters continuously count and classify vehicles into ten categories, while the QLTC-8C counters classify vehicles into eight categories. The QLTC-10C counters classify vehicles into the following categories: A0 - Motorcycles, A1 - Passenger cars and Passenger cars with trailer, A2 - Combined vehicles and Combined vehicles with trailer, B1 - Light trucks and Light trucks with trailer, B2 - Medium heavy trucks, B3 - Heavy goods vehicles, B4 - Heavy goods vehicles with trailer, B5 - Semi-trailer trucks, C1 - Buses, C2 - Articulated buses, X - Uncategorized (other) vehicles. For each vehicle it detects, the counter records: date, time, direction of vehicle movement, ordinal number of the vehicle on that day for the observed direction, traffic lane, vehicle category and vehicle speed. The obtained data are stored on SD (Secure Digital) memory cards.

In this case study, the data used have been obtained by automatic counting of traffic on state roads in Serbia at 21 counting points (Figure 2), in the period from 1 January 2011 to 31 December 2020. The research was done on 4 road sections (IA category (road 1) and IB category (roads 22, 23 and 46)). The selected counting places have the following marks, i.e. names: 1025 (Kraljevo 2), 1026 (Trstenik), 1027 (Pojate), 1046 (Vodice), 1050 (Prijanovci), 1052 (Pridvorica), 1057 (Prijepolje), 1156 (Mojsinje), 1157 (Mrčajevci), 1183 (Trupale Bg-Ni), 1191 (Ineks), 1193 (Kneževići), 1194 (Zlatibor), 1195 (Kokin Brod 2), 1196 (Nova Varoš), 1198 (Gorjani), 1202 (Međuvršje), 1207 (Prijepolje 2), 1208 (Velika Župa), 1225 (Lučina) and 1270 (Preljina).

Figure 2. Traffic counting locations

The purpose of the case study has been to predict two traffic intensity indicators: total monthly night traffic (TMNT) and average monthly night traffic (AMNT), at the selected counting locations, using the method of supervised machine learning. The instances of the available data set are described by the following attributes: counter, year, month, TMNT and AMNT. The TMNT attribute represents the total number of vehicles registered by the ATC at night (from 22.00 to 06.00 hours) during one month. The AMNT attribute represents the average daily number of vehicles registered by the ATC at night, on a monthly basis. In order to predict the total amount of night traffic per month, machine learning models whose target variable is the TMNT attribute have been created, while models whose target variable is the AMNT attribute have been created to predict the average night traffic per month. In both groups of machine learning models, the independent attributes are counter and month. The attribute year is used to divide the instances of the existing data set into two parts: one for model training and one for model testing. Instances relating to the period from 2011 to 2017 have been selected as the data set for model training, while instances relating to the period from 2018 to 2020 have been used for model testing.
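A minimal sketch of this year-based split, assuming a tabular export of the ATC data set with the attributes listed above (the file name and exact column labels are assumptions, not the original ATC format):

```python
import pandas as pd

# Hypothetical flat export of the ATC data set; columns: counter, year, month, TMNT, AMNT.
df = pd.read_csv("atc_night_counts.csv")

train = df[df["year"].between(2011, 2017)]     # instances for model training
test = df[df["year"].between(2018, 2020)]      # instances for model testing

features = ["counter", "month"]                # independent attributes in both model groups
X_train, y_train = train[features], train["TMNT"]   # use train["AMNT"] for the AMNT models
X_test, y_test = test[features], test["TMNT"]
```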
Training, validation and testing of the machine learning models have been performed in the data mining software Weka 3.9.5, which represents a collection of machine learning algorithms for data mining tasks (Witten et al., 2017). It enables the performance of various data mining tasks, such as data preparation for analysis, classification, regression analysis, clustering, learning through association rules, selection of relevant attributes and data visualization. Each of these tasks is performed in a separate graphical user interface window of the Weka software (Weka Explorer) and is opened by selecting the appropriate tab of Weka Explorer (Figure 3). The Preprocess window, shown in Figure 3, allows the user to load and prepare the available data set for later analysis.

Figure 3. Weka 3.9.5 software tool graphical user interface - data preparation window

5. Results and Discussion

The following eight machine learning algorithms were used to predict TMNT on the training data set in the Weka software tool: Linear Regression, Multilayer Perceptron, SMOreg, IBk (K-NN), M5P, Random Forest, Random Tree and REPTree. A 10-fold cross-validation, implemented in the Weka software, has been applied to validate the models. The performance of the prediction models, measured on the training data set, is shown in Table 1.

Table 1. The performance of eight TMNT prediction models measured on the training data set

Algorithm            | Correlation coefficient | Mean absolute error | Root mean squared error | Relative absolute error (%) | Root relative squared error (%)
LinearRegression     | 0.6417 | 9718.52 | 14687.1 | 73.729 | 76.637
MultilayerPerceptron | 0.6168 | 10197.2 | 15161.7 | 77.360 | 79.114
SMOreg               | 0.6373 | 9430.72 | 14931.6 | 71.546 | 77.914
IBk                  | 0.9803 | 1985.10 | 3784.06 | 15.06  | 19.745
M5P                  | 0.9434 | 4124.44 | 6840.50 | 31.29  | 35.694
Random Forest        | 0.9799 | 2004.84 | 3818.77 | 15.209 | 19.926
Random Tree          | 0.9803 | 1990.11 | 3784.91 | 15.098 | 19.749
REPTree              | 0.9701 | 2456.30 | 4650.40 | 18.634 | 24.266

Models based on the Linear Regression, Multilayer Perceptron and SMOreg algorithms have been rejected due to clearly unsatisfactory performance (correlation coefficients of 0.6417, 0.6168 and 0.6373, respectively). Therefore, the remaining five algorithms have been applied in the next phase, the testing of the machine learning models. The performance of these five prediction models, measured on the test data set, is shown in Table 2. Comparing the metrics of the selected models, shown in Table 1 and Table 2, it is concluded that none of these models has a problem of overfitting. In addition, for all five models the correlation coefficient has a high value on the test data set.

Table 2. The performances of the top five TMNT prediction models measured on the test data set

Algorithm     | Correlation coefficient | Mean absolute error | Root mean squared error | Relative absolute error (%) | Root relative squared error (%)
IBk           | 0.9391 | 4473.81 | 7373.93 | 32.8912 | 35.4526
M5P           | 0.8854 | 6238.81 | 10205.8 | 45.8673 | 49.0681
Random Forest | 0.9382 | 4495.17 | 7438.49 | 33.0482 | 35.763
Random Tree   | 0.9391 | 4473.58 | 7374.1  | 32.8895 | 35.4534
REPTree       | 0.9303 | 4893.33 | 7833.42 | 35.9755 | 37.6618

Li & Xu (2021) propose a model for short-term traffic prediction based on the Support Vector Regression (SVR) method. The SVR method is based on the basic principles of the SVM method and is generalized for regression problems. The SVM method is implemented in the Weka software under the name LibSVM. The SVR method in the Weka software tool is obtained by selecting the LibSVM classifier and one of its types: epsilon-SVR or nu-SVR.
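Outside Weka, an epsilon-SVR of the kind referred to here could be approximated, purely for illustration, with scikit-learn's SVR; the hyperparameters shown are library defaults, not values used in this study.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Feature scaling matters for SVR; kernel, C and epsilon below are scikit-learn defaults.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
# svr.fit(X_train, y_train)            # X_train, y_train as prepared in the earlier sketch
# predictions = svr.predict(X_test)
```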
However, the LibSVM classifier applied to the training data set in this case study gave poor results (correlation coefficients of 0.0644 for epsilon-SVR and 0.0281 for nu-SVR). Therefore, the SVR algorithm was rejected in the first phase of this research.

In the research (Filipovska & Mahmassani, 2020), the best performance has been shown by models based on neural networks and the SVM when class balancing is applied; without class balancing, the model based on the Random Forest algorithm has shown the best results. In this case study, the neural network model (MultilayerPerceptron) was rejected in the first phase because it showed worse results than all the other models (Table 1). In contrast, the Random Forest algorithm showed excellent results in this case study, along with the IBk, Random Tree and REPTree algorithms (Table 1 and Table 2). The visualization of the prediction results obtained on the test data set has revealed that the model based on the IBk algorithm (K-NN) gives the results closest to the actual values. Therefore, the model based on the IBk algorithm has been selected as the best prediction model for TMNT. This case study confirmed the results of numerous studies, such as Zhang et al. (2013), Zou et al. (2015) and Zheng & Su (2014), which agree that the K-NN algorithm (IBk in Weka) gives excellent results in the short-term prediction of traffic flows.

Figure 4. Actual and projected total monthly night traffic (TMNT), at selected counters (ID: 1193 and ID: 1208), for the three selected years (2018, 2019 and 2020)

Figure 5. Projected total monthly night traffic (TMNT) at selected counters for 2021

The graph shown in Figure 4 compares the actual and projected TMNT for two selected traffic counting locations (1193 - Kneževići and 1208 - Velika Župa) and the period from 2018 to 2020. The TMNT projection has been performed using the model based on the IBk algorithm. The graph clearly shows that the TMNT prediction performed on the test data set closely follows the actual TMNT values in the observed period (Figure 4). The results of the TMNT prediction at eight selected traffic counting locations for 2021 are shown in Figure 5.

For the AMNT prediction, the same eight machine learning algorithms have been applied to the training data set. The performance of the prediction models, measured on the training data set, is shown in Table 3.
Table 3. The performances of eight AMNT prediction models measured on the training data set

Algorithm            | Correlation coefficient | Mean absolute error | Root mean squared error | Relative absolute error (%) | Root relative squared error (%)
LinearRegression     | 0.6346 | 317.812 | 478.415 | 74.3936 | 77.2268
MultilayerPerceptron | 0.608  | 334.371 | 494.495 | 78.2698 | 79.8224
SMOreg               | 0.6303 | 308.949 | 486.324 | 72.3191 | 78.5034
IBk                  | 0.9801 | 64.9018 | 122.953 | 15.1922 | 19.8474
M5P                  | 0.9445 | 133.975 | 220.096 | 31.3612 | 35.5284
Random Forest        | 0.9797 | 65.5395 | 124.075 | 15.3415 | 20.0285
Random Tree          | 0.9801 | 65.069  | 122.985 | 15.2314 | 19.8525
REPTree              | 0.9694 | 80.585  | 152.082 | 18.8634 | 24.5494

Models based on the Linear Regression, Multilayer Perceptron and SMOreg algorithms have been rejected due to unsatisfactory performance (correlation coefficients of 0.6346, 0.608 and 0.6303, respectively, have been recorded). Therefore, the remaining five algorithms have been applied in testing the machine learning models. The performance of these five prediction models, measured on the test data set, is shown in Table 4. The best AMNT prediction model has been chosen in an identical manner as the best type of TMNT prediction model. The model based on the IBk algorithm has shown the best results this time as well.

Table 4. The performances of the top five AMNT prediction models measured on the test data set

Algorithm     | Correlation coefficient | Mean absolute error | Root mean squared error | Relative absolute error (%) | Root relative squared error (%)
IBk           | 0.939  | 146.428 | 239.814 | 33.1472 | 35.5591
M5P           | 0.8851 | 204.294 | 332.391 | 46.2465 | 49.2862
Random Forest | 0.9381 | 147.128 | 241.896 | 33.3058 | 35.8678
Random Tree   | 0.939  | 146.420 | 239.819 | 33.1455 | 35.5599
REPTree       | 0.9286 | 161.299 | 257.180 | 36.5137 | 38.1342

The graph shown in Figure 6 shows the ratio of actual and projected AMNT for two selected traffic counting locations (1026 - Trstenik and 1046 - Vodice) and the period from 2018 to 2020. The AMNT projection has been performed using a model based on the IBk algorithm. The results of the AMNT prediction at eight selected traffic counting locations for 2021 are shown in Figure 7.

Figure 6. Actual and projected average monthly night traffic (AMNT), at selected counters (ID: 1026 and ID: 1046), for the three selected years (2018, 2019 and 2020)

Figure 7. Projected average monthly night traffic (AMNT) at selected counters for 2021

In all the diagrams shown from Figure 4 to Figure 7, it is easy to see that the extreme values of TMNT, as well as of AMNT, occur for the months of July and August. This is because almost all counting places are located on the roads leading to popular tourist destinations, and July and August are the months when most people are on vacation and traveling.
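In principle, projections such as those shown in Figures 5 and 7 can be produced by building a (counter, month) grid for the following year and applying the chosen model to it. The sketch below continues the earlier illustrative scikit-learn code (it is not the authors' Weka workflow); the counter IDs are those shown in Figure 5, and the AMNT projection would be obtained analogously with the AMNT-target model.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Continues the earlier sketch: X_train, y_train are the 2011-2017 instances.
knn = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)

# Counters shown in Figure 5; months 1-12 of the following year.
counters_2021 = [1025, 1026, 1046, 1183, 1191, 1193, 1194, 1208]
grid = pd.DataFrame([(c, m) for c in counters_2021 for m in range(1, 13)],
                    columns=["counter", "month"])
grid["projected_TMNT_2021"] = knn.predict(grid[["counter", "month"]])
```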
6. Conclusion

The aim of this research has been to train and test predictive models on the existing data set on the volume of night traffic on state roads in Serbia and to predict the total and average amounts of night traffic per month for the following year. In the conducted case study, using the Weka software tool, machine learning models for the prediction of total monthly night traffic (TMNT) and average monthly night traffic (AMNT) have been trained, based on the following algorithms: Linear Regression, Multilayer Perceptron, SMOreg, IBk, M5P, Random Forest, Random Tree and REPTree. On the training data set, the IBk (K-NN) algorithm-based model and the models based on regression trees have shown considerably better performance than the models from the functions category (Linear Regression, Multilayer Perceptron and SMOreg). Therefore, only these models have been tested on the test data set. The best performances have been shown by the models based on the K-NN algorithm, so the prediction of TMNT and AMNT has been performed using these models. The case study has shown that the K-NN algorithm can be effectively applied in solving the problem of regression analysis of traffic data, even on relatively small data sets.

Future research will include the cluster analysis of traffic flows, especially the analysis of clusters in total and average monthly night traffic. This analysis is expected to reveal different patterns in the volume of night traffic on different road sections and in different periods of the year.

Acknowledgement: This paper has been partially supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, within the project number 036012. The data used in the research have been provided by the Public Enterprise "Roads of Serbia".

References

Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6(1), 37-66. https://doi.org/10.1007/BF00153759

Alshaykha, A. M., & Shaban, A. I. (2021). Short-Term Traffic Flow Prediction Model Based On K-Nearest Neighbors and Deep Learning Method. Journal of Mechanical Engineering Research and Developments, 44(6), 113-122.

Boukerche, A., & Wang, J. (2020). Machine Learning-based traffic prediction models for Intelligent Transportation Systems. Computer Networks, 181, 107530. https://doi.org/10.1016/j.comnet.2020.107530

Cai, P., Wang, Y., Lu, G., Chen, P., Ding, C., & Sun, J. (2016). A spatiotemporal correlative k-nearest neighbor model for short-term traffic multistep forecasting. Transportation Research Part C: Emerging Technologies, 62, 21-34. https://doi.org/10.1016/j.trc.2015.11.002

Filipovska, M., & Mahmassani, H. S. (2020). Traffic flow breakdown prediction using machine learning approaches. Transportation Research Record, 2674(10), 560-570. https://doi.org/10.1177%2F0361198120934480

Guo, J., & Williams, B. M. (2010). Real-time short-term traffic speed level forecasting and uncertainty quantification using layered Kalman filters. Transportation Research Record, 2175(1), 28-37. https://doi.org/10.3141%2F2175-04
Janković, S., Zdravković, S., Mladenović, D., Mladenović, S., & Uzelac, A. (2020). Traffic Volume Prediction Using Regression Decision Trees. Proceedings of the XLVII International Symposium on Operational Research - SYM-OP-IS ’20, Belgrade, Serbia, 287-292.

Karlaftis, M. G., & Vlahogianni, E. I. (2011). Statistical methods versus neural networks in transportation research: Differences, similarities and some insights. Transportation Research Part C: Emerging Technologies, 19(3), 387-399. https://doi.org/10.1016/j.trc.2010.10.004

Kukadapwar, S. R., & Parbat, D. K. (2015). Modeling of traffic congestion on urban road network using fuzzy inference system. American Journal of Engineering Research, 4(12), 143-148.

Li, C., & Xu, P. (2021). Application on traffic flow prediction of machine learning in intelligent transportation. Neural Computing and Applications, 33(2), 613-624. https://doi.org/10.1007/s00521-020-05002-6

Liu, Z., Guo, J., Cao, J., Wei, Y., & Huang, W. (2018). A Hybrid Short-term Traffic Flow Forecasting Method Based on Neural Networks Combined with K-Nearest Neighbor. PROMET – Traffic & Transportation, 30(4), 445–456. https://doi.org/10.7307/ptt.v30i4.2651

Lv, Y., Duan, Y., Kang, W., Li, Z., & Wang, F. Y. (2015). Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems, 16(2), 865-873. https://doi.org/10.1109/TITS.2014.2345663

Ma, S. F., He, G. G., & Wang, S. T. (2001). A traffic flow forecast supported system based multi-agent. IEEE Intelligent Transportation Systems Conference Proceedings, 25-29 August 2001, Oakland, CA, USA, 620-624.

Magalhaes, R. P., Lettich, F., Macedo, J. A., Nardini, F. M., Perego, R., Renso, C., & Trani, R. (2021). Speed prediction in large and dynamic traffic sensor networks. Information Systems, 98, 101444. https://doi.org/10.1016/j.is.2019.101444

Marković, H., Dalbelo Bašić, B., Gold, H., Dong, F., & Hirota, K. (2010). GPS Data-based Non-parametric Regression for Predicting Travel Times in Urban Traffic Networks. Promet – Traffic & Transportation, 22(1), 1-13. https://doi.org/10.7307/ptt.v22i1.159

Mohammed, O., & Kianfar, J. (2018). A Machine Learning Approach to Short-Term Traffic Flow Prediction: A Case Study of Interstate 64 in Missouri. 2018 IEEE International Smart Cities Conference (ISC2).

Pang, X., Wang, C., & Huang, G. (2016). A short-term traffic flow forecasting method based on a three-layer k-nearest neighbor non-parametric regression algorithm. Journal of Transportation Technologies, 6(4), 200-206. http://dx.doi.org/10.4236/jtts.2016.64020

Park, H., Haghani, A., Samuel, S., & Knodler, M. A. (2018). Real-time prediction and avoidance of secondary crashes under unexpected traffic congestion. Accident Analysis & Prevention, 112, 39–49. https://doi.org/10.1016/j.aap.2017.11.025

Peng, T., & Tang, Z. (2015). A small scale forecasting algorithm for network traffic based on relevant local least squares support vector machine regression model. Applied Mathematics & Information Sciences, 9(2L), 653-659. http://dx.doi.org/10.12785/amis/092L41
Public Enterprise “Roads of Serbia”. (2012). Manual for road design in the Republic of Serbia. Belgrade: Public Enterprise “Roads of Serbia”.

Shamshad, N., & Sarwr, D. (2020). A review of Traffic Flow Prediction Based on Machine Learning approaches. International Journal of Scientific & Engineering Research, 11(3), 126-130.

Sénquiz-Díaz, C. (2021). Transport infrastructure quality and logistics performance in exports. Economics-Innovative and Economic Research, 9(1), 107-124. https://doi.org/10.2478/eoik-2021-0008

Shankar, H., Raju, P. L. N., & Rao, K. R. M. (2012). Multi model criteria for the estimation of road traffic congestion from traffic flow information based on fuzzy logic. Journal of Transportation Technologies, 2(01), 50.

Stojčić, M. (2018). Application of ANFIS model in road traffic and transportation: a literature review from 1993 to 2018. Operational Research in Engineering Sciences: Theory and Applications, 1(1), 40-61. https://doi.org/10.31181/oresta19012010140s

Toan, T. D., & Truong, V.-H. (2020). Support Vector Machine for Short-Term Traffic Flow Prediction and Improvement of Its Model Training using Nearest Neighbor Approach. Transportation Research Record: Journal of the Transportation Research Board, 2675(4), 362–373. https://doi.org/10.1177%2F0361198120980432

Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2005). Optimized and meta-optimized neural networks for short-term traffic flow prediction: a genetic approach. Transportation Research Part C: Emerging Technologies, 13(3), 211-234. https://doi.org/10.1016/j.trc.2005.04.007

Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2007). Spatio-temporal short-term urban traffic volume forecasting using genetically optimized modular networks. Computer-Aided Civil and Infrastructure Engineering, 22(5), 317-325. https://doi.org/10.1111/j.1467-8667.2007.00488.x

Wang, J., & Shi, Q. (2013). Short-term traffic speed forecasting hybrid model based on chaos-wavelet analysis-support vector machine theory. Transportation Research Part C: Emerging Technologies, 27, 219-232. https://doi.org/10.1016/j.trc.2012.08.004

Wang, Y., Zhang, D., Liu, Y., Dai, B., & Lee, L. H. (2019). Enhancing transportation systems via deep learning: A survey. Transportation Research Part C: Emerging Technologies, 99, 144-163. https://doi.org/10.1016/j.trc.2018.12.004

Williams, B. M., Durvasula, P. K., & Brown, D. E. (1998). Urban freeway traffic flow prediction: application of seasonal autoregressive integrated moving average and exponential smoothing models. Transportation Research Record, 1644(1), 132-141. https://doi.org/10.3141%2F1644-14

Williams, B. M., & Hoel, L. A. (2003). Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of Transportation Engineering, 129(6), 664-672. https://doi.org/10.1061/(ASCE)0733-947X(2003)129:6(664)

Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2017). Data Mining: Practical Machine Learning Tools and Techniques (4th ed.). Burlington, USA: Morgan Kaufmann.

Xu, Y., Kong, Q., & Liu, Y. (2013). Short-term traffic volume prediction using classification and regression trees. Proceedings of the 2013 IEEE Intelligent Vehicles Symposium (IV), Gold Coast, Australia, 493-498.
Yasin Çodur, M., & Tortum, A. (2015). An artificial neural network model for highway accident prediction: A case study of Erzurum, Turkey. Promet – Traffic & Transportation, 27(3), 217-225. https://doi.org/10.7307/ptt.v27i3.1551

Zaki, J. F., Ali-Eldin, A. M. T., Hussein, S. E., Saraya, S. F., & Areed, F. F. (2016). Framework for Traffic Congestion Prediction. International Journal of Scientific & Engineering Research, 7(5), 1205-1210. https://hdl.handle.net/1887/46907

Zhang, L., Liu, Q., Yang, W., Wei, N., & Dong, D. (2013). An Improved K-nearest Neighbor Model for Short-term Traffic Flow Prediction. Procedia - Social and Behavioral Sciences, 96, 653–662. https://doi.org/10.1016/j.sbspro.2013.08.076

Zhang, Y., & Xie, Y. (2007). Forecasting of short-term freeway volume with v-support vector machines. Transportation Research Record, 2024(1), 92-99. https://doi.org/10.3141%2F2024-11

Zhang, Y., Zhou, Y., Lu, H., & Fujita, H. (2020). Traffic Network Flow Prediction Using Parallel Training for Deep Convolutional Neural Networks on Spark Cloud. IEEE Transactions on Industrial Informatics, 16(12), 7369-7380. https://doi.org/10.1109/TII.2020.2976053

Zheng, Z., & Su, D. (2014). Short-term traffic volume forecasting: a k-nearest neighbor approach enhanced by constrained linearly sewing principle component algorithm. Transportation Research Part C: Emerging Technologies, 43, 143-157. https://doi.org/10.1016/j.trc.2014.02.009

Zou, T., He, Y., Zhang, N., Du, R., & Gao, X. (2015). Short-Time Traffic Flow Forecasting Based on the K-Nearest Neighbor Model. Fifth International Conference on Transportation Engineering - ICTE 2015, September 26-27, 2015, Dalian, China.

© 2022 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).