Knowledge Engineering and Data Science (KEDS), pISSN 2597-4602, eISSN 2597-4637
Vol 6, No 1, April 2023, pp. 92–102
https://doi.org/10.17977/um018v6i12023p92-102
©2023 Knowledge Engineering and Data Science | W : http://journal2.um.ac.id/index.php/keds | E : keds.journal@um.ac.id
This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)

Long-Term Traffic Prediction Based on Stacked GCN Model

Atkia Akila Karim 1,*, Naushin Nower 2
Institute of Information and Technology, University of Dhaka, Suhrawardi Udyan Rd, Dhaka 1200, Bangladesh
1 msse1760@iit.du.ac.bd*; 2 naushin@iit.du.ac.bd
* corresponding author

ARTICLE INFO
Article history: Received 20 August 2023; Revised 10 September 2023; Accepted 18 September 2023; Published online 24 September 2023
Keywords: Traffic flow prediction; Long-term prediction; Graph Convolutional Network; Segment

ABSTRACT
With the recent surge in road traffic within major cities, the need for both short and long-term traffic flow forecasting has become paramount for city authorities. Previous research efforts have predominantly focused on short-term traffic flow estimations for specific road segments and paths. However, applications of paramount importance, such as traffic management and schedule routing planning, demand a deep understanding of long-term traffic flow predictions. Yet, due to the intricate interplay of underlying factors, studies dedicated to long-term traffic prediction are scarce. Previous research has also highlighted the challenge of lower accuracy in long-term predictions owing to error propagation within the model. To address this, we propose a stacked GCN model that combines the capacity of the Graph Convolutional Network (GCN) to extract spatial characteristics from the road network with the ability of the stacked, segment-wise architecture to capture temporal context. Our developed model is subsequently employed for traffic flow forecasting within urban road networks. We rigorously compare our method against baseline techniques using two real-world datasets. Our approach significantly reduces prediction errors by 40% to 60% compared to other methods. The experimental results underscore our model's ability to uncover spatiotemporal dependencies within traffic data and its superior predictive performance over baseline models using real-world traffic datasets.

I. Introduction
Traffic flow prediction is a crucial research domain focused on anticipating forthcoming traffic patterns within a road network [1]. Recently, this field has garnered increasing interest due to the rapid advancement and adoption of Intelligent Transportation Systems (ITS). Traffic flow prediction is a pivotal component of ITS: it plays a critical role in traffic management and planning and aims to provide better transport management by avoiding congestion. In most megacities, traffic congestion is a significant issue that hinders residents' daily lives and the nation's economic progress [2]. The significant causes of traffic congestion include rising population, urbanization, poor traffic management, and inadequate transportation infrastructure [3]. The economic burden of traffic congestion in urban centers is steadily increasing globally, affecting nearly every major city. For instance, in Dhaka, traffic congestion results in the loss of five million working hours daily, translating to an annual economic toll ranging from 200 to 550 billion takas [4]. Such severe traffic congestion can harm a nation's economy, hinder foreign investments, disrupt the supply and demand dynamics, and contribute to heightened emotional stress among the population [5].

Consequently, timely and precise traffic flow forecasting is immensely valuable to urban residents. Travelers can make better trip arrangements with accurate traffic flow forecasting, reducing traffic congestion, fuel consumption, and carbon emissions [6]. However, because of its intricate spatial and temporal connections and abrupt accidents, traffic flow prediction has always been a complex problem. Numerous specialists and academics have dedicated their research to studying traffic flow prediction and have developed numerous prediction techniques that can be categorized depending on the model: parametric or nonparametric. Parametric models derive their parameters by analyzing the original data, and traffic forecasts are subsequently executed based on predefined regression functions. Various traditional parametric models like the ARIMA model [7], the KF model [8][9], and different variations of ARIMA have been utilized for traffic flow prediction. Nevertheless, due to traffic flow's nonlinear and stochastic nature, these models often struggle to provide accurate predictions. Consequently, nonparametric models, including random forest [10], support vector machine [11][12], fuzzy logic models [13], Bayesian networks [14], K-Nearest Neighbors methods [15][16], neural network models [17][18], and hybrid combinations of these algorithms, have been introduced. These models can handle spatiotemporal data, although their effectiveness may vary depending on the application and dataset size. Despite their superior performance, these models encounter challenges when dealing with extensive traffic datasets.
To address these challenges, recent advancements in deep learning networks have become increasingly prevalent, as they can handle large datasets and improve prediction accuracy by utilizing multiple layers to extract intricate traffic characteristics. For instance, Wu and Tan [19] introduced a model featuring a one-dimensional Convolutional Neural Network (CNN) for capturing spatial features and incorporated two Long Short-Term Memory (LSTM) layers to capture temporal patterns. Duan et al. [20] adopted CNN for spatial features and combined it with LSTM for temporal feature extraction. Additionally, they employed a greedy training policy to reduce training time and enhance accuracy, especially in deeper networks. However, CNN has inherent limitations when dealing with complex topological structures, as it was initially designed for Euclidean spaces like images and regular grids, making it less suitable for adequately characterizing the spatial intricacies and dependencies within road networks. The Graph Convolutional Network (GCN) [21] was introduced to address this limitation. GCN represents the traffic network as a graph and effectively captures spatial attributes from neighboring nodes. In another study [22], GCN was combined with LSTM and multitask learning for traffic flow prediction in order to capture global and local traffic flow correlations along road segments. This model leveraged GCN within an undirected graph framework to depict the spatial distribution patterns of taxi trips and used LSTMs to capture temporal features. Additionally, the implementation of multitask learning enhanced the model's generalizability. In [23], an approach called Hierarchical Graph Convolution Networks (HGCN) was proposed, operating on both micro and macro traffic graphs. This study recognized the hierarchical structure of traffic systems, comprising micro layers (road networks) and macro layers (region networks).
In [24], the authors emphasized the importance of learning node-specific patterns without relying on predefined graphs. To achieve this, they introduced two adaptive modules: the Node Adaptive Parameter Learning (NAPL) module, capturing node-specific patterns, and the Data Adaptive Graph Generation (DAGG) module, inferring interdependencies among traffic series automatically. These modules were integrated with recurrent networks to create the Adaptive Graph Convolutional Recurrent Network (AGCRN), effectively capturing fine-grained spatial and temporal correlations in traffic data. However, it is worth noting that these innovative methods predominantly focused on short-term traffic prediction. Long-term traffic prediction is particularly important because of its essential applications in traffic management and schedule routing planning. Yet research is scarce in this domain, primarily because predicting the distant future presents more considerable difficulties than short-term forecasting, and prediction performance degrades over extended timeframes.

A previous study [25] employed a Recurrent Neural Network (RNN) with GPU acceleration to forecast long-term traffic flow in Odense and Beijing. However, RNNs are susceptible to the vanishing gradient problem, which can impact their performance. In another study [26], a spatial-temporal graph attention network was introduced, designed to capture the data's dynamic graph structure and spatial-temporal dependencies. Their model was tested using two public datasets gathered in California. In their study, Wang et al. [27] introduced a deep learning architecture comprising two main components: a bottom-up LSTM encoder-decoder structure and a top-down calibration layer. On the other hand, Li et al. [28] proposed a hybrid model for forecasting next-day traffic flow. This model incorporates wavelet decomposition, CNN, and LSTM techniques. In [29], CNN and BiLSTM are incorporated to predict long-term traffic flow. However, CNN is unsuitable for capturing the complex structure of a road network since it is designed for Euclidean data. Moreover, as those prediction techniques do not use a separate model for each time segment, errors can propagate quickly, and those models have difficulty handling sudden incidents.

Accurately predicting traffic patterns beyond short time frames remains challenging because errors accumulate in existing models, which undermines long-term forecasting precision. To solve those problems, we propose a stacked GCN that can handle sudden incidents, and as there is a GCN for every segment, the error does not propagate. Most models use an RNN or one of its variants to capture the temporal feature and a CNN or GCN to capture the spatial feature. However, using separate models for the temporal and spatial features has a drawback: it cannot capture the inherent interrelationship between the two. To overcome this, we use a stacked GCN, where the segmented module inherits the temporal feature, which helps the GCN capture both the spatial and temporal features simultaneously. In the proposed architecture, we design a segmented module that segments the input data to extract the temporal features and then incorporates a GCN for every segment to give day-long predictions. Thus, we use the stacked GCN to obtain the final prediction outcome segment by segment, and because of the stacked architecture, the error from the previous outcome is not propagated into the next prediction.
GCN is utilized in the proposed method since it improves on CNN: it can directly handle graphs and non-Euclidean structure and thus works better on road networks. Our contributions in this paper are briefly summarized below:
● We propose a stacked GCN predictive model for traffic flow over extended periods and apply segments to improve the prediction performance without accumulating errors.
● We use two publicly available datasets to evaluate our model and perform a whole-day prediction. We conduct a comparative analysis of our model against the baseline methods, and our model shows superiority in traffic forecasting.

II. Method
This section introduces the proposed Stacked GCN model designed for long-term traffic flow prediction. Our architecture leverages GCN to extract intricate spatial relationships within the road network. The road network, represented as the graph $G = (V, E)$, serves as the input to GCN, encapsulating the topological structure of the road network. Each road is treated as a node, as illustrated in Figure 1, and the edges denote connections between the roads.

Fig. 1. Real road structure transformation into graph road network: (a) road map; (b) graph structure of the road map

Within the graph, individual roads are represented as nodes, with $V = \{v_1, v_2, \cdots, v_N\}$ being the set of road nodes, $N$ signifying the total number of nodes, and $E$ representing the set of edges. The adjacency matrix $A \in \mathbb{R}^{N \times N}$ characterizes road linkages, with entries in the matrix being 0 for unrelated roads and 1 for connected ones. The feature matrix is $X \in \mathbb{R}^{N \times F}$, with $F$ corresponding to the length of the historical traffic flow data. Our primary objective is to predict traffic flow for the next $T$ time steps, relying on historical data. The proposed Stacked GCN model comprises two essential modules: i) a segmented module and ii) a graph convolutional network module.
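The graph inputs described in this section, the adjacency matrix A and the feature matrix X, can be illustrated with a small sketch. The toy road network, the helper name `build_adjacency`, the symmetric (undirected) treatment of connections, and the random speed values are all assumptions made for illustration; they are not taken from the paper's datasets or code.

```python
import numpy as np


def build_adjacency(num_nodes, edges):
    """Return a 0/1 adjacency matrix A of shape (N, N) from an edge list."""
    adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0  # assumption: road connections are undirected
    return adj


# Hypothetical toy network: 4 roads; road 0 meets roads 1 and 2, road 2 meets road 3.
edges = [(0, 1), (0, 2), (2, 3)]
A = build_adjacency(4, edges)

# Feature matrix X: one row per road, F historical speed readings per row.
# Random values stand in for real measurements.
rng = np.random.default_rng(0)
F = 16
X = rng.uniform(20.0, 60.0, size=(4, F)).astype(np.float32)

print(A.shape, X.shape)
```

Unrelated roads keep a 0 entry and connected roads get a 1, matching the definition of A given above.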
GCN effectively captures spatial traffic data characteristics, while the segmented module divides historical data into S segments, enabling the model to learn temporal patterns. The primary goal of our suggested model is to produce a more accurate forecast, that is, to minimize the divergence from the actual value. Our goal of reducing the prediction error can be expressed as in (1).

$$\min \left( \lVert Y_i - \hat{Y}_i \rVert \right) \tag{1}$$

Here $Y_i$ represents the actual observed value of traffic flow, while $\hat{Y}_i$ signifies the predicted output. The modules are described in the following subsections.

A. Segmented Module
To capture the periodic information embedded within the historical data, we employ the segmented module, which transforms the full-length historical traffic data (X) into a collection of periodic segments denoted as $S = \{S_1, S_2, \ldots, S_d\}$, where $d$ represents the number of segments. Each of these segments encapsulates historical data from a distinct period, with $S_i$ representing a sub-time series conveying information about a specific period. Here, $l$ signifies the length of each segment, and $S_i$ is composed of the temporal features of the corresponding time interval. Figure 2 shows an example of this data segmentation process, where the previous four days' twenty-four-hour data is segmented into six segments. Each segment consists of four hours, so the value of $d$ is six and the value of $l$ is four hours. The fifth day's data is predicted using the previous four days' data segments.

Fig. 2. Segmentation mechanism of input data

In the proposed method, to predict a time stamp, we consider the same time segment from the historical data rather than the whole historical data. Typically, traffic behavior within a region exhibits a consistent pattern during the same periods across different days. As a result, historical daily patterns can be characterized as recurring weekly patterns within specific time windows.
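The segmentation described above can be sketched as follows for the Figure 2 example (four previous days, six segments of four hours each). The array layout `(N, days, steps_per_day)`, the function name `segment_history`, and the toy values are assumptions for illustration; the paper does not specify its data layout.

```python
import numpy as np


def segment_history(history, num_segments):
    """Split day-aligned history into per-segment feature matrices.

    history: array of shape (N, days, steps_per_day).
    Returns a list of num_segments arrays, each of shape (N, days * l),
    where l = steps_per_day // num_segments is the segment length.
    """
    n_nodes, n_days, steps_per_day = history.shape
    if steps_per_day % num_segments != 0:
        raise ValueError("steps_per_day must be divisible by num_segments")
    seg_len = steps_per_day // num_segments
    segments = []
    for s in range(num_segments):
        # Same time-of-day window taken from every historical day.
        window = history[:, :, s * seg_len:(s + 1) * seg_len]
        segments.append(window.reshape(n_nodes, n_days * seg_len))
    return segments


# Toy data: 3 roads, 4 previous days, hourly readings (24 per day).
history = np.arange(3 * 4 * 24, dtype=np.float32).reshape(3, 4, 24)
segments = segment_history(history, num_segments=6)
print(len(segments), segments[0].shape)
```

Each returned matrix holds only the readings from one time-of-day window across the historical days, so the model that predicts a given window never sees data from other windows.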
For instance, the traffic speed observed on a Wednesday at 8:00 AM and 9:00 AM will resemble the corresponding time slots on previous days. Consequently, the repetitive patterns in traffic data from preceding days within a specific time window can serve as a valuable reflection of the historical daily trends. Thus, the segmented module extracts the temporal features from the historical data, and the stacked GCN module considers the traffic speed for that particular time segment.

B. Graph Convolutional Networks
The GCN model collects spatial features from its first-order neighborhood. As depicted in Figure 3, node a represents a central road, while nodes b and c signify the roads connected to this central road. Spatial features are extracted by establishing the topological relationships between the central and neighboring roads. The GCN model generates a Fourier domain filter utilizing the adjacency matrix A and the feature matrix X. This filter, applied to the nodes within the graph, gathers spatial characteristics from the first-order neighborhood of each node. The GCN model is constructed by stacking multiple convolutional layers, allowing it to capture increasingly complex spatial relationships among the nodes, as in (2).

$$H^{(l+1)} = \sigma\left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right) \tag{2}$$

Here $H^{(l)}$ represents the node feature matrix at layer $l$, $\tilde{A} = A + I$ is the adjacency matrix of the graph with self-connections added, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $W^{(l)}$ denotes the learnable weight matrix at layer $l$, and $\sigma$ represents a nonlinear activation function. The number of layers in the model determines the maximum distance over which node characteristics can propagate and interact within the graph structure. With a one-layer GCN, for instance, each node can only obtain information from its neighbors. Each node's information-gathering operation runs simultaneously and independently.
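As a minimal sketch of the propagation rule in (2), the NumPy code below applies two graph-convolution layers to the three-road example (central road a, neighbors b and c). The random weights, the ReLU activation, the linear output layer, and the feature sizes are assumptions for illustration; this is not the authors' implementation.

```python
import numpy as np


def normalize_adjacency(adj):
    """Compute D~^{-1/2} (A + I) D~^{-1/2} for a 0/1 adjacency matrix."""
    a_tilde = adj + np.eye(adj.shape[0], dtype=adj.dtype)
    deg = a_tilde.sum(axis=1)  # every degree is >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(x):
    return np.maximum(x, 0.0)


def gcn_layer(a_hat, h, w, activation=None):
    """One layer of Eq. (2): activation(a_hat @ h @ w)."""
    out = a_hat @ h @ w
    return activation(out) if activation is not None else out


# Road a (index 0) is connected to roads b (1) and c (2).
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=np.float64)
a_hat = normalize_adjacency(adj)

rng = np.random.default_rng(0)
h0 = rng.normal(size=(3, 8))           # 8 input features per road (assumed)
w0 = rng.normal(scale=0.1, size=(8, 4))
w1 = rng.normal(scale=0.1, size=(4, 2))

h1 = gcn_layer(a_hat, h0, w0, relu)    # layer 1: first-order neighbors
h2 = gcn_layer(a_hat, h1, w1)          # layer 2: neighbors of neighbors
print(h2.shape)
```

After the second layer, roads b and c influence each other through road a even though they are not directly connected, which is the sense in which depth sets the reach of information in the graph.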
We repeat the process of obtaining information when we stack another layer on top of the original one. However, GCN suffers from a vanishing gradient problem if more layers are added, specifically beyond four layers, which limits performance [30]. To avoid this problem, we use two layers in GCN, which can handle non-Euclidean road networks better than CNN without suffering from the vanishing gradient problem. We utilize historical traffic data as our input, segment the input data, and then use a two-layered Graph Convolutional Network (GCN) for every segment.

Fig. 3. Graph Convolutional Network (GCN)

C. The Proposed Stacked-based GCN Model
The architecture of our proposed model, as depicted in Figure 4, incorporates a segmented module responsible for preprocessing the input time series data X and converting it into periodic segments denoted as S. For day-long prediction, we have segmented twenty-four hours into S segments. We have generated results for different numbers of segments.

Fig. 4. Graph Convolutional Network (GCN)

Table 1 shows that increasing the number of segments generally leads to a lower RMSE. Twenty-four segments give the lowest RMSE and the highest R2 among the tested values (2, 3, 4, 6, 8, 12, 24), although their MAE is not the lowest. With twenty-four segments, each segment consists of one hour of timestamps. In the GCN model, the processed segments are utilized to generate the final predictions for traffic speed data. As depicted in Figure 4, the raw historical data is first input into the system, and from there, the segments are extracted for further processing. Fig. 2 demonstrates the segmentation of historical data for our model. A particular segment (say, 4:00 PM to 5:00 PM) from the previous four days is used to predict the same segment on the fifth day. Every day is divided into S segments. After that, in the stacked GCN model, a separate GCN is used to process each segment.
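The per-segment processing described above can be sketched as a forward pass in which each time segment has its own two-layer GCN and the outputs are concatenated into a day-long prediction. This is only an untrained illustration under stated assumptions: the weights are random, the data layout `(N, days, steps_per_day)`, the chain-shaped toy graph, and all function names are hypothetical, and the hidden size of 64 follows the paper's learning parameters.

```python
import numpy as np


def normalize_adjacency(adj):
    """D~^{-1/2} (A + I) D~^{-1/2}, as in Eq. (2)."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def two_layer_gcn(a_hat, x, w0, w1):
    """Two graph-convolution layers; ReLU hidden layer, linear output (assumed)."""
    hidden = np.maximum(a_hat @ x @ w0, 0.0)
    return a_hat @ hidden @ w1


def stacked_gcn_forward(a_hat, history, weights):
    """Run one GCN per segment and concatenate the per-segment outputs.

    history: (N, days, steps_per_day); weights: list of (w0, w1), one per segment.
    Returns an array of shape (N, steps_per_day).
    """
    n_nodes, n_days, steps_per_day = history.shape
    seg_len = steps_per_day // len(weights)
    outputs = []
    for s, (w0, w1) in enumerate(weights):
        window = history[:, :, s * seg_len:(s + 1) * seg_len]
        x_s = window.reshape(n_nodes, n_days * seg_len)
        outputs.append(two_layer_gcn(a_hat, x_s, w0, w1))
    return np.concatenate(outputs, axis=1)


rng = np.random.default_rng(0)
n_nodes, n_days, steps_per_day, num_segments, hidden = 5, 4, 24, 24, 64
seg_len = steps_per_day // num_segments

# Hypothetical chain of 5 roads: 0-1-2-3-4.
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
a_hat = normalize_adjacency(adj)

history = rng.uniform(20.0, 60.0, size=(n_nodes, n_days, steps_per_day))
weights = [
    (rng.normal(scale=0.1, size=(n_days * seg_len, hidden)),
     rng.normal(scale=0.1, size=(hidden, seg_len)))
    for _ in range(num_segments)
]

prediction = stacked_gcn_forward(a_hat, history, weights)
print(prediction.shape)
```

Because each segment has its own GCN in this sketch, perturbing the inputs of one segment changes only that segment's output columns, which mirrors the claim that errors are not carried from one segment's prediction into the next.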
Every GCN used for a segment is two-layered. The outputs of these modules are then merged to produce the final prediction sequence Y. Historical data in the segmented module helps the model inherit the temporal feature, and GCN helps capture spatial features. The proposed method does not incorporate any other model to capture the temporal feature separately, as using separate models cannot capture the inherent interrelationship between temporal and spatial features. The stacked GCN model can effectively capture temporal and spatial features by employing segmentation.

Table 1. Day-long (twenty-four hours) prediction performance for different numbers of segments on the SZ-taxi dataset

| Number of Segments | RMSE | MAE | R2 |
|---|---|---|---|
| 24 | 5.5843 | 4.4322 | 0.6995 |
| 12 | 5.6228 | 4.2262 | 0.6940 |
| 8 | 5.6179 | 4.2211 | 0.6930 |
| 6 | 5.6421 | 4.2171 | 0.6876 |
| 4 | 5.6465 | 4.2279 | 0.6955 |
| 3 | 5.7155 | 4.3007 | 0.6897 |
| 2 | 5.8001 | 4.3659 | 0.6808 |

III. Experimental Results

A. Dataset Description
In this section, we evaluate the predictive performance of our proposed model using two publicly available real-world datasets: the SZ-taxi dataset and the PeMSD7 dataset. These datasets have gained popularity in traffic forecasting research and have been employed for performance benchmarking in prior studies. Both datasets contain the speed and connection data needed for GCN.

SZ-taxi: The SZ-taxi dataset, covering taxi trajectories in Shenzhen from January 1 to January 31, 2015, is centered on the Luohu District's 156 highways. This dataset is structured into two essential components: a 156x156 adjacency matrix illustrating highway connections and a feature matrix capturing the time-varying traffic speeds for each road. Each row in the feature matrix corresponds to a unique road, while columns represent traffic speeds at fifteen-minute intervals. The dataset is split into two parts, allocating twenty days for training and ten days for testing.
PeMSD7: The PeMSD7 dataset provides traffic speed data collected from 228 sensors in California's District Seven during weekdays in May and June 2012. It includes two critical components: a 228x228 adjacency matrix representing sensor connections within the network and a feature matrix depicting the time-varying traffic speeds for each sensor. Each row in the feature matrix corresponds to an individual sensor, while columns represent five-minute intervals of traffic speed measurements. The dataset is split into a training set, consisting of the first month's data encompassing 6,336 timestamps, and a test set with an equal number of timestamps.

Table 2 lists the learning parameters employed in our proposed model. We utilized the Adam optimizer during the training process to minimize the RMSE. The Adam optimizer adaptively adjusts the step size for each parameter during training, which helps both accuracy and computational efficiency. The L2 regularization technique is used to reduce model overfitting. As we have memory limitations, we used 1000 epochs, 64 hidden units, a batch size of 32, and a learning rate of 0.001. As per Table 1, twenty-four segments give better performance. Thus, we used twenty-four segments for a one-day prediction.

Table 2. Learning Parameters

| Parameter | Description |
|---|---|
| Learning Rate | 0.001 |
| Number of Epoch | 1000 |
| Loss Function | RMSE |
| Hidden Unit | 64 |
| Optimizer | Adam |
| Regularization Techniques | L2 Regularization |

B. Evaluation Metrics
To assess prediction performance, we utilize three commonly used metrics: the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE), and the coefficient of determination (R2). In particular, the RMSE is an important metric to evaluate the effectiveness of the proposed model.
The RMSE value indicates the average magnitude of the differences between actual and predicted data values. In general, a smaller RMSE suggests that the model and its predictions perform better, reflecting reduced prediction errors. The equation of RMSE is given in (3).

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{3}$$

The absolute value turns a negative number into a positive one. When calculating the MAE, the absolute difference between an expected (actual) value and a predicted value is always taken, ensuring that the result is positive regardless of whether the prediction overestimates or underestimates the actual value. The formula of MAE is given in (4).

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{4}$$

The coefficient of determination, often referred to as R-squared, quantifies the proportion of variation in the dependent variable that can be accounted for by the independent variable(s) in a regression model. It typically takes a value between 0 and 1, with higher values suggesting that the model fits the data more closely, as in (5), where $\bar{y}$ is the mean of the actual values.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{5}$$

C. Compared Methods
We conducted a comparative analysis of our proposed model against several widely recognized models for traffic flow prediction. We selected four commonly employed approaches, encompassing both traditional time-series prediction methods and deep learning techniques. The first is Autoregressive Integrated Moving Average (ARIMA). ARIMA is a conventional statistical method that captures temporal dependencies within data by employing autoregression, differencing, and moving average techniques. Researchers have extensively used it for traffic flow estimation [31]. The second is Support Vector Regression (SVR). SVR is a model that forecasts future traffic data by leveraging existing data to train the model and establish the relationship between input and output variables [32].
This model employs a linear kernel function. The next is K-nearest Neighbor (KNN). KNN is a widely recognized supervised learning approach used for data classification based on the proximity of data points to their neighbors [15]. KNN retains all available instances and classifies new cases using a similarity score. The last is Graph Convolutional Network (GCN). GCN is a semi-supervised deep learning method that captures the spatial characteristics of nodes within a graph. It operates effectively in non-Euclidean spaces, making it suitable for modeling road networks.

IV. Result and Discussion
Table 3 shows the performance of the four approaches outlined above and our suggested model on two frequently used datasets. First, we calculate RMSE, MAE, and R2 for a whole-day (twenty-four-hour) prediction. Table 3 reveals that our proposed approach surpasses the other four methods across both datasets regarding RMSE, MAE, and R2. Lower error values imply higher accuracy, except for R2, where higher values indicate superior performance. The errors are calculated over predictions made twenty-four hours ahead. In the SZ-taxi dataset, our proposed method achieves a 16.9% reduction in RMSE compared to ARIMA and a 9.17% reduction compared to GCN. On the PeMSD7 dataset, our proposed model achieves a 60.4% reduction in RMSE compared to ARIMA, a 55.5% reduction compared to SVR, a 45.7% reduction compared to KNN, and a 53% reduction compared to GCN. Our proposed model exhibits superior performance, particularly on the PeMSD7 dataset. This is attributed to the larger size of the PeMSD7 dataset, allowing our model to learn more effectively by relying on historical data for predicting future traffic trends. Notably, * indicates negligible values, signifying poor prediction performance for the model in those cases.

Table 3. Prediction performance of the proposed model and other baseline models using the SZ-taxi and PeMSD7 datasets for a day (24 hours)

| Model Name | SZ-taxi RMSE | SZ-taxi MAE | SZ-taxi R2 | PeMSD7 RMSE | PeMSD7 MAE | PeMSD7 R2 |
|---|---|---|---|---|---|---|
| ARIMA | 6.7963 | 4.6757 | * | 11.3038 | 9.1818 | * |
| SVR | 6.56454 | 4.55313 | 0.6552 | 10.0653 | 4.9432 | 0.5032 |
| KNN | 5.96454 | 4.25313 | 0.6752 | 8.2546 | 4.8241 | 0.5134 |
| GCN | 6.2163 | 4.6581 | 0.6451 | 9.5362 | 6.8600 | 0.5162 |
| Proposed Model | 5.6463 | 4.2197 | 0.6871 | 4.4808 | 3.2734 | 0.8105 |

The poor results of the baseline methods arise because ARIMA, KNN, and SVR have difficulty dealing with complex, irregular time series data. That is why they performed poorly on long datasets like PeMSD7. The plain GCN baseline also performs poorly, since GCN primarily focuses on spatial characteristics and neglects the temporal nature inherent in traffic data, which is fundamentally time series data. Our proposed model addresses this limitation by segmenting the data, enhancing GCN's ability to handle time series data. Consequently, our proposed model exhibits superior day-long traffic flow speed prediction capabilities. Additionally, ARIMA, a well-established traffic forecasting method, suffers from reduced prediction accuracy when confronted with extended and irregular data patterns. Because ARIMA's error is computed by calculating and averaging errors across individual nodes, any anomalies in the data can inflate the final total error. On the other hand, in our proposed long-term prediction, error does not propagate, resulting in better results when compared to the others.

In Figure 5, we visualize the predicted and actual traffic flow for an entire day on one road of the SZ-taxi dataset. The yellow line indicates actual traffic flow, and the blue dotted line indicates predicted traffic flow. The model demonstrates an ability to capture the daily trends in the traffic flow data. Utilizing a GCN for each segment allows for capturing temporal and spatial characteristics throughout the day.
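The evaluation metrics of Eqs. (3) to (5) and the relative RMSE reductions quoted in this section can be reproduced with a few lines of NumPy. The function names are hypothetical and the small `y_true`/`y_pred` arrays are made-up values used only to exercise the metric functions; the RMSE figures in the dictionary are copied from the PeMSD7 columns of Table 3.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root Mean Square Error, Eq. (3)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean Absolute Error, Eq. (4)."""
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (5)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Made-up values, only to show the metric functions in use.
y_true = np.array([30.0, 42.0, 38.0, 25.0])
y_pred = np.array([32.0, 40.0, 37.0, 27.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))

# RMSE values copied from Table 3 (PeMSD7).
pems_rmse = {"ARIMA": 11.3038, "SVR": 10.0653, "KNN": 8.2546, "GCN": 9.5362}
proposed_rmse = 4.4808
reductions = {name: relative_reduction(value, proposed_rmse)
              for name, value in pems_rmse.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower RMSE")
```

Rounded to one decimal place, the computed reductions agree with the 60.4%, 55.5%, 45.7%, and 53% figures reported for PeMSD7.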
Fig. 5. The visualization results for a prediction horizon of twenty-four hours in the SZ-taxi dataset

Our model has shortcomings: it does not account for external variables such as weather conditions, accidents, or holidays, which limits how accurately it can capture traffic flow dynamics. Our plans involve integrating attention mechanisms to detect abrupt incidents and adopting a dynamic adjacency matrix instead of a static one to enhance the information supplied to the GCN. In addition, we aim to integrate weather conditions and holiday data into our analysis alongside speed data.

V. Conclusion
In this research paper, we introduced the concept of a stacked GCN, a deep learning methodology aimed at tackling the complexities associated with long-term traffic flow prediction. Accurate long-term prediction is essential in traffic management and sustainable urban planning, particularly as urbanization and population growth exacerbate traffic congestion issues. The proposed Stacked GCN model overcomes traditional error accumulation issues by employing a segmented module for temporal feature extraction and leveraging Graph Convolutional Networks' capabilities. Incorporating historical data in segmentation helps our model learn the historical pattern. In a comparison with the ARIMA, SVR, KNN, and GCN models using two real-world traffic datasets, the stacked GCN model outperforms the others and yields the most accurate prediction results. Our model can reduce error by 40% to 60% compared to the other methods that we used for comparison. It produces accurate day-long traffic forecasts, providing travelers with preemptive route planning information. Moreover, our model does not use hybrid models like other long-term prediction models, ensuring faster results.
In the future, our strategy includes integrating attention mechanisms to detect unexpected events and employing a dynamic adjacency matrix instead of a fixed one to enhance the information available to the GCN. We aim to integrate weather conditions and holiday data into our analysis alongside speed data.

Declarations
Author contribution. All authors contributed equally as the main contributor of this paper. All authors read and approved the final paper.
Funding statement. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Conflict of interest. The authors declare no known conflict of financial interest or personal relationships that could have appeared to influence the work reported in this paper.
Additional information. Reprints and permission information are available at http://journal2.um.ac.id/index.php/keds.
Publisher’s Note: Department of Electrical Engineering and Informatics - Universitas Negeri Malang remains neutral with regard to jurisdictional claims and institutional affiliations.

References
[1] M. M. Rahman and N. Nower, “Attention based Deep Hybrid Networks for Traffic Flow Prediction using Google Maps Data,” in Proceedings of the 2023 8th International Conference on Machine Learning Technologies, Mar. 2023, pp. 74–81.
[2] M. M. Rahman, A. R. M. Jamil, and N. Nower, “Uncertainty-Aware Traffic Prediction using Attention-based Deep Hybrid Network with Bayesian Inference,” Int. J. Adv. Comput. Sci. Appl., vol. 14, no. 6, 2023.
[3] D. Rukmana, “Rapid urbanization and the need for sustainable transportation policies in Jakarta,” IOP Conf. Ser. Earth Environ. Sci., vol. 124, p. 012017, Mar. 2018.
[4] A. A. Haider, “Traffic jam: The ugly side of Dhaka’s development,” Dly. Star, vol. 13, 2018.
[5] M. Sweet, “Does Traffic Congestion Slow the Economy?,” J. Plan. Lit., vol. 26, no. 4, pp. 391–404, Nov. 2011.
[6] T. Peng, X. Yang, Z. Xu, and Y. Liang, “Constructing an Environmental Friendly Low-Carbon-Emission Intelligent Transportation System Based on Big Data and Machine Learning Methods,” Sustainability, vol. 12, no. 19, p. 8118, Oct. 2020.
[7] T. Alghamdi, K. Elgazzar, M. Bayoumi, T. Sharaf, and S. Shah, “Forecasting Traffic Congestion Using ARIMA Modeling,” in 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Jun. 2019, pp. 1227–1232.
[8] C. P. I. J. van Hinsbergen, T. Schreiter, F. S. Zuurbier, J. W. C. van Lint, and H. J. van Zuylen, “Localized Extended Kalman Filter for Scalable Real-Time Traffic State Estimation,” IEEE Trans. Intell. Transp. Syst., vol. 13, no. 1, pp. 385–394, Mar. 2012.
[9] J. Guo, W. Huang, and B. M. Williams, “Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification,” Transp. Res. Part C Emerg. Technol., vol. 43, pp. 50–64, Jun. 2014.
[10] Y. Liu and H. Wu, “Prediction of Road Traffic Congestion Based on Random Forest,” in 2017 10th International Symposium on Computational Intelligence and Design (ISCID), Dec. 2017, pp. 361–364.
[11] X. Feng, X. Ling, H. Zheng, Z. Chen, and Y. Xu, “Adaptive Multi-Kernel SVM With Spatial–Temporal Correlation for Short-Term Traffic Flow Prediction,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 6, pp. 2001–2013, Jun. 2019.
[12] Z. Mingheng, Z. Yaobao, H. Ganglong, and C. Gang, “Accurate Multisteps Traffic Flow Prediction Based on SVM,” Math. Probl. Eng., vol. 2013, pp. 1–8, 2013.
[13] B. Sharma, V. Kumar Katiyar, and A. Kumar Gupta, “Fuzzy Logic Model for the Prediction of Traffic Volume in Week Days,” Int. J. Comput. Appl., vol. 107, no. 17, pp. 1–6, 2014.
[14] Y. Gu, W. Lu, X. Xu, L. Qin, Z. Shao, and H. Zhang, “An Improved Bayesian Combination Model for Short-Term Traffic Prediction With Deep Learning,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 3, pp. 1332–1342, Mar.
2020. [15] L. Zhang, Q. Liu, W. Yang, N. Wei, and D. Dong, “An Improved K-nearest Neighbor Model for Short-term Traffic Flow Prediction,” Procedia - Soc. Behav. Sci., vol. 96, pp. 653–662, Nov. 2013. [16] D. Xu, Y. Wang, P. Peng, S. Beilun, Z. Deng, and H. Guo, “Real-time road traffic state prediction based on kernel- KNN,” Transp. A Transp. Sci., vol. 16, no. 1, pp. 104–118, Dec. 2020. [17] K. Kumar, M. Parida, and V. K. Katiyar, “Short Term Traffic Flow Prediction for a Non Urban Highway Using Artificial Neural Network,” Procedia - Soc. Behav. Sci., vol. 104, pp. 755–764, Dec. 2013. [18] A. Koesdwiady, R. Soua, and F. Karray, “Improving Traffic Flow Prediction With Weather Information in Connected Cars: A Deep Learning Approach,” IEEE Trans. Veh. Technol., vol. 65, no. 12, pp. 9508–9517, Dec. 2016. [19] Y. Wu and H. Tan, “Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework,” pp. 1–14, 2016. [20] Z. Duan, Y. Yang, K. Zhang, Y. Ni, and S. Bajgain, “Improved Deep Hybrid Networks for Urban Traffic Flow Prediction Using Trajectory Data,” IEEE Access, vol. 6, pp. 31820–31827, 2018. [21] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” 5th Int. Conf. Learn. Represent. ICLR 2017 - Conf. Track Proc., pp. 1–14, 2017. [22] Z. Chen, B. Zhao, Y. Wang, Z. Duan, and X. Zhao, “Multitask Learning and GCN-Based Taxi Demand Prediction for a Traffic Road Network,” Sensors, vol. 20, no. 13, p. 3776, Jul. 2020. [23] K. Guo, Y. Hu, Y. Sun, S. Qian, J. Gao, and B. Yin, “Hierarchical Graph Convolution Network for Traffic Forecasting,” Proc. AAAI Conf. Artif. Intell., vol. 35, no. 1, pp. 151–159, May 2021. [24] Y. Xu, Y. Lu, C. Ji, and Q. Zhang, “Adaptive Graph Fusion Convolutional Recurrent Network for Traffic Forecasting,” IEEE Internet Things J., no. NeurIPS, pp. 1–12, 2023. [25] A. Belhadi, Y. Djenouri, D. Djenouri, and J. C.-W. 
Lin, “A recurrent neural network for urban long-term traffic flow forecasting,” Appl. Intell., vol. 50, no. 10, pp. 3252–3265, Oct. 2020. [26] X. Kong, J. Zhang, X. Wei, W. Xing, and W. Lu, “Adaptive spatial-temporal graph attention networks for traffic flow forecasting,” Appl. Intell., vol. 52, no. 4, pp. 4300–4316, Mar. 2022. [27] Z. Wang, X. Su, and Z. Ding, “Long-Term Traffic Prediction Based on LSTM Encoder-Decoder Architecture,” IEEE Trans. Intell. Transp. Syst., vol. 22, no. 10, pp. 6561–6571, Oct. 2021. [28] Y. Li, S. Chai, Z. Ma, and G. Wang, “A Hybrid Deep Learning Framework for Long-Term Traffic Flow Prediction,” IEEE Access, vol. 9, pp. 11264–11271, 2021. [29] M. Méndez, M. G. Merayo, and M. Núñez, “Long-term traffic flow forecasting using a hybrid CNN-BiLSTM model,” Eng. Appl. Artif. Intell., vol. 121, p. 106041, May 2023. https://doi.org/10.1145/3589883.3589894 https://doi.org/10.1145/3589883.3589894 https://doi.org/10.1145/3589883.3589894 https://doi.org/10.14569/IJACSA.2023.01406132 https://doi.org/10.14569/IJACSA.2023.01406132 https://doi.org/10.1088/1755-1315/124/1/012017 https://doi.org/10.1088/1755-1315/124/1/012017 https://scholar.google.com/scholar?hl=id&as_sdt=0%2C5&q=Traffic+jam%3A+The+ugly+side+of+dhaka%27s+development&btnG= https://doi.org/10.1177/0885412211409754 https://doi.org/10.3390/su12198118 https://doi.org/10.3390/su12198118 https://doi.org/10.3390/su12198118 https://doi.org/10.1109/IWCMC.2019.8766698 https://doi.org/10.1109/IWCMC.2019.8766698 https://doi.org/10.1109/IWCMC.2019.8766698 https://doi.org/10.1109/TITS.2011.2175728 https://doi.org/10.1109/TITS.2011.2175728 https://doi.org/10.1109/TITS.2011.2175728 https://doi.org/10.1016/j.trc.2014.02.006 https://doi.org/10.1016/j.trc.2014.02.006 https://doi.org/10.1109/ISCID.2017.216 https://doi.org/10.1109/ISCID.2017.216 https://doi.org/10.1109/TITS.2018.2854913 https://doi.org/10.1109/TITS.2018.2854913 https://doi.org/10.1155/2013/418303 https://doi.org/10.1155/2013/418303 
https://doi.org/10.5120/18840-0026 https://doi.org/10.5120/18840-0026 https://doi.org/10.1109/TITS.2019.2939290 https://doi.org/10.1109/TITS.2019.2939290 https://doi.org/10.1016/j.sbspro.2013.08.076 https://doi.org/10.1016/j.sbspro.2013.08.076 https://doi.org/10.1080/23249935.2018.1491073 https://doi.org/10.1080/23249935.2018.1491073 https://doi.org/10.1016/j.sbspro.2013.11.170 https://doi.org/10.1016/j.sbspro.2013.11.170 https://doi.org/10.1109/TVT.2016.2585575 https://doi.org/10.1109/TVT.2016.2585575 http://arxiv.org/abs/1612.01022 http://arxiv.org/abs/1612.01022 https://doi.org/10.1109/ACCESS.2018.2845863 https://doi.org/10.1109/ACCESS.2018.2845863 https://arxiv.org/abs/1609.02907 https://arxiv.org/abs/1609.02907 https://doi.org/10.3390/s20133776 https://doi.org/10.3390/s20133776 https://doi.org/10.1609/aaai.v35i1.16088 https://doi.org/10.1609/aaai.v35i1.16088 https://doi.org/10.1109/JIOT.2023.3244182 https://doi.org/10.1109/JIOT.2023.3244182 https://doi.org/10.1007/s10489-020-01716-1 https://doi.org/10.1007/s10489-020-01716-1 https://doi.org/10.1007/s10489-021-02648-0 https://doi.org/10.1007/s10489-021-02648-0 https://doi.org/10.1109/TITS.2020.2995546 https://doi.org/10.1109/TITS.2020.2995546 https://doi.org/10.1109/ACCESS.2021.3050836 https://doi.org/10.1109/ACCESS.2021.3050836 https://doi.org/10.1016/j.engappai.2023.106041 https://doi.org/10.1016/j.engappai.2023.106041 Karim et al. / Knowledge Engineering and Data Science 2023, 6 (1): 92–102 102 [30] G. Li, M. Muller, A. Thabet, and B. Ghanem, “DeepGCNs: Can GCNs Go As Deep As CNNs?,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2019, vol. 2019-Octob, pp. 9266–9275, doi: 10.1109/ICCV.2019.00936. [31] X. Lin and Y. Huang, “Short‐Term High-Speed Traffic Flow Prediction Based on ARIMA-GARCH-M Model,” Wirel. Pers. Commun., vol. 117, no. 4, pp. 3421–3430, Apr. 2021. [32] G. Lin, A. Lin, and D. 
Gu, “Using support vector regression and K-nearest neighbors for short-term traffic flow prediction based on maximal information coefficient,” Inf. Sci. (Ny)., vol. 608, pp. 517–531, Aug. 2022. https://doi.org/10.1109/ICCV.2019.00936 https://doi.org/10.1109/ICCV.2019.00936 https://doi.org/10.1109/ICCV.2019.00936 https://doi.org/10.1007/s11277-021-08085-z https://doi.org/10.1007/s11277-021-08085-z https://doi.org/10.1016/j.ins.2022.06.090 https://doi.org/10.1016/j.ins.2022.06.090