Knowledge Engineering and Data Science (KEDS)
pISSN 2597-4602, eISSN 2597-4637
Vol 6, No 2, October 2023, pp. 170–187
https://doi.org/10.17977/um018v6i22023p170-187
©2023 Knowledge Engineering and Data Science | W: http://journal2.um.ac.id/index.php/keds | E: keds.journal@um.ac.id
This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)

Deep Learning Approaches with Optimum Alpha for Energy Usage Forecasting

Aji Prasetya Wibawa a,1,*, Agung Bella Putra Utama a,2, Ade Kurnia Ganesh Akbari a,3, Akhmad Fanny Fadhilla a,4, Alfiansyah Putra Pertama Triono a,5, Andien Khansa’a Iffat Paramarta a,6, Faradini Usha Setyaputri a,7, Leonel Hernandez b,8

a Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Jl. Semarang no. 5, Malang 65145, Indonesia
b Institución Universitaria de Barranquilla IUB, Cra. 45 #48-31, Nte. Centro Historico, Barranquilla 080020, Colombia
1 aji.prasetya.ft@um.ac.id*; 2 agungbpu02@gmail.com; 3 ade.kurniaganesh.1905356@students.um.ac.id; 4 akhmadfadhil512@gmail.com; 5 alfiansyah.putrapt.1905356@student.um.ac.id; 6 khansaandien@gmail.com; 7 faradini.usha@gmail.com; 8 lhernandezc@unibarranquilla.edu.co
* corresponding author

ARTICLE INFO
Article history: Received 17 October 2023; Revised 17 October 2023; Accepted 17 October 2023; Published online 20 October 2023
Keywords: Energy Efficiency; Forecasting; Deep Learning; Exponential Smoothing; Optimum Alpha

ABSTRACT
Energy use is an essential aspect of many human activities, from individual to industrial scale. However, increasing global energy demand and the challenges posed by environmental change make understanding energy use patterns crucial. Accurate predictions of future energy consumption can greatly influence decision-making, supply-demand stability, and energy efficiency. Energy use data often exhibits time-series patterns, which creates complexity in forecasting. To address this complexity, this research uses Deep Learning (DL) models: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), and Gated Recurrent Unit (GRU). The main objective is to improve the accuracy of energy usage forecasting by optimizing the alpha value in exponential smoothing. The results showed that all DL methods experienced improved accuracy when using optimum alpha. LSTM has the best MAPE, RMSE, and R² values compared to the other methods. This research promotes energy management, decision-making, and efficiency by providing an innovative framework for accurate forecasting of energy use, thus contributing to a sustainable and efficient energy system.

I. Introduction
Energy usage is a critical factor in various human activities, ranging from individual to industrial scales. It plays a vital role in supporting economic growth, social welfare, and technological development [1]. However, with the increasing global demand for energy and the challenges posed by environmental changes, understanding energy usage patterns has become increasingly important. Accurate predictions about future energy use can provide significant benefits in decision-making [2], demand and supply stability [3], and energy efficiency [4].

Energy usage data often exhibits a time series nature, where information is recorded over a specific time span [5]. For example, hourly energy consumption data may be challenging to interpret directly due to its temporal nature [6]. Additionally, energy usage data can involve various attributes that contribute to the patterns and fluctuations of energy usage. Therefore, accurately forecasting future energy use is a complex task.

To overcome the complexity of analyzing energy usage data, Deep Learning (DL) has emerged as a practical approach [7]. DL is a branch of machine learning that utilizes neural networks with multiple layers and parameters to learn complex data representations [8]. Various DL models have been developed for time series analysis, including Convolutional Neural Networks (CNN) [9], Recurrent Neural Networks (RNN) [10], Long Short-term Memory (LSTM) [11], Bidirectional LSTM (Bi-LSTM) [12], and Gated Recurrent Unit (GRU) [13].

CNNs have been widely used in image recognition tasks, but they can also be applied to time series data analysis. They can automatically extract essential features from time series data, such as seasonal patterns, trends, cycles, and irregularities. Unlike 2D-CNNs, which require converting time series data into image format, 1D-CNNs [14] can process time series data directly without image conversion.

RNNs, particularly LSTM, are well-suited for modeling temporal dependencies in time series data [15]. RNNs maintain a hidden state that captures information about previous time steps, allowing them to carry context forward through a sequence. LSTM, in particular, addresses the vanishing gradient problem commonly encountered in traditional RNNs. The vanishing gradient problem occurs when the gradient approaches zero, preventing updates to the network weights and causing the loss of time series data characteristics. LSTM overcomes this issue by using memory cells and gates to store and control the temporary state of the network [16].
Bi-LSTM is an extension of the LSTM model that incorporates information from both past and future time steps. It consists of two LSTM layers, one processing the input sequence in the forward direction and the other in the backward direction. By considering information from both directions, Bi-LSTM can capture more comprehensive temporal dependencies in the data [17]. This bidirectional nature makes Bi-LSTM particularly effective in tasks where future information is crucial for accurate predictions, such as energy usage forecasting. GRU, on the other hand, is a simplified version of LSTM that aims to reduce its computational cost.

In this study, we explore the application of DL models with optimum alpha for energy usage forecasting. We compare the performance of different DL models and evaluate their effectiveness in capturing the complex patterns and fluctuations in energy usage data. Additionally, we investigate the impact of data normalization techniques on the performance of DL models. The findings of this research contribute to the development of accurate and efficient energy usage forecasting models, which can aid decision-making and promote energy efficiency in various sectors.

Overall, this study addresses the challenges in analyzing energy usage data by leveraging DL models, which can extract meaningful features and capture temporal dependencies in the data, leading to improved energy usage forecasting. The results provide insights for energy management and planning, contributing to a more sustainable and efficient energy future.

II. Methods
To make the research approach more systematic, the experiments were designed as illustrated in Figure 1. In essence, the Smoothed Deep Learning (S-DL) method, which uses optimum alpha, was compared with the plain DL method.
Various evaluation metrics were also employed to assess the performance of the optimum alpha-enhanced results. Further details regarding Figure 1 are given in the following subsections.

Fig. 1. Experimental schema

A. Dataset
This study uses the Hourly Energy Demand Time Series Forecast dataset from Kaggle [18]. The dataset spans four years (January 2015 to December 2018) and contains information on electricity usage, production, pricing, and meteorological conditions in Spain. Specifically, data on electricity consumption and generation was sourced from ENTSO-E, a publicly accessible platform for Transmission System Operator (TSO) data. Settlement prices were acquired from the Spanish TSO, Red Eléctrica de España. Additionally, weather data for the five largest cities in Spain was procured from the Open Weather API as part of a personal project and subsequently made publicly available. What sets this dataset apart is its inclusion of detailed hourly records for electricity consumption, alongside forecasts provided by the TSO for both consumption and pricing.

The dataset consists of 29 attributes and 35064 instances of float data type. The target attribute used in this study is the actual total load, visualized in Figure 2. The total load forecast attribute is not used, because it serves only as a benchmark for comparison with the target attribute. In addition, 2 attributes are deleted because they contain only NaN values. Therefore, 26 attributes are used in total.

Fig. 2. Total load actual

B. Exponential Smoothing with Optimum α
Exponential smoothing is a widely used technique in time series forecasting that aims to eliminate noise and capture underlying patterns in data [19]. It achieves this by assigning weights to previous observations, with higher weights given to more recent data points. The smoothing factor, denoted as α (alpha), determines the weight assigned to the most recent observation [20].

The concept of optimum α arises from the need to find the value of the smoothing factor that maximizes the accuracy of the forecasting model [21]. The choice of α depends on the specific characteristics of the time series data and the desired forecasting task. The goal is to select the value of α that minimizes the forecasting error.

To determine the optimum α, various approaches can be employed. One standard method is to run a grid search or an optimization algorithm that evaluates different values of α and selects the one that yields the lowest forecasting error. Finding the optimum α involves balancing the trade-off between responsiveness to recent changes in the data and the level of smoothing applied. A higher α gives more weight to recent observations, making the model more responsive to short-term fluctuations but potentially less stable. Conversely, a lower α places more emphasis on historical data, resulting in a smoother forecast that is potentially slower to adapt to changes.

Equations (1) and (2) give two equivalent forms of single exponential smoothing [22] for t > 0. The smoothed data $S_t$ is the result of smoothing the raw data $\{X_t\}$.
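The grid search over α described earlier in this subsection (and summarized in Pseudocode 1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the choice of one-step-ahead mean squared error as the error measure, the initialization of the first smoothed value with the first observation, and the toy series are all assumptions made for the example.

```python
def exponential_smoothing(series, alpha):
    """Single exponential smoothing: S_t = alpha*X_t + (1 - alpha)*S_{t-1}."""
    smoothed = [series[0]]  # assumption: initialise S_0 with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def one_step_mse(series, alpha):
    """Mean squared error when S_{t-1} is used as the forecast of X_t."""
    smoothed = exponential_smoothing(series, alpha)
    errors = [(x - s) ** 2 for x, s in zip(series[1:], smoothed[:-1])]
    return sum(errors) / len(errors)


def grid_search_alpha(series, candidates=None):
    """Return the candidate alpha with the lowest one-step-ahead MSE."""
    if candidates is None:
        # Same grid as Pseudocode 1: 0.1, 0.2, ..., 0.9
        candidates = [round(0.1 * k, 1) for k in range(1, 10)]
    return min(candidates, key=lambda a: one_step_mse(series, a))


# Toy hourly-load-like series (hypothetical values, not taken from the dataset).
load = [30.0, 32.0, 31.0, 35.0, 40.0, 38.0, 36.0, 33.0, 31.0, 30.0]
best_alpha = grid_search_alpha(load)
```

On a steadily rising series the search favours a large α, because a responsive smoother lags less behind the trend; on noisy but level data a smaller α tends to win.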
The smoothing factor α determines the level of smoothing and lies between 0 and 1 (0 ≤ α ≤ 1). When α is close to 1, the learning process is fast because the smoothing effect is weak. In contrast, values of α closer to 0 have a stronger smoothing effect and are less responsive to recent changes (slow learning).

$$S_t = \alpha X_t + (1 - \alpha) S_{t-1}, \quad t > 0 \tag{1}$$

$$S_t = S_{t-1} + \alpha (X_t - S_{t-1}) \tag{2}$$

The optimum α is computed from the range and the mean of the data as in Equation (3), where $n$ is the number of observations.

$$\text{Optimum } \alpha = \frac{(X_{max} - X_{min}) - \frac{1}{n}\sum_{t=1}^{n} X_t}{X_{max} - X_{min}} \tag{3}$$

Substituting Equation (3) into (2) results in Equation (4). We use the optimum smoothed result ($S_t$) to improve the DL method performance [21]. Pseudocode 1 shows how to find the optimum alpha for exponential smoothing.

$$S_t = S_{t-1} + \frac{(X_{max} - X_{min}) - \frac{1}{n}\sum_{t=1}^{n} X_t}{X_{max} - X_{min}} (X_t - S_{t-1}) \tag{4}$$

PSEUDOCODE 1. Find the optimum alpha for exponential smoothing
Input:
  - Time series data
Output:
  - Optimum value of alpha
Procedure FindOptimumAlpha(data):
  Set alpha_min = 0.1        // minimum value of alpha
  Set alpha_max = 0.9        // maximum value of alpha
  Set alpha_step = 0.1       // increment step for alpha
  Set error_min = infinity   // minimum error value
  Set alpha_optimum = 0      // optimum value of alpha
  For alpha = alpha_min to alpha_max step alpha_step:
    Apply exponential smoothing with alpha to the data
    Calculate the error by comparing the predicted values with the actual data
    If error < error_min:
      Set error_min = error
      Set alpha_optimum = alpha
  Return alpha_optimum as the optimum value of alpha
End Procedure

C. Data Normalization
In this research, preprocessing transforms the original data so that it can be processed in further testing [23]. Most time-series data inherently exhibits dynamic and non-linear behavior [24]. The preprocessing carried out in this study is data normalization.
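To connect the smoothing of the previous subsection with the normalization step introduced here, the sketch below evaluates the optimum α of Equation (3) and applies the recursion of Equation (4) to a Min-Max scaled series. It is an illustration under stated assumptions only: the function names and toy values are hypothetical, the first smoothed value is initialised with the first observation, and the order of operations (scaling first, then smoothing) is an assumption, since the experimental schema in Figure 1 is not reproduced in the text. One reason for that choice is visible in the formula itself: on a raw series whose mean exceeds its range, Equation (3) yields a value outside [0, 1], whereas on data scaled to [0, 1] it reduces to one minus the mean.

```python
def min_max_normalize(series):
    """Rescale values to [0, 1] by (x - min) / (max - min)."""
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        raise ValueError("series is constant; Min-Max scaling is undefined")
    return [(x - lo) / span for x in series]


def optimum_alpha(series):
    """Equation (3): ((X_max - X_min) - mean(X)) / (X_max - X_min)."""
    span = max(series) - min(series)
    mean = sum(series) / len(series)
    return (span - mean) / span


def smooth(series, alpha):
    """Equations (2)/(4): S_t = S_{t-1} + alpha * (X_t - S_{t-1})."""
    smoothed = [series[0]]  # assumption: S_0 = X_0
    for x in series[1:]:
        prev = smoothed[-1]
        smoothed.append(prev + alpha * (x - prev))
    return smoothed


raw_load = [20.0, 30.0, 40.0, 25.0, 35.0]   # hypothetical values
scaled = min_max_normalize(raw_load)        # [0.0, 0.5, 1.0, 0.25, 0.75]
alpha = optimum_alpha(scaled)               # 1 - mean(scaled) = 0.5
smoothed = smooth(scaled, alpha)            # [0.0, 0.25, 0.625, 0.4375, 0.59375]
```

For comparison, `optimum_alpha(raw_load)` evaluates to -0.5 on the unscaled toy series, which is why the sketch scales before smoothing.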
Data normalization is an essential preprocessing step in energy usage forecasting to ensure that the input data is standardized and comparable across different scales. Normalization techniques transform the data into a standard range, typically between 0 and 1, without changing the shape of the original distribution. This process reduces the dominance of attributes with large numeric ranges, making the data more suitable for training DL models.

The choice of normalization technique depends on the characteristics of the energy usage data and the specific requirements of the forecasting task. It is essential to experiment with different normalization techniques and evaluate their impact on the performance of DL models. Proper data normalization can improve the convergence speed of the models, prevent numerical instability, and enhance the overall accuracy of energy usage forecasting.

The normalization technique used in this research is Min-Max scaling, also known as feature scaling. This method rescales the data by subtracting the minimum value and dividing by the range (maximum value minus minimum value), as in (5). The resulting values are then within the range of 0 to 1 [25]. Min-Max scaling preserves the relative relationships between the data points and is particularly useful when the distribution of the data is known to be bounded. Pseudocode 2 presents the normalization process.

$$X_{t(norm)} = \frac{X_t - X_{min}}{X_{max} - X_{min}} \tag{5}$$

$X_{t(norm)}$ is the result of normalization, $X_t$ is the data to be normalized, and $X_{min}$ and $X_{max}$ are the minimum and maximum values of the entire data.

PSEUDOCODE 2. Normalization using Min-Max
Input:
  - Data to be normalized (X), minimum value of the data (X_min), maximum value of the data (X_max)
Output:
  - Normalized data (X_norm)
Procedure Min-Max Normalization
  Calculate the range of the data:
    a. Set X_range = X_max - X_min
  Normalize the data:
    a. For each data point X_t in X:
       i. Calculate the normalized value X_norm_t using the formula:
          X_norm_t = (X_t - X_min) / X_range
       ii. Append X_norm_t to the normalized data X_norm
  Return the normalized data X_norm
End Procedure

D. PSO Hyperparameter Tuning
Particle Swarm Optimization (PSO) is a metaheuristic optimization algorithm inspired by the social behavior of bird flocking or fish schooling [26]. It is commonly used to tune the hyperparameters of machine learning models, including Deep Learning (DL) models [27]. This section discusses the application of PSO to hyperparameter tuning of DL models for energy usage forecasting.

Hyperparameters are parameters that are not learned directly from the data but are set by the user before training the model. They control the behavior and performance of the DL model, such as the learning rate, number of hidden layers, and number of neurons in each layer. Finding good values for these hyperparameters is crucial for achieving the best performance of the DL model.

PSO works by simulating the movement of particles in a multidimensional search space. Each particle represents a potential solution, and its position in the search space corresponds to a set of hyperparameters. The particles move towards the best solution found so far by the swarm, called the global best, and are also influenced by their own best solution, called the personal best. Through iterations, the particles explore the search space and converge towards the optimal solution. A general outline of the PSO hyperparameter tuning process is given in Pseudocode 3.

In the context of DL models for energy usage forecasting, PSO can be used to tune hyperparameters such as the number of hidden layers, the number of neurons in each layer, the batch size, the number of epochs, the loss function, and the optimizer, as listed in Table 1.
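The particle update described above can be made concrete with a compact, pure-Python sketch. It is a generic PSO minimiser under stated assumptions, not the tuning code used in the study: the objective `toy_validation_loss` is a hypothetical stand-in for training a DL model and returning its validation error, the two continuous parameters and their bounds are invented for illustration, positions are clipped to the search box, and the swarm size and iteration count are reduced so the example runs quickly. The inertia weight and acceleration coefficients default to the values listed in Pseudocode 3.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iterations=50,
                 w=0.7, c1=2.0, c2=2.0, seed=0):
    """Minimise `objective` over a box given as [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_fit = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=lambda i: personal_fit[i])
    global_best = personal_best[best_idx][:]
    global_fit = personal_fit[best_idx]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Keep the particle inside the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            fit = objective(positions[i])
            if fit < personal_fit[i]:
                personal_best[i], personal_fit[i] = positions[i][:], fit
                if fit < global_fit:
                    global_best, global_fit = positions[i][:], fit
    return global_best, global_fit


def toy_validation_loss(params):
    """Hypothetical stand-in for a model's validation loss."""
    learning_rate, dropout = params
    return (learning_rate - 0.01) ** 2 + (dropout - 0.2) ** 2


search_box = [(0.0001, 0.1), (0.0, 0.5)]
best_params, best_loss = pso_minimize(toy_validation_loss, search_box)
```

For categorical choices such as those in Table 1, one common workaround (again an assumption, not something stated in the source) is to let each particle coordinate index into the list of allowed values and round it before evaluating the model.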
By searching the hyperparameter space using PSO, we can find the combination of hyperparameters that leads to the best performance of the DL model in terms of accuracy and prediction error.

PSEUDOCODE 3. PSO hyperparameter tuning
Input:
  - Data for training and validation
  - Hyperparameter search space
Output:
  - Best hyperparameter settings
Procedure PSO_Hyperparameter_Tuning(data):
  Set population_size = 50   // Number of particles in the swarm
  Set max_iterations = 100   // Maximum number of iterations
  Set c1 = 2.0               // Cognitive parameter
  Set c2 = 2.0               // Social parameter
  Set w = 0.7                // Inertia weight
  // Initialize the swarm
  Initialize_swarm(population_size)
  // Evaluate initial particle positions
  Evaluate_particles(data)
  // Set the global best position and fitness
  Set_global_best()
  // Main PSO loop
  for iteration = 1 to max_iterations do:
    for each particle in the swarm do:
      // Update particle velocity
      Update_velocity(particle, global_best)
      // Update particle position
      Update_position(particle)
      // Evaluate new particle position
      Evaluate_particle(data, particle)
      // Update personal best position and fitness
      Update_personal_best(particle)
      // Update global best position and fitness
      Update_global_best(particle)
  // Return the best hyperparameter settings
  Return global_best_position
End Procedure

To apply PSO for hyperparameter tuning, we need to define the fitness function that evaluates the performance of the DL model with a specific set of hyperparameters. The PSO algorithm then iteratively updates the positions of the particles based on their personal best, the global best, and the inertia weight, which controls the balance between exploration and exploitation. This approach can enhance the effectiveness of DL models in energy usage forecasting and contribute to better decision-making and energy management.

Table 1. PSO hyperparameter tuning search space
Parameter        Search Space
Batch Size       100, 1000
Epoch            50, 100
Hidden Layer     2, 5, 10
Loss Function    MSE, MAE, huberloss
Neuron           32, 64
Optimizer        adam, rmsprop

E. Performance Analysis
The forecasting performance in this study is analyzed using DL methods. DL is a subset of machine learning algorithms and is often called a deep neural network [28]. Neural networks are computational models that work by mimicking the behavior of the human brain [29]. Basically, DL is a neural network that has many layers and parameters [30]. The number of layers in DL allows the model to analyze large amounts of data with complex relationships. Early layers learn simple features, while deeper layers learn more complex features [31].

• Convolutional Neural Network (CNN)
CNNs, especially 2D-CNNs, have revolutionized image classification. One-dimensional CNNs (1D-CNNs), however, excel at time-series data classification [14]. A 1D-CNN can automatically learn the internal representation of time-series data and detect essential characteristics without operator intervention [21]. The internal representation of time-series data includes seasonality, trends, cycles, and abnormalities. These properties are essential for time-series data analysis and prediction, and a 1D-CNN can capture them and use them for classification. 1D-CNNs operate directly on time-series data, unlike 2D-CNNs, which require the input to be converted into an image format. This simplifies the workflow by eliminating a preprocessing step. 1D-CNNs can capture temporal connections and identify significant patterns by directly examining sequential data [27]. Overall, 1D-CNNs offer many benefits for time-series data classification.
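To make concrete what it means for a 1D-CNN to process a series directly, the sketch below slides a single short filter along a toy load series and applies a ReLU activation. It is only an illustration: the filter weights are hand-picked here, whereas a trained 1D-CNN learns them from data, and the function names and values are hypothetical.

```python
def conv1d_valid(series, kernel):
    """Slide `kernel` over `series` without padding (cross-correlation,
    which is the operation deep learning libraries usually call convolution)."""
    k = len(kernel)
    return [sum(series[i + j] * kernel[j] for j in range(k))
            for i in range(len(series) - k + 1)]


def relu(values):
    """Rectified linear unit: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


load = [1.0, 2.0, 4.0, 7.0, 7.0, 5.0]   # hypothetical values
rise_detector = [-1.0, 1.0]             # responds to upward steps
feature_map = relu(conv1d_valid(load, rise_detector))
# feature_map == [1.0, 2.0, 3.0, 0.0, 0.0]
```

The resulting feature map is large where the series rises and zero where it is flat or falling, which is the kind of local pattern a learned filter can pick up without any conversion of the series to an image.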
1D-CNNs allow automatic feature extraction, enabling more efficient and accurate time-series data analysis. Direct processing of time-series data eliminates complex data transformations, simplifying modeling. Thus, 1D-CNNs are useful for time-series data analysis and classification. The 1D-CNN architecture is presented in Figure 3, and the CNN forecasting process is given in Pseudocode 4.

Fig. 3. 1D-CNN architecture

PSEUDOCODE 4. CNN forecasting process
Input:
  - Energy Dataset
  - Setting parameters according to the results of PSO hyperparameter tuning
Output:
  - Trained CNN model
Procedure Train_CNN(training_data, validation_data, num_conv_layers, num_filters, filter_size, num_fc_layers, num_neurons, learning_rate, num_epochs):
  Initialize CNN model
  // Add convolutional layers
  for i = 1 to num_conv_layers do:
    Add convolutional layer with num_filters[i] filters and filter_size[i] filter size
  // Flatten the output from convolutional layers
  Flatten()
  // Add fully connected layers
  for i = 1 to num_fc_layers do:
    Add fully connected layer with num_neurons[i] neurons
  // Compile the model
  Compile model with appropriate loss function and optimizer
  // Train the model
  Train model on training_data with validation_data, using learning_rate and num_epochs
  // Return the trained model
  Return trained CNN model
End Procedure

• Recurrent Neural Network (RNN)
The RNN, developed by Paul Werbos and Ronald J. Williams in the 1980s and 1990s, is among the most commonly used models in deep learning [32]. RNNs are a class of deep learning models designed to process sequential data. The main characteristic of RNNs is the presence of recurrent connections in the network, which allow them to maintain a hidden state that captures information about previous time steps [33]. This hidden state makes RNNs particularly suitable for modeling temporal dependencies in time series data.
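The role of the hidden state can be illustrated with a scalar recurrence of the kind commonly used to introduce RNNs. The sketch below is a simplification for illustration only: a single hidden unit, a tanh activation, and hand-picked weights (`w_x`, `w_h`, `b`) are assumptions, and in practice the weights are matrices learned by backpropagation through time.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar recurrence: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Hypothetical weights and inputs, chosen only to show the mechanics.
states = rnn_forward([0.5, 0.1, -0.3], w_x=1.0, w_h=0.5, b=0.0)
```

Because the same update is applied at every step, the function accepts sequences of any length, and each state depends on all earlier inputs through the chain of previous states.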
The RNN architecture includes a series of recurrent cells, each processing input data and updating the hidden state through recurrent connections. This recurrent structure allows the RNN to cope with sequences of varying length. The RNN architecture can be seen in Figure 4. Pseudocode 5 presents the RNN forecasting process.

Fig. 4. RNN architecture

PSEUDOCODE 5. RNN forecasting process
Input:
  - Energy Dataset
  - Setting parameters according to the results of PSO hyperparameter tuning
Output:
  - Trained RNN model
Procedure RNN_Training(training_data, num_hidden_units, learning_rate, num_epochs):
  Initialize weights and biases for input-to-hidden and hidden-to-hidden connections
  Initialize the hidden state
  for epoch = 1 to num_epochs do:
    for each training example in training_data do:
      // Forward pass
      for t = 1 to sequence_length do:
        Update hidden state using the current input and previous hidden state
      // Backward pass
      for t = sequence_length to 1 do:
        Calculate the gradient of the loss with respect to the output
        Update the weights and biases of the hidden-to-hidden connections
        Calculate the gradient of the loss with respect to the hidden state
        Update the weights and biases of the input-to-hidden connections
  // Return the trained RNN model
  Return trained_model
End Procedure

• Long Short-term Memory (LSTM)
The vanishing gradient found in RNNs is a condition in which the gradient approaches 0, so that it can no longer update the weights in the network and the time series data loses its characteristics [34]. The vanishing gradient is caused by using the same weight at each time step. LSTM can overcome the vanishing gradient problem in RNNs. The concept of Long Short-term Memory (LSTM) was first published in 1997 by Hochreiter and Schmidhuber [35]. LSTM analyzes time series data over the long term by applying a collection of short-term memories.
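The vanishing gradient effect described above can be reproduced numerically with a scalar recurrence. The sketch below is a toy illustration under assumptions (one hidden unit, a tanh activation, a recurrent weight of 0.5, and constant inputs); it applies the chain rule to show how the sensitivity of the last hidden state to the first shrinks as the same weight is multiplied in at every time step.

```python
import math


def gradient_through_time(w_h, inputs, h0=0.0):
    """Return d h_T / d h_0 for h_t = tanh(w_h * h_{t-1} + x_t), by the chain rule."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w_h * h + x)
        # Each step contributes a factor w_h * (1 - h_t^2); the derivative
        # of tanh(u) is 1 - tanh(u)^2, which never exceeds 1.
        grad *= w_h * (1.0 - h * h)
    return grad


short_grad = gradient_through_time(0.5, [0.1] * 5)    # 5 time steps
long_grad = gradient_through_time(0.5, [0.1] * 50)    # 50 time steps
```

With a recurrent weight of magnitude below one, every step multiplies the gradient by a factor smaller than that weight, so after 50 steps the gradient is bounded by 0.5 to the power 50 and is effectively zero. The gated memory cell of LSTM is designed to avoid this repeated shrinking.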
LSTM extends the information storage capacity of RNNs by using "memory cells" [36]. Memory cells have connections that store the temporary state of the network and are controlled through 3 "gates", namely the forget gate, input gate, and output gate [37]. Figure 5 shows the memory cell of LSTM, and the LSTM forecasting process is given in Pseudocode 6.

Fig. 5. Memory cell LSTM

PSEUDOCODE 6. LSTM forecasting process
Input:
  - Energy Dataset
  - Setting parameters according to the results of PSO hyperparameter tuning
Output:
  - Trained LSTM model
Procedure LSTM_Model(training_data, testing_data, num_layers, num_hidden_units, num_output_units, learning_rate, num_epochs):
  // Initialize LSTM model
  Initialize_LSTM(num_layers, num_hidden_units, num_output_units)
  Set_Num_Epochs(num_epochs)
  // Train LSTM model
  for epoch = 1 to num_epochs do:
    // Forward pass
    for each training example in training_data do:
      // Reset LSTM hidden state
      Reset_Hidden_State()
      // Iterate through each time step
      for t = 1 to length(training_example) do:
        // Perform LSTM forward pass
        LSTM_Forward_Pass(training_example[t])
    // Backward pass
    for each training example in training_data do:
      // Reset LSTM gradients
      Reset_Gradients()
      // Iterate through each time step in reverse order
      for t = length(training_example) to 1 do:
        // Perform LSTM backward pass
        LSTM_Backward_Pass(training_example[t])
      // Update LSTM weights
      Update_Weights()
  // Test LSTM model
  for each testing example in testing_data do:
    // Reset LSTM hidden state
    Reset_Hidden_State()
    // Iterate through each time step
    for t = 1 to length(testing_example) do:
      // Perform LSTM forward pass
      LSTM_Forward_Pass(testing_example[t])
  // Return trained LSTM model
  Return LSTM_Model
End Procedure

• Bidirectional LSTM (Bi-LSTM)
The Bi-LSTM model is a variant of the LSTM model that incorporates bidirectional processing.
It consists of two LSTM layers, one processing the input sequence in the forward direction and the other processing it in the backward direction. This bidirectional processing allows the model to capture both past and future dependencies in the data, making it particularly effective for time series analysis [38].

In the forward LSTM layer, the input sequence is processed from the beginning to the end, capturing the temporal dependencies and patterns in the data. This layer maintains a hidden state that stores information about the past time steps. The backward LSTM layer, on the other hand, processes the input sequence in reverse order, capturing the dependencies and patterns in the opposite direction. This layer maintains a separate hidden state that stores information about the future time steps. By combining the outputs of both LSTM layers, the Bi-LSTM model can effectively capture the dependencies in both directions [39]. This gives the model a more comprehensive understanding of the temporal dynamics in the data. The outputs of the Bi-LSTM layers are then fed into a fully connected layer, which performs a non-linear transformation on the data and produces the final forecasted values. The Bi-LSTM architecture can be seen in Figure 6. Pseudocode 7 shows the Bi-LSTM forecasting process.

Fig. 6. Bi-LSTM Architecture

PSEUDOCODE 7. Bi-LSTM forecasting process
Input:
  - Energy Dataset
  - Setting parameters according to the results of PSO hyperparameter tuning
Output:
  - Trained Bi-LSTM model
Procedure Train_BiLSTM(training_data, validation_data, num_lstm_layers, num_lstm_units, learning_rate, num_epochs):
  Initialize Bi-LSTM model
  // Add LSTM layers
  for i = 1 to num_lstm_layers do:
    Add forward LSTM layer with num_lstm_units[i] units
    Add backward LSTM layer with num_lstm_units[i] units
  // Compile the model
  Compile model with appropriate loss function and optimizer
  // Train the model
  Train model on training_data with validation_data, using learning_rate and num_epochs
  // Return the trained model
  Return trained Bi-LSTM model
End Procedure

• Gated Recurrent Units (GRU)
The GRU model is an RNN variant used for sequential data processing and forecasting. It captures long-term dependencies in time series well. Training the GRU model on the training set lets it learn the patterns and relationships in the data. Unlike plain RNNs, the GRU model uses gating mechanisms to preserve or discard information from past time steps [40]. The reset and update gates control the flow of information through the network. The reset gate specifies which parts of the previous hidden state to forget, while the update gate determines how much new information to add. By selectively updating and forgetting information, GRU models the temporal dependencies in the data. This adaptive capacity to retain or discard information allows the model to capture short-term and long-term patterns, making it suitable for time series forecasting, speech recognition, and natural language processing. Its gating mechanism and its capacity to identify long-term dependencies make the GRU model a powerful tool for sequential data analysis and forecasting. It is used in many fields where understanding and predicting time series data patterns is crucial because of its flexibility and efficacy [41].
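The interplay of the reset and update gates can be sketched with a single scalar GRU step. This is a simplified illustration rather than the model trained in the study: it uses one hidden unit, hypothetical weights stored in the dictionary `params`, and the convention in which the update gate `z` weights the new candidate state, matching the description above (some libraries use the complementary convention, in which `z` weights the previous state).

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def gru_step(x, h_prev, p):
    """One scalar GRU step; `p` holds hypothetical weights and biases."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])   # reset gate
    # Candidate state: the reset gate scales how much of the past is used.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend old state and candidate according to the update gate.
    return (1.0 - z) * h_prev + z * h_cand


params = {"wz": 0.5, "uz": 0.5, "bz": 0.0,
          "wr": 0.5, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}

h = 0.0
for x in [0.2, 0.4, 0.1]:   # hypothetical normalized load values
    h = gru_step(x, h, params)
```

When the update gate is driven towards zero (for example by a strongly negative bias `bz`), the previous state passes through almost unchanged, which is how the cell can carry information across many time steps.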
Figure 7 shows the structure of the GRU cell, and the GRU forecasting process is given in Pseudocode 8.

Fig. 7. GRU cell structure

PSEUDOCODE 8. GRU forecasting process
Input:
  - Energy Dataset
  - Setting parameters according to the results of PSO hyperparameter tuning
Output:
  - Trained GRU model
Procedure Train_GRU(training_data, validation_data, num_gru_layers, num_hidden_units, learning_rate, num_epochs):
  Initialize GRU model
  // Add GRU layers
  for i = 1 to num_gru_layers do:
    Add GRU layer with num_hidden_units[i] hidden units
  // Compile the model
  Compile model with appropriate loss function and optimizer
  // Train the model
  Train model on training_data with validation_data, using learning_rate and num_epochs
  // Return the trained model
  Return trained GRU model
End Procedure

F. Data Analysis
Performance testing is an essential step in evaluating the effectiveness and efficiency of energy usage forecasting models [42]. It involves assessing the model's ability to accurately predict future energy usage based on historical data. This section discusses the performance testing process and the metrics used to evaluate the forecasting performance of the DL models. This research uses Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and the Coefficient of Determination (R²) as evaluation metrics.

MAPE measures the average absolute difference between predicted and actual energy values as a percentage of the actual values, as in (6). A lower MAPE indicates a more accurate model [43]. RMSE is the square root of the average squared difference between the predicted and actual energy usage values [44]. Because the errors are squared before averaging, RMSE is sensitive to large deviations (outliers) between the forecast and the original values, as in (7). Additionally, R² is often used to assess the goodness of fit of the model, as in (8).
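The three metrics can be computed directly from their definitions in Equations (6) to (8). The sketch below is a minimal pure-Python version for illustration; the function names and toy values are hypothetical, MAPE is returned as a fraction (multiply by 100 for a percentage), and the absolute value in its denominator is an assumption that makes no difference for strictly positive load values.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction of the actual values."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def r_squared(actual, forecast):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [100.0, 200.0, 400.0]     # hypothetical values
forecast = [110.0, 190.0, 400.0]

print(f"MAPE = {100 * mape(actual, forecast):.2f}%")
print(f"RMSE = {rmse(actual, forecast):.4f}")
print(f"R2   = {r_squared(actual, forecast):.4f}")
```

Note that RMSE is expressed in the units of the data, so values computed on Min-Max normalized loads are not directly comparable with values computed on the raw series.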
R² represents the proportion of the variance in the energy usage data that is predictable by the model. A higher R² value indicates a better fit of the model to the data [45].

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|A_i - F_i|}{A_i} \tag{6}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (F_i - A_i)^2} \tag{7}$$

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \tag{8}$$

Here $A_i$ is the actual value, $F_i$ is the predicted value, $n$ is the number of predictions, $SS_{res}$ is the residual sum of squares, and $SS_{tot}$ is the total sum of squares. We also logged the computational time for each method, which serves as an additional performance metric. On this metric, the method with the shortest computational time is regarded as the best.

By conducting performance testing and evaluating the accuracy and efficiency of DL models, we can gain insights into their effectiveness in energy usage forecasting. This information can guide decision-making processes, improve energy management strategies, and contribute to the development of more sustainable and efficient energy systems.

III. Result and Discussion

Figure 8 through Figure 11 illustrate the comparison between DL and S-DL across all methods, with a smoothing factor of α = 0.1 applied to the S-DL models. The parameter settings for all methods are taken from the PSO hyperparameter tuning results presented in Table 2.

Table 2. PSO hyperparameter tuning search space
Parameter        Search Space Result
Batch Size       100
Epoch            50
Hidden Layer     2
Loss Function    MSE
Neuron           32
Optimizer        RMSprop

Figure 8 compares the MAPE of the prediction methods in two scenarios, "Without Smoothing" and "Smoothing with Optimum Alpha," highlighting the impact of smoothing with an optimized alpha value on predictive accuracy. In both scenarios, LSTM outperforms the other methods, with the lowest MAPE (3.9065%) and therefore the highest predictive accuracy.
Conversely, Bi-LSTM records the highest MAPE (7.6464%), indicating lower predictive accuracy regardless of smoothing. Overall, Figure 8 underscores the significance of optimizing smoothing techniques with an alpha value to enhance predictive accuracy in data analysis and forecasting tasks. Although the average increase in MAPE across all methods was a modest 0.1385%, LSTM remains the most accurate method, while Bi-LSTM lags in predictive accuracy in both scenarios. These findings emphasize the importance of judiciously applying smoothing techniques for improved predictive performance.

Figure 9 compares the RMSE values of the prediction methods under the same two scenarios, "Without Smoothing" and "Smoothing with Optimum Alpha." RMSE decreases slightly for all methods when "Smoothing with Optimum Alpha" is applied, indicating improved prediction accuracy through smoothing. LSTM achieves the lowest RMSE in both scenarios (0.0624 and 0.0621), underscoring its accuracy. In contrast, Bi-LSTM exhibits the highest RMSE in both scenarios (0.1252 and 0.1228), indicating lower prediction accuracy regardless of smoothing. In summary, Figure 9 emphasizes the positive impact of smoothing, particularly when optimized with an alpha value, on the predictive accuracy of these methods. The average decrease in RMSE across all methods after smoothing with the optimum alpha is 0.0061, which indicates that smoothing with the optimum alpha handles outliers better. LSTM is the most accurate method, while Bi-LSTM is the least accurate, highlighting the significance of thoughtful smoothing application in data analysis and forecasting tasks.

Fig. 8. MAPE evaluation result
Fig. 9.
RMSE evaluation result

Figure 10 presents a comparative analysis of the R² values of the prediction methods in the two scenarios. R² improves across all methods when "Smoothing with Optimum Alpha" is applied, indicating a better fit to the dataset. LSTM achieves the highest R² in both scenarios (0.9021 and 0.9027), confirming its strong alignment with the data. In contrast, Bi-LSTM records the lowest R² in both scenarios (0.6042 and 0.6195), indicating a relatively weaker fit regardless of smoothing. In essence, Figure 10 underscores the positive impact of optimizing smoothing techniques with an alpha value on the goodness of fit. These findings stress the importance of judiciously applying smoothing techniques to enhance the performance of these methods in data analysis and forecasting.

Fig. 10. R² evaluation result

Figure 11 shows how smoothing with the optimum alpha affects the computational time of each method. In most cases, "Smoothing with Optimum Alpha" leads to shorter computational times than the "Without Smoothing" scenario, which suggests that smoothing can improve the computational efficiency of these methods. CNN shows shorter computational times in both scenarios, highlighting its efficiency. Conversely, Bi-LSTM and GRU require more computation time, particularly without smoothing. These findings emphasize the importance of considering computational efficiency when choosing prediction methods for data analysis and forecasting tasks.

Fig. 11.
Computational time evaluation result

Overall, the use of an optimum alpha value can enhance the forecasting results for energy data across all DL methods. In this study, the LSTM model consistently stands out as the top choice, yielding the lowest MAPE and RMSE values and the highest R² value. In terms of computation time, LSTM falls in the middle of the evaluated methods, neither the fastest nor the slowest. This indicates that LSTM not only provides a high level of prediction accuracy but also offers the best fit to the existing data among the evaluated methods.

For the energy field, these findings imply that selecting a suitable model or method, especially in combination with an optimum alpha value, can improve the accuracy of predictions in energy resource planning and management. In the energy sector, more accurate predictions can help optimize energy usage, reduce waste, and support environmental sustainability. Furthermore, the use of DL and optimized methods such as LSTM in energy forecasting opens up opportunities to develop more intelligent and efficient solutions for energy supply management, especially where energy sustainability and efficiency are becoming increasingly crucial.

IV. Conclusions

In conclusion, this study shows that optimizing smoothing with an optimum alpha (α) value enhances the accuracy of energy usage forecasting using DL models. Among the models tested, LSTM consistently outperforms the others, with the lowest MAPE (3.9065%) and RMSE (0.0621) values and the highest R² (0.9027), making it the top choice for accurate predictions. Notably, forecasting with the optimum alpha value proved more successful than forecasting without smoothing in terms of prediction accuracy across the evaluated metrics.
Computational efficiency is also a critical consideration, with CNN demonstrating shorter computation times (57 s). Limitations of this research include the specific dataset used, which may not be entirely representative of all energy usage scenarios, and the computational resources required for LSTM. Future research should explore the generalizability of these findings across diverse energy datasets and further investigate the computational optimization of LSTM. These findings have crucial implications for energy resource management, as more accurate predictions can aid in optimizing energy usage, reducing waste, and supporting environmental sustainability, emphasizing the relevance of thoughtful model selection and hyperparameter tuning.

Declarations

Author contribution
All authors contributed equally as the main contributors of this paper. All authors read and approved the final paper.

Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest
The authors declare no known financial conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.

Additional information
Reprints and permission information are available at http://journal2.um.ac.id/index.php/keds.
Publisher’s Note: Department of Electrical Engineering and Informatics - Universitas Negeri Malang remains neutral with regard to jurisdictional claims and institutional affiliations.

References

[1] A. Sharif, S. Kocak, H. H. A. Khan, G. Uzuner, and S. Tiwari, “Demystifying the links between green technology innovation, economic growth, and environmental tax in ASEAN-6 countries: The dynamic role of green energy and green investment,” Gondwana Res., vol. 115, pp. 98–106, Mar. 2023, doi: 10.1016/j.gr.2022.11.010.
[2] P. Ma, S. Cui, M. Chen, S. Zhou, and K. Wang, “Review of Family-Level Short-Term Load Forecasting and Its Application in Household Energy Management System,” Energies, vol. 16, no. 15, p. 5809, Aug. 2023, doi: 10.3390/en16155809.
[3] L. Malka, F. Bidaj, A. Kuriqi, A. Jaku, R. Roçi, and A. Gebremedhin, “Energy system analysis with a focus on future energy demand projections: The case of Norway,” Energy, vol. 272, p. 127107, Jun. 2023, doi: 10.1016/j.energy.2023.127107.
[4] S. Kapp, J.-K. Choi, and T. Hong, “Predicting industrial building energy consumption with statistical and machine-learning models informed by physical system parameters,” Renew. Sustain. Energy Rev., vol. 172, p. 113045, Feb. 2023, doi: 10.1016/j.rser.2022.113045.
[5] Y. Zou, R. V. Donner, N. Marwan, J. F. Donges, and J. Kurths, “Complex network approaches to nonlinear time series analysis,” Phys. Rep., vol. 787, pp. 1–97, Jan. 2019, doi: 10.1016/j.physrep.2018.10.005.
[6] A. Pranolo, Y. Mao, A. P. Wibawa, A. B. P. Utama, and F. A. Dwiyanto, “Robust LSTM With Tuned-PSO and Bifold-Attention Mechanism for Analyzing Multivariate Time-Series,” IEEE Access, vol. 10, pp. 78423–78434, 2022, doi: 10.1109/ACCESS.2022.3193643.
[7] A. Pranolo, Y. Mao, A. P. Wibawa, A. B. P. Utama, and F. A. Dwiyanto, “Optimized Three Deep Learning Models Based-PSO Hyperparameters for Beijing PM2.5 Prediction,” Knowl. Eng. Data Sci., vol. 5, no. 1, p. 53, Nov. 2022, doi: 10.17977/um018v5i12022p53-66.
[8] J. Naskath, G. Sivakamasundari, and A. A. S. Begum, “A Study on Different Deep Learning Algorithms Used in Deep Neural Nets: MLP SOM and DBN,” Wirel. Pers. Commun., vol. 128, no. 4, pp. 2913–2936, 2023, doi: 10.1007/s11277-022-10079-4.
[9] I. Koprinska, D. Wu, and Z. Wang, “Convolutional Neural Networks for Energy Time Series Forecasting,” in 2018 International Joint Conference on Neural Networks (IJCNN), Jul. 2018, pp. 1–8, doi: 10.1109/IJCNN.2018.8489399.
[10] H. Hewamalage, C. Bergmeir, and K. Bandara, “Recurrent Neural Networks for Time Series Forecasting: Current status and future directions,” Int. J. Forecast., vol. 37, no. 1, pp. 388–427, Jan. 2021, doi: 10.1016/j.ijforecast.2020.06.008.
[11] G. Bathla, R. Rani, and H. Aggarwal, “Stocks of year 2020: prediction of high variations in stock prices using LSTM,” Multimed. Tools Appl., vol. 82, no. 7, pp. 9727–9743, Mar. 2023, doi: 10.1007/s11042-022-12390-5.
[12] M. Yang and J. Wang, “Adaptability of Financial Time Series Prediction Based on BiLSTM,” Procedia Comput. Sci., vol. 199, pp. 18–25, 2022, doi: 10.1016/j.procs.2022.01.003.
[13] A. N. . F. Faisal, A. Rahman, M. T. M. Habib, A. H. Siddique, M. Hasan, and M. M. Khan, “Neural networks based multivariate time series forecasting of solar radiation using meteorological data of different cities of Bangladesh,” Results Eng., vol. 13, p. 100365, Mar. 2022, doi: 10.1016/j.rineng.2022.100365.
[14] A. R. F. Dewandra, A. P. Wibawa, U. Pujianto, A. B. P. Utama, and A. Nafalski, “Journal Unique Visitors Forecasting Based on Multivariate Attributes Using CNN,” Int. J. Artif. Intell. Res., vol. 6, no. 1, 2022, doi: 10.29099/ijair.v6i1.274.
[15] F. Kurniawan, S. Sulaiman, S. Konate, and M. A. A. Abdalla, “Deep learning approaches for MIMO time-series analysis,” Int. J. Adv. Intell. Informatics, vol. 9, no. 2, p. 286, Jul. 2023, doi: 10.26555/ijain.v9i2.1092.
[16] Y. Mao, A. Pranolo, A. P. Wibawa, A. B. Putra Utama, F. A. Dwiyanto, and S. Saifullah, “Selection of Precise Long Short Term Memory (LSTM) Hyperparameters based on Particle Swarm Optimization,” in 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), May 2022, pp. 1114–1121, doi: 10.1109/ICAAIC53929.2022.9792708.
[17] X. Zhou, A. Pranolo, and Y. Mao, “AB-LSTM: Attention Bidirectional Long Short-Term Memory for Multivariate Time-Series Forecasting,” in 2023 International Conference on Computer, Electronics & Electrical Engineering & their Applications (IC2E3), Jun. 2023, pp. 1–6, doi: 10.1109/IC2E357697.2023.10262559.
[18] M. Elsaraiti, G. Ali, H. Musbah, A. Merabet, and T. Little, “Time Series Analysis of Electricity Consumption Forecasting Using ARIMA Model,” in 2021 IEEE Green Technologies Conference (GreenTech), Apr. 2021, doi: 10.1109/GreenTech48523.2021.00049.
[19] A. B. F. Khan, K. Kamalakannan, and N. S. S. Ahmed, “Integrating Machine Learning and Stochastic Pattern Analysis for the Forecasting of Time-Series Data,” SN Comput. Sci., vol. 4, no. 5, p. 484, Jun. 2023, doi: 10.1007/s42979-023-01981-0.
[20] M. Skariah and C. D. Suriyakala, “Forecasting reservoir inflow combining Exponential smoothing, ARIMA, and LSTM models,” Arab. J. Geosci., vol. 15, no. 14, p. 1292, Jul. 2022, doi: 10.1007/s12517-022-10564-x.
[21] A. P. Wibawa, A. B. P. Utama, H. Elmunsyah, U. Pujianto, F. A. Dwiyanto, and L. Hernandez, “Time-series analysis with smoothed Convolutional Neural Network,” J. Big Data, vol. 9, no. 1, p. 44, Dec. 2022, doi: 10.1186/s40537-022-00599-y.
[22] V. Prema and K. U. Rao, “Development of statistical time series models for solar power prediction,” Renew. Energy, vol. 83, pp. 100–109, Nov. 2015, doi: 10.1016/j.renene.2015.03.038.
[23] S. Huber, H. Wiemer, D. Schneider, and S. Ihlenfeldt, “DMME: Data mining methodology for engineering applications – a holistic extension to the CRISP-DM model,” Procedia CIRP, vol. 79, pp. 403–408, 2019, doi: 10.1016/j.procir.2019.02.106.
[24] A. Tealab, H. Hefny, and A. Badr, “Forecasting of nonlinear time series using ANN,” Futur. Comput. Informatics J., vol. 2, no. 1, pp. 39–47, 2017, doi: 10.1016/j.fcij.2017.05.001.
[25] K. APARNA, “Evolutionary computing based hybrid bisecting clustering algorithm for multidimensional data,” Sādhanā, vol. 44, no. 2, p. 45, Feb. 2019, doi: 10.1007/s12046-018-1011-y.
[26] L. Vanneschi and S. Silva, “Particle Swarm Optimization,” in Natural Computing Series, 2023, pp. 105–111, doi: 10.1007/978-3-031-17922-8_4.
[27] A. B. P. Utama, A. P. Wibawa, Muladi, and A. Nafalski, “PSO based Hyperparameter tuning of CNN Multivariate Time-Series Analysis,” J. Online Inform., vol. 7, no. 2, pp. 193–202, 2022, doi: 10.15575/join.v7i2.858.
[28] M. Abo-Tabik, N. Costen, J. Darby, and Y. Benn, “Towards a Smart Smoking Cessation App: A 1D-CNN Model Predicting Smoking Events,” Sensors, vol. 20, no. 4, p. 1099, Feb. 2020, doi: 10.3390/s20041099.
[29] W. J. Zhang, G. Yang, Y. Lin, C. Ji, and M. M. Gupta, “On Definition of Deep Learning,” in 2018 World Automation Congress (WAC), Jun. 2018, pp. 1–5, doi: 10.23919/WAC.2018.8430387.
[30] D. A. Bashar, “Survey on Evolving Deep Learning Neural Network Architectures,” J. Artif. Intell. Capsul. Networks, vol. 2019, no. 2, pp. 73–82, Dec. 2019, doi: 10.36548/jaicn.2019.2.003.
[31] P. P. Shinde and S. Shah, “A Review of Machine Learning and Deep Learning Applications,” in 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Aug. 2018, pp. 1–6, doi: 10.1109/ICCUBEA.2018.8697857.
[32] H. Apaydin, H. Feizi, M. T. Sattari, M. S. Colak, S. Shamshirband, and K. W. Chau, “Comparative analysis of recurrent neural network architectures for reservoir inflow forecasting,” Water (Switzerland), vol. 12, no. 5, pp. 1–18, doi: 10.3390/w12051500.
[33] A. Zanfei, B. M. Brentan, A. Menapace, M. Righetti, and M. Herrera, “Graph Convolutional Recurrent Neural Networks for Water Demand Forecasting,” Water Resour. Res., vol. 58, no. 7, Jul. 2022, doi: 10.1029/2022WR032299.
[34] Z. Hu, J. Zhang, and Y. Ge, “Handling Vanishing Gradient Problem Using Artificial Derivative,” IEEE Access, vol. 9, pp. 22371–22377, 2021, doi: 10.1109/ACCESS.2021.3054915.
[35] K. Smagulova and A. P. James, “A survey on LSTM memristive neural network architectures and applications,” Eur. Phys. J. Spec. Top., vol. 228, no. 10, pp. 2313–2324, Oct. 2019, doi: 10.1140/epjst/e2019-900046-x.
[36] X. Meng, M. Liu, and Q. Wu, “Prediction of Rice Yield via Stacked LSTM,” Int. J. Agric. Environ. Inf. Syst., vol. 11, no. 1, pp. 86–95, Jan. 2020, doi: 10.4018/IJAEIS.2020010105.
[37] F. Shahid, A. Zameer, and M. Muneeb, “Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM,” Chaos, Solitons & Fractals, vol. 140, p. 110212, Nov. 2020, doi: 10.1016/j.chaos.2020.110212.
[38] H. Wang, Y. Zhang, J. Liang, and L. Liu, “DAFA-BiLSTM: Deep Autoregression Feature Augmented Bidirectional LSTM network for time series prediction,” Neural Networks, vol. 157, pp. 240–256, Jan. 2023, doi: 10.1016/j.neunet.2022.10.009.
[39] Q. Cheng, Y. Chen, Y. Xiao, H. Yin, and W. Liu, “A dual-stage attention-based Bi-LSTM network for multivariate time series prediction,” J. Supercomput., vol. 78, no. 14, pp. 16214–16235, Sep. 2022, doi: 10.1007/s11227-022-04506-3.
[40] C. Hu, S. Martin, and R. Dingreville, “Accelerating phase-field predictions via recurrent neural networks learning the microstructure evolution in latent space,” Comput. Methods Appl. Mech. Eng., vol. 397, p. 115128, Jul. 2022, doi: 10.1016/j.cma.2022.115128.
[41] X. Wang, N. Xie, and L. Yang, “A flexible grey Fourier model based on integral matching for forecasting seasonal PM2.5 time series,” Chaos, Solitons & Fractals, vol. 162, p. 112417, Sep. 2022, doi: 10.1016/j.chaos.2022.112417.
[42] W. Sun and C. Huang, “A novel carbon price prediction model combines the secondary decomposition algorithm and the long short-term memory network,” Energy, vol. 207, p. 118294, Sep. 2020, doi: 10.1016/j.energy.2020.118294.
[43] A. P. Wibawa, Z. N. Izdihar, A. B. P. Utama, L. Hernandez, and Haviluddin, “Min-Max Backpropagation Neural Network to Forecast e-Journal Visitors,” in 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Apr. 2021, pp. 052–058, doi: 10.1109/ICAIIC51459.2021.9415197.
[44] A. P. Wibawa, “Mean-Median Smoothing Backpropagation Neural Network to Forecast Unique Visitors Time Series of Electronic Journal,” J. Appl. Data Sci., vol. 4, no. 3, pp. 163–174, Sep. 2023, doi: 10.47738/jads.v4i3.97.
[45] Y. Yang, C. Yu, and R. Y. Zhong, “Generalized linear model-based data analytic approach for construction equipment management,” Adv. Eng. Informatics, vol. 55, p. 101884, Jan. 2023, doi: 10.1016/j.aei.2023.101884.