Knowledge Engineering and Data Science (KEDS) pISSN 2597-4602 Vol 2, No 2, December 2019, pp. 90–100 eISSN 2597-4637 https://doi.org/10.17977/um018v2i22019p90-100 ©2019 Knowledge Engineering and Data Science | W : http://journal2.um.ac.id/index.php/keds | E : keds.journal@um.ac.id This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)
Comparison of Indonesian Imports Forecasting by Limited Period using SARIMA Method
Harits Ar Rosyid a, 1, Mutyara Whening Aniendya a, 2, Heru Wahyu Herwanto a, 3, *
a Electrical Engineering Department, Universitas Negeri Malang, Jl. Semarang No. 5, Malang 65145, Indonesia
1 harits.ar.ft@um.ac.id; 2 mutyaraaniendya@gmail.com; 3 heru_wh@um.ac.id
* corresponding author
ARTICLE INFO
Article history: Received 9 December 2019; Revised 13 December 2019; Accepted 13 December 2019; Published online 23 December 2019
Keywords: Import dataset; Forecasting model; Limited period; SARIMA; MAPE
ABSTRACT
Indonesia's imports fluctuate over the years. Failing to anticipate such rapid changes can cause an economic slump through inappropriate policy. For instance, rice imports in recent years led to the destruction of domestic rice reserves in order to maintain the market price of rice in Indonesia. To cope with these changes, forecasting the amount of imports can assist the Government in determining the optimum policy. This can be done by applying a time-series forecasting algorithm to predict the amount of imports over the next few months with a high degree of accuracy. This study uses data obtained from the official website of the Indonesian Ministry of Trade. The Seasonal Autoregressive Integrated Moving Average (SARIMA) method is then applied to forecast the imports. This method is suited to series whose observations depend on one another, as well as to forecasting seasonal data patterns. The experiments show that the 6-period forecast is more accurate than the 12- and 24-period forecasts. The best model, ARIMA(0,1,3)(0,1,1)12, produces forecasts with a MAPE of 7.210 %, i.e., an accuracy of 92.790 %. By applying this import forecasting model, the government can make forward strategic plans, such as importing products selectively and carefully deciding the amount of incoming products to Indonesia. Hence, it could maintain or improve economic conditions in which local businesses can grow confidently.
I. Introduction
Indonesia is a country with rapid economic growth. Good economic growth is one of the national benchmarks of the capacity to provide welfare to the people. Indonesia's economic growth, particularly in international trade through exports and imports, is among the largest. Import is the activity of bringing goods from another country into the Indonesian customs area. Imports comprise three types of goods that Indonesian society often needs: consumption goods, raw materials, and capital goods. The import trade balance data of the Ministry of Trade of the Republic of Indonesia from January 2002 to July 2019 show that imports rise and fall irregularly. Indonesian imports in January–July 2019 amounted to 111.88 billion USD, a decrease of 9.89 % compared with January–July 2018, which amounted to 124.167 billion USD. The still very high need for imports can reduce Indonesia's income, because imports require payments from the domestic economy to other countries, whereas exports bring money in through foreign purchases of domestic goods.
If the value of imports is higher than that of exports, it can threaten the Indonesian economy, especially local businesses. For instance, rice imports have recently appeared to be excessive, which led to the decision to destroy tons of local rice stock just to maintain the market price. When the trade balance is left unchecked, inflation could act like a time bomb for the Indonesian economy.
These irregular import developments can be anticipated by forecasting imports for future periods. The resulting forecasts can then be used by the Government as input when deciding on new policies or steps to reduce imports, so that the Indonesian economy improves. The main concern in forecasting is the accuracy of the method used.
Several studies have forecast imports. One of them is Forecasting Iron Ore Import and Consumption of China Using Grey Model Optimized by Particle Swarm Optimization Algorithm [1]. This study concluded that the proposed hybrid model performs better than single methods such as basic GM(1,1), PSO-GM(1,1), or rolling GM(1,1). The PSO-rolling GM(1,1) approach to modeling iron ore imports and consumption in China is both reliable and efficient; its prediction error rates for imports and consumption were 3.2 % and 2.3 %, respectively. Several studies have also applied the ARIMA method, among others Identifying an Appropriate Forecasting Model for Forecasting Total Import of Bangladesh [2], whose best model was ARIMA(0,1,1)(1,0,0)12 with an MSE of 15747374 and a MAPE of 22.97802 %. Another is Forecasting International Tourism Demand in Malaysia Using Box Jenkins SARIMA Application [3], whose best model was ARIMA(1,0,1) with an RMSE of 0.2914, an MAE of 0.2075, and a MAPE of 1.4319 %.
SARIMA is an extension of ARIMA models for data with seasonal patterns. ARIMA is a forecasting model that ignores independent variables entirely and uses only the dependent variable, whose observations are interrelated. The advantages of ARIMA are that it produces highly accurate short-term forecasts, it is flexible enough to represent a wide range of short-term time-series behaviour, and it can handle random, trending, and seasonal data. For data with seasonal patterns, such as Indonesian import data, the appropriate method is the Seasonal Autoregressive Integrated Moving Average (SARIMA). Based on these problems in import trade, this research uses the SARIMA method to forecast Indonesian imports. SARIMA is chosen because it can model time-series data and achieve high accuracy in short-term forecasting. Using SARIMA is therefore expected to produce good forecasts and to support innovation and the establishment of a strategic plan for determining steps to reduce imports.
II. Materials and Methods
The research is divided into five main phases (shown in Figure 1): data collection, preprocessing, model candidate determination, model testing and evaluation, and best model determination.
A. Data Collection
The dataset used in this research is sourced from the official website kemendag.go.id. The website of the Ministry of Trade of the Republic of Indonesia (KEMENDAG) provides various information about trade in Indonesia, such as the development of exports and imports, the trade balance, foreign exchange rates against the rupiah, inflation, and other trading activities. The dataset contains 211 monthly records of Indonesia's imports, from January 2002 to July 2019, with five attributes: Year, Total, Consumption Goods, Raw Material Support, and Capital Goods.
Fig. 1. Research design
B. Preprocessing
1) Attribute Removal
Attribute removal is a simple process of eliminating the attributes that are not used for forecasting. The original data consist of time-series observations of imports in Indonesia. Since forecasting imports only requires the time attribute as the independent variable (x-axis) and the import amount as the target output (y-axis), the remaining attributes were removed from the dataset. The removal was done manually in a spreadsheet application. In addition, the dates in the resulting dataset were converted to dd-mm-yyyy format, and the order of the records was reversed into increasing time order.
2) Stationary Test
A time series is stationary when its statistical properties, such as its mean and variance, do not change over time. A series with a trend is a simple counterexample: like a linear function with a constant slope, its level keeps changing as time progresses. Time series with trends or seasonality are therefore not stationary, whereas a stationary time series has no predictable long-term patterns: wherever it is observed, it shows roughly the same statistical behaviour. A stationary test is performed to determine whether the data are stationary [4]. It can be done in two ways. The first is to inspect the plot of the dataset: if the series fluctuates around a horizontal line with a roughly constant mean (for example, close to zero after differencing), the dataset is stationary. The second is to inspect the Auto-Correlation Function (ACF) and Partial Auto-Correlation Function (PACF) plots of the dataset. The ACF measures the correlation between the series and its own values at a given time lag. The PACF measures the correlation between the series and a given lag after removing the effect of the shorter lags.
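The ACF check can be made concrete with a short Python sketch (the authors worked in R). It assumes the usual mean-centred sample ACF estimator; the helper names `acf` and `looks_nonstationary`, and the rule "every ACF value up to lag 10 stays above the approximate 95 % bound 1.96/sqrt(n)", are hypothetical illustrations of a slowly decaying ACF, not the paper's procedure.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of x (mean-centred)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def looks_nonstationary(x, max_lag=10):
    """Hypothetical heuristic: the ACF stays above 1.96/sqrt(n) at every lag
    up to max_lag, i.e. it decays very slowly instead of cutting off or
    dying down quickly."""
    bound = 1.96 / math.sqrt(len(x))
    return all(r > bound for r in acf(x, max_lag))


trend = [float(t) for t in range(100)]            # trending, like raw imports
alternating = [(-1.0) ** t for t in range(100)]   # no trend at all
print(looks_nonstationary(trend))        # True
print(looks_nonstationary(alternating))  # False
```

In practice the visual reading of the ACF and PACF plots described in the text remains the criterion; the threshold above only mimics "exceeds the line at the initial lags and decreases very slowly".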
If the ACF and PACF values cut off or die down quickly as the lag increases, the dataset is stationary; an ACF that decays very slowly indicates a non-stationary series.
3) Differencing
Differencing is a technique for making time-series data stationary, which is a requirement of the SARIMA model, so it is applied only to non-stationary series. Because a non-stationary time series is not suitable for forecasting, differencing removes the dependence of the series on time, including structures such as trend and seasonality. It is done by calculating the change between consecutive observations. The differenced series is checked again for stationarity, and the process is repeated if it is still non-stationary. Equation (1) gives the first difference between $Y_t$ and $Y_{t-1}$ [5].
$$Y'_t = Y_t - Y_{t-1} \tag{1}$$
Higher-order differences are calculated in the same way. For example, the second-order difference (d = 2) is the first difference of the first-differenced series, which extends to the second lag of the original series [5]:
$$Y''_t = Y'_t - Y'_{t-1} = Y_t - 2Y_{t-1} + Y_{t-2} \tag{2}$$
The number of differencing steps performed determines the order d, which, together with the Autoregressive (AR) and Moving Average (MA) orders, defines the candidate models.
C. SARIMA
Seasonal Autoregressive Integrated Moving Average (SARIMA) is an extension of the ARIMA model for data with seasonal patterns. Seasonal patterns are patterns that repeat every season, such as weekly, monthly, quarterly, or yearly. ARIMA was developed by George Box and Gwilym Jenkins in 1970 and is commonly referred to as the Box-Jenkins method [6][7]. ARIMA is one of the models used in time-series forecasting, and it is known for its accuracy in short-term forecasting.
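Because a seasonal pattern repeats every S observations, SARIMA combines the ordinary difference of (1) with a seasonal difference at lag S. The Python sketch below illustrates both on toy data; `difference` is a hypothetical helper that plays the role of R's `diff(x, lag = ...)` used later in the paper, and the series values are made up for illustration.

```python
def difference(y, lag=1):
    """Return [y[t] - y[t - lag]] for t = lag .. len(y) - 1.

    lag=1 gives the first difference of equation (1); lag=S gives a
    seasonal difference (S = 12 for monthly data).
    """
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


# Ordinary differencing, equations (1) and (2)
y = [10, 13, 18, 25, 34]
first = difference(y)        # Y'_t  -> [3, 5, 7, 9]
second = difference(first)   # Y''_t -> [2, 2, 2], i.e. Y_t - 2Y_{t-1} + Y_{t-2}
print(first, second)

# A period-12 pattern on top of a linear trend (toy monthly data)
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
monthly = [2 * t + pattern[t % 12] for t in range(36)]
seasonal = difference(monthly, lag=12)   # seasonal pattern removed: every value is 24
both = difference(seasonal)              # (1 - B)(1 - B^12) Y_t: every value is 0
print(set(seasonal), set(both))
```

Each differencing step shortens the series by its lag, which is why the seasonal difference at lag 12 costs a full year of observations.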
ARIMA is a forecasting model that ignores independent variables entirely and uses only the dependent variable, whose observations are interrelated; it relies on assumptions about the data such as autocorrelation, trend, or seasonality [8]. ARIMA uses past values of the series to produce accurate short-term forecasts. In the SARIMA model (p, d, q)(P, D, Q)S, p and P denote the non-seasonal and seasonal AR orders, q and Q denote the non-seasonal and seasonal MA orders, and d and D denote the number of non-seasonal and seasonal differences, respectively [9]. The ARIMA family is divided into four groups: AR, MA, ARMA, and ARIMA. Fitting a SARIMA model to data involves a four-step iterative cycle: (a) identify the SARIMA structure (p, d, q)(P, D, Q)S; (b) estimate the unknown parameters; (c) test the estimated residuals; (d) forecast future values from the known data [10]. SARIMA is intended for data that have a repeating pattern within a fixed period of time. Because of this seasonal pattern, the model is written as ARIMA(p, d, q)(P, D, Q)S and defined by (3) [11]:
$$\Phi_P(B^S)\,\phi_p(B)\,(1-B)^d\,(1-B^S)^D X_t = \theta_q(B)\,\Theta_Q(B^S)\,e_t \tag{3}$$
where $B$ is the backshift operator ($BX_t = X_{t-1}$), $\phi_p(B)$ and $\Phi_P(B^S)$ are the non-seasonal and seasonal AR polynomials, $\theta_q(B)$ and $\Theta_Q(B^S)$ are the non-seasonal and seasonal MA polynomials, and $e_t$ is white noise.
D. Model Candidate Determination
The SARIMA model in this study has three orders, p, d, and q, for the non-seasonal part, three orders, P, D, and Q, for the seasonal part, and the order S for the seasonal period of the data. The SARIMA order candidates are determined by analyzing the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots. The ACF plot measures the correlation between the time series and its lagged values.
The PACF plot measures the correlation between the series and a given lag after removing the linear dependence on the shorter lags. Following the criteria in Table 1, an ACF that cuts off after lag q while the PACF dies down indicates the Moving Average (MA) order (q, Q), and a PACF that cuts off after lag p while the ACF dies down indicates the Autoregressive (AR) order (p, P). The order (d, D) is determined by the number of differencing steps needed to make the data stationary, and the order S is determined by the frequency of the data (weekly, monthly, yearly, and so on). The order values are read from the dies-down and cut-off patterns in the ACF and PACF plots. A dies-down pattern occurs when the correlations decrease slowly towards 0, whereas a cut-off pattern occurs when the correlations drop abruptly to near 0 after the initial lags. The criteria for determining the order values from the ACF and PACF plots are given in Table 1.
After each candidate model is fitted, a white noise test is performed with the Ljung-Box statistic to check whether the residuals are uncorrelated across lags. If the resulting p-value is greater than α = 0.05, the model meets the white noise criterion. The Ljung-Box statistic is computed as follows [12]:
$$Q = n(n+2)\sum_{k=1}^{h}\frac{\hat{r}_k^2}{n-k} \tag{4}$$
where n is the number of observations, h is the number of lags tested, and $\hat{r}_k$ is the sample autocorrelation of the residuals at lag k. Under the null hypothesis of uncorrelated residuals, Q approximately follows a chi-square distribution, from which the p-value is obtained.
The candidate models are then compared using Akaike's Information Criterion (AIC) [13]:
$$\mathrm{AIC} = -2\log L + 2V \tag{5}$$
where L is the maximized likelihood of the model and V is the number of estimated parameters. The candidate with the smallest AIC value is selected as the SARIMA model for the forecasting process.
E. Testing Model for Prediction and Evaluation
After the SARIMA model candidates are obtained, the next step is to test each model. The testing process is divided into two stages: forecasting and evaluation. The models built in this experiment forecast imports over different horizons: 6, 12, and 24 periods.
Then, the evaluation calculates the error rate of each forecasting model using the Mean Absolute Percentage Error (MAPE) [14].
Table 1. ACF and PACF plot criteria
Model | ACF trend | PACF trend
AR(p) | Decreases exponentially (dies down) | Drops sharply after lag p (cuts off)
MA(q) | Drops sharply after lag q (cuts off) | Decreases exponentially (dies down)
ARMA(p,q) | Decreases exponentially (dies down) | Decreases exponentially (dies down)
MAPE measures the accuracy of a forecasting model as a percentage. It is the average of the absolute percentage errors between the actual and the forecast values, so a low MAPE indicates that the forecasts are close to the actual values. MAPE is defined in (6) [15]:
$$\mathrm{MAPE} = \frac{100\,\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right| \tag{6}$$
where $A_t$ is the actual value, $F_t$ the forecast value, and n the number of forecast periods.
The test is conducted by trial and error: the import dataset of 211 records is divided into a training dataset and a testing dataset (shown in Table 2).
F. Best Model Determination
The MAPE scores of all candidate models serve as the selection criterion. The best model is the one with the lowest MAPE (error rate), i.e., the highest accuracy.
III. Results and Discussions
A. Preprocessing
1) Attribute Removal
At this stage, only two of the five attributes are used: the year and the total. The Consumption Goods, Raw Material Support, and Capital Goods attributes are excluded from the forecasting process. The removal itself is trivial, but the choice of attributes was based on the forecasting target: the total amount of monthly imports. The year attribute was reordered so that it is consistent with the time-series x-axis, and it was reformatted from mm-yyyy to dd-mm-yyyy. A sample of the final import dataset is shown in Table 3.
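The reordering step and the hold-out scenarios of Table 2 can be sketched in Python as follows. This is a minimal illustration, assuming the 211 records are consecutive months starting in January 2002 and that each test set is simply the last 6, 12, or 24 months; `month_range` and `holdout_split` are hypothetical helpers, and the import values are placeholders rather than the real data.

```python
from datetime import date


def month_range(start_year, start_month, count):
    """Consecutive first-of-month dates, matching the dd-mm-yyyy Date column."""
    out = []
    y, m = start_year, start_month
    for _ in range(count):
        out.append(date(y, m, 1))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return out


def holdout_split(series, horizon):
    """Keep time order; the last `horizon` records form the test set."""
    return series[:-horizon], series[-horizon:]


# 211 monthly records, January 2002 to July 2019 (values are placeholders).
dates = month_range(2002, 1, 211)
records_desc = list(zip(dates, range(211)))[::-1]   # newest-first, as downloaded
records_asc = sorted(records_desc)                  # reversed to increasing time order

for horizon in (6, 12, 24):
    train, test = holdout_split(records_asc, horizon)
    print(horizon, len(train), len(test), train[-1][0], test[0][0], test[-1][0])
```

Running the loop reproduces the boundaries of Table 2: 205/6, 199/12, and 187/24 records, with every test set ending in July 2019. Slicing rather than random sampling is deliberate, since shuffling would leak future months into the training set.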
2) Stationary Test
A stationary test can be done in two ways: by inspecting the plot of the original data or by inspecting the ACF plot. Figure 2 indicates that the data are not stationary, because the series does not fluctuate around a constant level. The ACF values exceed the significance bounds at the initial lags and decrease very slowly. Both checks confirm that the data are not stationary.
Table 3. Import dataset after attribute removal process
Date | Value
1/1/2002 | 2,087.90
1/2/2002 | 2,182.30
1/3/2002 | 2,362.71
1/4/2002 | 2,382.90
1/5/2002 | 2,498.09
1/6/2002 | 2,438.90
1/7/2002 | 2,646.30
1/8/2002 | 2,823.70
1/9/2002 | 2,860.20
1/10/2002 | 3,104.80
1/11/2002 | 2,955.90
1/12/2002 | 2,945.20
Table 2. Data distribution scenario
Period | Training data | Testing data
6 | January 2002 to January 2019 (205 data) | February 2019 to July 2019 (6 data)
12 | January 2002 to July 2018 (199 data) | August 2018 to July 2019 (12 data)
24 | January 2002 to July 2017 (187 data) | August 2017 to July 2019 (24 data)
3) Differencing
The next step is to difference the data using the diff() function in R. Differencing is applied once, and Figure 3 shows the resulting ACF plot. Figure 3 shows that the import dataset is now stationary. The ACF plot shows significant spikes (the boxed values), but a seasonal pattern still repeats. On the ACF plot, the seasonal spikes occur roughly every 12 lags, so the seasonal period used is S = 12. A seasonal difference at lag 12 is then applied to determine the candidate orders of the seasonal part. The data plot and the ACF/PACF plots in Figure 4 show that the data are now stationary: the series fluctuates around 0.
This shows that the data are stationary, and the ACF values no longer exceed the significance bounds.
B. Model Candidate Determination
The next stage is determining the candidate orders p, q, P, and Q by inspecting the ACF and PACF plots. As Figure 4 shows, the differenced series fluctuates around 0, so the data are stationary. The candidate orders of the non-seasonal part are read from the initial lags (lags 1, 2, 3, and so on), while those of the seasonal part are read from lags 12, 24, and 36. Both the non-seasonal and the seasonal part required only one differencing step, so d = 1 and D = 1. For the non-seasonal part, the ACF/PACF plots cut off at lags 1, 2, and 10, while the PACF dies down. For the seasonal part, no ACF value exceeds the significance bounds, and the PACF exceeds the bound at lag 12. From these results, the candidate values of the non-seasonal orders p and q are 1, 2, and 3, and the candidate value of the seasonal orders P and Q is 1. The orders d and D are both 1 because differencing was done once.
Fig. 2. Graph and ACF plot on import dataset
Fig. 3. ACF plot after differencing on import dataset
Combining the candidate orders p, d, q, P, D, Q, and S yields twelve candidate forecasting models: (1,1,0)(1,1,0)12, (2,1,0)(1,1,0)12, (3,1,0)(1,1,0)12, (1,1,0)(0,1,1)12, (2,1,0)(0,1,1)12, (3,1,0)(0,1,1)12, (0,1,1)(1,1,0)12, (0,1,2)(1,1,0)12, (0,1,3)(1,1,0)12, (0,1,1)(0,1,1)12, (0,1,2)(0,1,1)12, and (0,1,3)(0,1,1)12.
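Each of these twelve candidates is screened with the Ljung-Box test of (4) and ranked by the AIC of (5). The stdlib-only Python sketch below shows how both statistics can be computed; the paper itself used R's `Box.test(..., type = "Ljung-Box")`, and this sketch follows that function's convention of h - fitdf degrees of freedom. The helper names are hypothetical, and the chi-square tail is computed from the standard series for the regularized incomplete gamma function rather than a statistics library.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), using
    P(a, z) = sum_n z^(a+n) e^(-z) / Gamma(a+n+1) with a = df/2, z = x/2."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    log_term = a * math.log(z) - z - math.lgamma(a + 1)   # n = 0 term, in logs
    lower, n = 0.0, 0
    while True:
        term = math.exp(log_term)
        lower += term
        n += 1
        if n > z and term < 1e-17:   # terms are shrinking and negligible
            break
        log_term += math.log(z) - math.log(a + n)
    return max(0.0, 1.0 - lower)


def ljung_box(residuals, h, fitdf=0):
    """Ljung-Box Q of equation (4) and its p-value with h - fitdf df."""
    if h - fitdf < 1:
        raise ValueError("need h > fitdf")
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, h - fitdf)


def aic(log_likelihood, n_params):
    """Equation (5): AIC = -2 log L + 2V."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Strongly autocorrelated "residuals" fail the white noise test (p < 0.05).
q, p = ljung_box([(-1.0) ** t for t in range(100)], h=10)
print(round(q, 1), p)
```

A candidate passes the white noise check when its p-value exceeds 0.05; among the passing candidates, the one with the smallest AIC is preferred.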
1) White Noise Test
The white noise assumption is satisfied when the p-value of the Ljung-Box test is greater than α = 0.05 [16]. The p-values for the import dataset are shown in Table 4. All candidate models have p-values above α = 0.05, so the white noise assumption is fulfilled. The Ljung-Box p-values were obtained with the Box.test() function in RStudio.
2) Akaike's Information Criterion (AIC)
The best model is the one with the smallest AIC among all candidates [17]. The AIC values of the candidate models are compared in Table 5. From both selection stages, the best model is ARIMA(0,1,3)(0,1,1)12, because it has the smallest AIC value.
Table 4. Ljung-Box p-values of the ARIMA model candidates on the import dataset
ARIMA model | Ljung-Box p-value
(1,1,0)(1,1,0)12 | 0.3167
(2,1,0)(1,1,0)12 | 0.7164
(3,1,0)(1,1,0)12 | 0.5898
(1,1,0)(0,1,1)12 | 0.1658
(2,1,0)(0,1,1)12 | 0.6858
(3,1,0)(0,1,1)12 | 0.6807
(0,1,1)(1,1,0)12 | 0.09715
(0,1,2)(1,1,0)12 | 0.4701
(0,1,3)(1,1,0)12 | 0.9819
(0,1,1)(0,1,1)12 | 0.07206
(0,1,2)(0,1,1)12 | 0.5218
(0,1,3)(0,1,1)12 | 0.9662
Fig. 4. Data graph and ACF/PACF plot on import dataset after differencing
C. Testing Model for Forecasting
Testing was conducted with the training and testing data specified in Table 2. The testing process is divided into two parts: forecasting, and calculating the error rate of the forecasts. Forecasting is conducted to obtain import forecasts for several periods. The error rate of each forecast is then calculated using MAPE as defined in (6), with the help of the MAPE() function of the MLmetrics package in RStudio. The first testing phase is forecasting the imports with each model.
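How a fitted ARIMA(1,1,0)(1,1,0)12 model turns its coefficients into multi-step forecasts can be sketched in Python by expanding (3) and setting future shocks to zero. This is an illustration under stated assumptions: `forecast_sarima_110_110` is a hypothetical helper, the coefficients `phi` and `sphi` are made-up values rather than those in Figure 5, and, unlike R's forecast(), the sketch neither estimates the coefficients nor returns prediction intervals.

```python
def forecast_sarima_110_110(x, phi, sphi, steps, s=12):
    """Point forecasts for (1 - phi*B)(1 - sphi*B^s)(1 - B)(1 - B^s) X_t = e_t.

    Future shocks e_t are replaced by their expected value 0, and each
    forecast is fed back in to produce the next one.
    """
    x = [float(v) for v in x]
    if len(x) < 2 * s + 2:
        raise ValueError("need at least 2*s + 2 observations")
    # w_t = (1 - B)(1 - B^s) X_t; entries before index s + 1 are undefined.
    w = [None] * (s + 1) + [
        x[t] - x[t - 1] - x[t - s] + x[t - s - 1] for t in range(s + 1, len(x))
    ]
    out = []
    for _ in range(steps):
        t = len(x)
        # Expand (1 - phi B)(1 - sphi B^s) w_t = e_t with e_t = 0
        w_hat = phi * w[t - 1] + sphi * w[t - s] - phi * sphi * w[t - s - 1]
        # Undo both differences: X_t = w_t + X_{t-1} + X_{t-s} - X_{t-s-1}
        x_hat = w_hat + x[t - 1] + x[t - s] - x[t - s - 1]
        w.append(w_hat)
        x.append(x_hat)
        out.append(x_hat)
    return out


# Toy history: linear trend plus a period-12 pattern (not the import data).
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
history = [2 * t + pattern[t % 12] for t in range(36)]
print(forecast_sarima_110_110(history, phi=0.5, sphi=-0.3, steps=6))
# [75.0, 75.0, 80.0, 79.0, 85.0, 91.0]
```

On this toy history the doubly differenced series is identically zero, so the forecasts simply continue the trend and the seasonal pattern; on real data the AR terms adjust the forecasts using the most recent and year-ago changes.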
As an example, testing was performed on the model ARIMA(1,1,0)(1,1,0)12. The function used for fitting is arima(x, order = c(p, d, q), seasonal = c(P, D, Q)), and the resulting coefficients are shown in Figure 5. The model generates two AR coefficient values and one seasonal AR coefficient value, which are then used to forecast the subsequent periods. The forecasts of ARIMA(1,1,0)(1,1,0)12 for the next 6 periods are shown in Figure 6. The forecasts are produced with the forecast() function, whose output contains the forecast values together with the lower and upper limits of the forecast. The error rate of the forecasts against the actual values is then calculated with the MAPE() function, and the result is shown in Figure 7. The same testing and evaluation procedure is applied to the other candidate models.
Table 5. AIC values of the ARIMA model candidates on the import dataset
ARIMA model | AIC
(1,1,0)(1,1,0)12 | 3274.87
(2,1,0)(1,1,0)12 | 3272.56
(3,1,0)(1,1,0)12 | 3268.29
(1,1,0)(0,1,1)12 | 3271.9
(2,1,0)(0,1,1)12 | 3263.48
(3,1,0)(0,1,1)12 | 3262.28
(0,1,1)(1,1,0)12 | 3282.45
(0,1,2)(1,1,0)12 | 3262.51
(0,1,3)(1,1,0)12 | 3260.28
(0,1,1)(0,1,1)12 | 3276.74
(0,1,2)(0,1,1)12 | 3256.99
(0,1,3)(0,1,1)12 | 3255.22
Fig. 5. ARIMA(1,1,0)(2,1,0)12 coefficient value
Fig. 6. ARIMA(1,1,0)(1,1,0)12 forecasting results
D. Evaluation of Forecasting Results
After each model is tested and the forecasts for the test period are obtained, the next step is to calculate the forecast error rate using MAPE.
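Equation (6) and the accuracy figure reported throughout the results (100 % minus MAPE) can be sketched in Python as follows. The helper names are hypothetical and the numbers are toy values; note that some library implementations return MAPE as a fraction rather than a percentage, so the scaling should be checked before comparing with the tables.

```python
def mape(actual, forecast):
    """Equation (6): mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def accuracy(actual, forecast):
    """Accuracy as reported in this paper: 100 % minus MAPE."""
    return 100.0 - mape(actual, forecast)


actual = [200.0, 250.0, 400.0]
forecast = [210.0, 225.0, 400.0]
print(round(mape(actual, forecast), 3))      # 5.0
print(round(accuracy(actual, forecast), 3))  # 95.0
```

Because each error is divided by the actual value, MAPE weights a given absolute miss more heavily in months with low imports, which is worth keeping in mind when comparing horizons.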
The final results for each model are shown in Table 6 for the 6-period forecast, Table 7 for the 12-period forecast, and Table 8 for the 24-period forecast. The 6-period forecast on the import dataset yields ARIMA(0,1,3)(0,1,1)12 as the best model, with a MAPE of 7.516 % or an accuracy of 92.79 %. The 12-period forecast yields a different best model from the 6-period forecast: two models tie with the same MAPE, ARIMA(1,1,0)(0,1,1)12 and ARIMA(2,1,0)(0,1,1)12, with a MAPE of 16.029 % or an accuracy of 83.971 %. The 24-period forecast yields yet another best model, ARIMA(0,1,3)(1,1,0)12, with a MAPE of 9.526 % or an accuracy of 90.474 %. As Table 9 shows, forecasting more periods ahead increases the MAPE compared with the short-term forecast. This indicates that the SARIMA method forecasts the short term with a high degree of accuracy.
Table 6. MAPE values of 6-period forecasting on the import dataset
ARIMA model | MAPE (%)
(1,1,0)(1,1,0)12 | 10.301
(2,1,0)(1,1,0)12 | 11.688
(3,1,0)(1,1,0)12 | 8.759
(1,1,0)(0,1,1)12 | 11.973
(2,1,0)(0,1,1)12 | 11.973
(3,1,0)(0,1,1)12 | 9.279
(0,1,1)(1,1,0)12 | 12.913
(0,1,2)(1,1,0)12 | 8.847
(0,1,3)(1,1,0)12 | 8.807
(0,1,1)(0,1,1)12 | 14.038
(0,1,2)(0,1,1)12 | 8.307
(0,1,3)(0,1,1)12 | 7.516
Table 7. MAPE values of 12-period forecasting on the import dataset
ARIMA model | MAPE (%)
(1,1,0)(1,1,0)12 | 23.530
(2,1,0)(1,1,0)12 | 22.658
(3,1,0)(1,1,0)12 | 22.085
(1,1,0)(0,1,1)12 | 16.029
(2,1,0)(0,1,1)12 | 16.029
(3,1,0)(0,1,1)12 | 17.452
(0,1,1)(1,1,0)12 | 24.445
(0,1,2)(1,1,0)12 | 22.360
(0,1,3)(1,1,0)12 | 23.374
(0,1,1)(0,1,1)12 | 19.267
(0,1,2)(0,1,1)12 | 18.445
(0,1,3)(0,1,1)12 | 19.875
Fig. 7. ARIMA(1,1,0)(2,1,0)12 MAPE value
E. Discussions
Testing the 12 candidate models for forecasting Indonesia's imports over 6, 12, and 24 periods produced notable differences in error rates. The 6-period forecast is the best, with the smallest MAPE of 7.210 %, obtained by the ARIMA(0,1,3)(0,1,1)12 model. Interestingly, the 12-period forecast (the ARIMA(1,1,0)(0,1,1)12 and ARIMA(2,1,0)(0,1,1)12 models) has a much larger MAPE than both the shorter and the longer forecasts. This larger MAPE is attributed to the high values at the end of the dataset. These experiments show that SARIMA performs best in short-term forecasting, which is consistent with previous research [18] stating that using more periods of the dataset for training can lead to higher forecasting accuracy. Accordingly, the 6-period (6-month) forecast, which leaves the most data for training, achieves the best accuracy.
IV. Conclusion
This research produced a forecasting model for Indonesia's imports. In the experiments with monthly forecast horizons, the best result was obtained when forecasting the next 6 periods of imports. The best import forecasting model is ARIMA(0,1,3)(0,1,1)12, because it produces the smallest MAPE and AIC values. Its MAPE for forecasting Indonesian imports is 7.210 %, with an accuracy of 92.79 % and an AIC of 3255.22. This research also shows that the SARIMA method is best suited to short-term forecasts of Indonesia's imports. In this regard, the 6-period forecast should make the government more aware in its development planning and better prepared with contingency plans for import policy. The strategic plan to improve local businesses can then be supported by effective yet efficient imports.
In this research, the forecasting models were developed with hold-out validation, in which the test set was the last part of the time series. This may not produce the most generic model. An open challenge is therefore to extend this research with cross-validation, which is expected to yield a more generic forecasting model.
Table 8. MAPE values of 24-period forecasting on the import dataset
ARIMA model | MAPE (%)
(1,1,0)(1,1,0)12 | 10.035
(2,1,0)(1,1,0)12 | 9.982
(3,1,0)(1,1,0)12 | 9.930
(1,1,0)(0,1,1)12 | 15.322
(2,1,0)(0,1,1)12 | 15.322
(3,1,0)(0,1,1)12 | 15.175
(0,1,1)(1,1,0)12 | 10.376
(0,1,2)(1,1,0)12 | 9.663
(0,1,3)(1,1,0)12 | 9.526
(0,1,1)(0,1,1)12 | 13.579
(0,1,2)(0,1,1)12 | 15.383
(0,1,3)(0,1,1)12 | 15.236
Table 9. MAPE result comparison
Period | MAPE
6 periods | 7.210 %
12 periods | 16.029 %
24 periods | 9.526 %
Acknowledgement
We thank everyone who contributed to the completion of this paper in one way or another. First of all, we thank God for the ability to do this work. We are also very grateful to our informants; their identities cannot be published, but we want to recognize and appreciate their support and accountability during our research. We are also grateful to our fellow students, whose struggles and constructive criticism helped us in the search for new ideas. Lastly, we would like to thank PUI Disruptive Learning Innovation, Universitas Negeri Malang, for the intensive support and guidance that allowed this research to run well.
Declarations
Author contribution
All authors contributed equally as the main contributor of this paper. All authors read and approved the final paper.
Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Conflict of interest
The authors declare no conflict of interest.
Additional information
No additional information is available for this paper.
References
[1] W. Ma, X. Zhu, and M. Wang, "Forecasting iron ore import and consumption of China using grey model optimized by particle swarm optimization algorithm," Resour. Policy, vol. 38, no. 4, pp. 613–620, 2013.
[2] T. Khan, "Identifying an appropriate forecasting model for forecasting total import of Bangladesh," Int. J. Trade, Econ. Financ., vol. 2, no. 3, pp. 242–246, 2011.
[3] Y. Ibrahim, Nanthakumar, and Loganathan, "Forecasting international tourism demand in Malaysia using Box Jenkins SARIMA application," South Asian J. Tour. Herit., vol. 3, no. 2, pp. 50–60, 2010.
[4] T. S. Rao and M. M. Gabr, "A test for linearity of stationary time series," Journal of Time Series Analysis, vol. 1, no. 2, pp. 145–158, 1980.
[5] R. J. Hyndman, "Forecasting: Principles & Practice," Sep. 2014, p. 138.
[6] E. B. Dagum, The X-11-ARIMA Seasonal Adjustment Method. Ottawa: Statistics Canada, 1980.
[7] G. Box, "Box and Jenkins: time series analysis, forecasting and control," in A Very British Affair. London: Palgrave Macmillan, 2013, pp. 161–215.
[8] A. Qonita, A. G. Pertiwi, and T. Widiyaningtyas, "Prediction of rupiah against US dollar by using ARIMA," Int. Conf. Electr. Eng. Comput. Sci. Informatics, vol. 4, Sep. 2017, pp. 746–750.
[9] K. K. Sumer, O. Goktas, and A. Hepsag, "The application of seasonal latent variable in forecasting electricity demand as an alternative method," Energy Policy, vol. 37, no. 4, pp. 1317–1322, 2009.
[10] K. Y. Chen and C. H. Wang, "A hybrid SARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan," Expert Syst. Appl., vol. 32, no. 1, pp. 254–264, 2007.
[11] F. M. Tseng and G. H. Tzeng, "A fuzzy seasonal ARIMA model for forecasting," Fuzzy Sets Syst., vol. 126, no. 3, pp. 367–376, 2002.
[12] W. W. S. Wei, Time Series Analysis: Univariate and Multivariate Methods, 2nd ed. New York: Pearson Addison Wesley, 2006.
[13] E. J. Wagenmakers and S. Farrell, "AIC model selection using Akaike weights," Psychon. Bull. Rev., vol. 11, no. 1, pp. 192–196, 2004.
[14] M. V. Shcherbakov, A. Brebels, N. L. Shcherbakova, A. P. Tyukov, T. A. Janovsky, and V. A. E. Kamaev, "A survey of forecast error measures," World Applied Sciences Journal, vol. 24, no. 24, pp. 171–176, 2013.
[15] A. de Myttenaere et al., "Mean absolute percentage error for regression models," Neurocomputing, vol. 192, pp. 38–48, 2016.
[16] R. Serra and A. C. Rodríguez, "The Ljung-Box test as a performance indicator for VIRCs," in International Symposium on Electromagnetic Compatibility (EMC EUROPE), IEEE, 2012, pp. 1–6.
[17] T. W. Arnold, "Uninformative parameters and model selection using Akaike's Information Criterion," The Journal of Wildlife Management, vol. 74, no. 6, pp. 1175–1178, 2010.
[18] T. Widiyaningtyas, Muladi, and A. Qonita, "Use of ARIMA method to predict the number of train passenger in Malang City," in Proc. 2019 Int. Conf. Artif. Intell. Inf. Technol. (ICAIIT 2019), 2019, pp. 359–364.