Hybrid Model of Singular Spectrum Analysis and ARIMA for Seasonal Time Series Data

CAUCHY – Jurnal Matematika Murni dan Aplikasi
Volume 7(2) (2022), Pages 302-315
p-ISSN: 2086-0382; e-ISSN: 2477-3344
Submitted: December 01, 2021 Reviewed: December 10, 2021 Accepted: December 23, 2021
DOI: http://dx.doi.org/10.18860/ca.v7i1.14136

Gumgum Darmawan1,2,*, Dedi Rosadi1, Budi N Ruchjana2
1Gadjah Mada University, Yogyakarta, Indonesia
2Padjadjaran University, Bandung, Indonesia
*Corresponding Author
Email: gumgum.darmawan@gmail.ugm.ac.id*, dedirosadi@gadjahmada.edu, budi.nurani@unpad.ac.id

ABSTRACT

Several researchers have developed hybrid models that combine Singular Spectrum Analysis (SSA) with the Autoregressive Integrated Moving Average (ARIMA) model. In the SSA-ARIMA hybrid model, SSA performs the decomposition and reconstruction, while forecasting is done with the ARIMA model. The purpose of this paper is to analyze seasonal data using the SSA-ARIMA hybrid with auto grouping. Two auto grouping methods are used: the first is the Alexandrov method, and the second is an alternative auto grouping method with a long memory approach. The two hybrid models were tested on two types of seasonal pattern, multiplicative and additive seasonal time series data. Both methods give accurate results: the MAPE for the 12 forecast observations is below 5%. For the additive seasonal pattern, the hybrid SSA-ARIMA with Alexandrov auto grouping is more accurate (MAPE = 0.13%) than the hybrid SSA-ARIMA with the alternative method, whereas for the multiplicative seasonal pattern the hybrid SSA-ARIMA with alternative auto grouping is more accurate (MAPE = 3.63%) than the hybrid SSA-ARIMA with the Alexandrov method.
Keywords: ARIMA; Automatic Grouping; Long Memory Effect; Seasonal Pattern; Singular Spectrum Analysis

INTRODUCTION

Singular Spectrum Analysis (SSA) is a relatively new non-parametric method that has proved its capability on various types of time series. Solving all these problems corresponds to the so-called basic capabilities of SSA. Besides, the method has several extensions. First, the multivariate version of the method permits the simultaneous expansion of several time series; see, for example, [1]. Second, the SSA ideas lead to several forecasting procedures for time series; see [2]. Third, SSA has been utilized for change-point detection in time series. Fourth, the SSA technique has been used as a filtering method in [3]. Fifth, a family of causality tests based on the multivariate SSA technique has been introduced in [4]. Sixth, SSA can be applied to missing value imputation [5]. SSA can be applied in various disciplines, from mathematics and physics to economics and financial mathematics, meteorology and oceanography, to social sciences, for instance in climatology ([6], [7], [8]) and biomedical time series analysis [9].

Many researchers have carried out hybrid modeling of SSA for time series data. A hybrid model is built so that the advantages of two or more models each make a positive contribution to the forecasting results. SSA has been hybridized with other time series models, including ARIMA, neural networks, ARIMAX, PAR, VARIMAX, and others. [10] performed a hybrid of SSA with a neural network. [11] performed the hybrid SSA-Firefly Algorithm-BP Neural Network process. [12] carried out a hybrid SSA model with ARMAX.
[13] combined the SSA model with PAR(p) and applied this model to wind speed data. [14] built the SSA-VARIMAX hybrid model and used it for climate data. The ARIMA model is often used as a comparison for the SSA model; for example, [15] compared SSA, ARIMA, and other time series models for tourism cases in various countries in Europe. The result indicated that no single time series model is good for all tourism data. [16] compared SSA and ARIMA for predicting ambulance demand. The SSA-ARIMA hybrid model studied by [17] was applied to annual runoff data. In [18], the SSA-ARIMA hybrid model was compared with the basic SSA and ARIMA models, and the result showed that the hybrid model was the most accurate. However, many of these papers do not discuss specific data forms (e.g., seasonal patterns), so we consider it necessary to examine this hybrid model for seasonal data.

In this study, SSA and ARIMA are employed together to forecast two types of time series data. Both models are run with the aim of fast and accurate computation. In SSA, there are two methods of automatic grouping (Alexandrov and alternative), and the forecasting performance of the hybrid SSA-ARIMA model is compared between the two. This paper contributes to the analysis of seasonal patterns (additive and multiplicative) by the SSA-ARIMA hybrid: its purpose is to analyze seasonal data using the SSA-ARIMA hybrid with auto grouping for these two types of seasonal patterns.

This paper is organized as follows. The current section is an introduction in which we briefly outline the use of SSA and introduce our study.
In the next section, the methods section, we describe the detailed methodology of SSA and ARIMA and briefly outline forecasting using a linear recurrent formula, identification of the fractional differencing parameter, identification of hidden periodicities based on the periodogram, automatic grouping by the Alexandrov method ([19], [20]), and alternative automatic grouping [21]. This section also includes a proposed algorithm for the automatic hybrid SSA-ARIMA. In the results and discussion section, we demonstrate the abilities of the hybrid SSA-ARIMA on real time series data. In this part, we also investigate three types of time series data: seasonal with no trend, multiplicative seasonal with trend, and additive seasonal with trend. This section also discusses the comparison between the hybrid SSA-ARIMA with the Alexandrov method and the hybrid SSA-ARIMA with the alternative method for real data analysis.

METHODS

Singular Spectrum Analysis

The (non-parametric) SSA method has received a fair amount of attention in the literature. The first phase of SSA is the decomposition, where the time series is broken down into four components: trend, seasonal, cyclical, and noise. This phase consists of the Embedding and Singular Value Decomposition steps. The second phase, the Reconstruction phase, consists of the Grouping and Diagonal Averaging steps. The forecasting process can be done once the four stages have been completed. For completeness, we present the full SSA algorithm in the following sections.

Embedding

The embedding step transforms the one-dimensional time series $X = (x_1, x_2, \ldots, x_T)$ into the multi-dimensional series $X_1, X_2, \ldots, X_K$ with vectors $X_i = (x_i, x_{i+1}, x_{i+2}, \ldots, x_{i+L-1})^{\top} \in \mathbb{R}^L$, where $i = 1, 2, \ldots, K$ and $K = T - L + 1$.
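As an illustration of the embedding step, the following minimal sketch builds the lagged vectors of a short toy series and stacks them as columns. The function name `lagged_vectors`, the toy series, and the choice $L = 3$ are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np


def lagged_vectors(x, L):
    """Stack the K = T - L + 1 lagged vectors of x as columns of an L x K array.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    x = np.asarray(x, dtype=float)
    T = x.size
    if not 2 <= L <= T - 1:
        raise ValueError("window length L must satisfy 2 <= L <= T - 1")
    K = T - L + 1
    # Column i (0-based) holds the L consecutive values starting at x[i].
    return np.column_stack([x[i:i + L] for i in range(K)])


# Toy series x_1, ..., x_7 = 1, ..., 7 (illustrative values only).
series = np.arange(1.0, 8.0)
X = lagged_vectors(series, L=3)
print(X.shape)  # L rows, K = T - L + 1 columns
print(X)
```

Each column is one lagged vector, so entries on the same anti-diagonal of the array are equal.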
The window length $L$ is the parameter of the embedding step, where $2 \le L \le T - 1$ [22]. If we need to emphasize the size (dimension) of the vectors $X_i$, we call them $L$-lagged vectors. The $L$-trajectory matrix (or simply the trajectory matrix) of the series $X$ is defined as

$$X = \begin{bmatrix} x_1 & x_2 & \cdots & x_K \\ x_2 & x_3 & \cdots & x_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_L & x_{L+1} & \cdots & x_T \end{bmatrix} \qquad (1)$$

The lagged vectors $X_i$ are the columns of the trajectory matrix $X$. Both the rows and the columns of $X$ are sub-series of the original series. The $(i, j)$ element of the matrix $X$ is $x_{ij} = x_{i+j-1}$, so $X$ has equal elements on the anti-diagonals $i + j = \text{const}$. Hence the trajectory matrix is a Hankel matrix.

Singular Value Decomposition

The second step, the SVD step, computes the singular value decomposition of the trajectory matrix $X$ and represents it as a sum of rank-one bi-orthogonal elementary matrices. Set $S = XX^{\top}$ and denote by $\lambda_1, \lambda_2, \ldots, \lambda_L$ the eigenvalues of $S$ taken in decreasing order of magnitude ($\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L \ge 0$) and by $U_1, U_2, \ldots, U_L$ the orthonormal system of eigenvectors of $S$ corresponding to these eigenvalues. Let

$$d = \max\{i : \lambda_i > 0\} = \operatorname{rank} X.$$

If we denote $V_i = X^{\top} U_i / \sqrt{\lambda_i}$, then the SVD of the trajectory matrix can be written as

$$X = X_1 + X_2 + \cdots + X_d, \qquad (2)$$

where the elementary matrix $X_i = \sqrt{\lambda_i}\, U_i V_i^{\top}$ is formed from the eigenvalue $\lambda_i$, the eigenvector $U_i$, and the vector $V_i$. The collection $(\sqrt{\lambda_i}, U_i, V_i)$ of these three elements is called the $i$-th eigentriple of the SVD.

Grouping

The purpose of this step is to identify the trend, the oscillatory components with different periods, and the noise. This step can be skipped if one does not want to extract hidden information by regrouping and filtering components precisely.
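Before the components are grouped, it may help to see the SVD step in code. The sketch below follows the description literally: it forms $S = XX^{\top}$, sorts the eigenpairs in decreasing order, and builds the rank-one elementary matrices. The function name `elementary_matrices`, the synthetic series, the window length, and the relative threshold `1e-12` used to decide which eigenvalues count as positive are all assumptions for illustration; practical code would more likely call an SVD routine directly on $X$.

```python
import numpy as np


def elementary_matrices(x, L):
    """Return the trajectory matrix, its positive eigenvalues, and the X_i.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    x = np.asarray(x, dtype=float)
    K = x.size - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])  # L x K trajectory matrix

    # Eigen-decomposition of S = X X^T; eigh returns ascending eigenvalues,
    # so reorder them to be decreasing.
    lam, U = np.linalg.eigh(X @ X.T)
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]

    # d = number of numerically positive eigenvalues (assumed threshold).
    d = int(np.sum(lam > lam[0] * 1e-12))

    parts = []
    for i in range(d):
        V_i = X.T @ U[:, i] / np.sqrt(lam[i])
        parts.append(np.sqrt(lam[i]) * np.outer(U[:, i], V_i))  # rank-one X_i
    return X, lam[:d], parts


# Synthetic series with trend, period-12 seasonality, and noise (made-up data).
rng = np.random.default_rng(0)
t = np.arange(48)
series = 0.5 * t + 10.0 * np.sin(2.0 * np.pi * t / 12.0) + rng.normal(size=t.size)

X, lam, parts = elementary_matrices(series, L=12)
print(len(parts), np.allclose(sum(parts), X))
```

Summing all elementary matrices recovers the trajectory matrix up to rounding error, which is the property the grouping step relies on when it sums subsets of them.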
The grouping procedure partitions the set of indices $\{1, 2, \ldots, L\}$ into $m$ disjoint subsets $I_1, I_2, \ldots, I_m$, so the elementary matrices in equation (2) are regrouped into $m$ groups. Let $I = \{i_1, i_2, \ldots, i_p\}$. Then the resultant matrix $X_I$ corresponding to the group $I$ is defined as $X_I = X_{i_1} + X_{i_2} + \cdots + X_{i_p}$. These matrices are computed for $I = I_1, I_2, \ldots, I_m$ and substituted into equation (2) to obtain the new expansion. In the grouping process, the $L \times K$ matrices are grouped into several sub-groups, namely trend patterns, seasonal or periodic patterns, and noise patterns. In this paper, the patterns are identified by Fourier series analysis and long-memory analysis. Fourier series analysis is used to identify the seasonal pattern, and long-memory analysis is used to identify the differencing parameter of the data. We use the GPH method [23] to identify the differencing parameter of the time series.

Diagonal Averaging

The next step in Basic SSA, called diagonal averaging, transforms each resultant matrix of the grouped decomposition (3) into a new one-dimensional series of length $T$. Let $Y$ denote a matrix of order $L \times K$ with elements $y_{ij}$, $1 \le i \le L$, $1 \le j \le K$, and define $L^* = \min(L, K)$, $K^* = \max(L, K)$, and $T = L + K - 1$. Let $y_{ij}^* = y_{ij}$ if L