Copernican Journal of Finance & Accounting
e-ISSN 2300-3065, p-ISSN 2300-1240
2015, volume 4, issue 2

Date of submission: August 7, 2015; date of acceptance: October 14, 2015.
* Contact information: gediminas.zylius@ktu.edu, Department of Automation, Faculty of Electrical and Electronics Engineering, Kaunas University of Technology, Studentų g. 50 - 154, LT - 51367 Kaunas, Lithuania, phone: +370 642 04 808.
Žylius G. (2015). Evaluation of ATM Cash Demand Process Factors Applied for Forecasting with CI Models. Copernican Journal of Finance & Accounting, 4(2), 211–235. http://dx.doi.org/10.12775/CJFA.2015.025

Gediminas Žylius*
Kaunas University of Technology

Evaluation of ATM Cash Demand Process Factors Applied for Forecasting with CI Models

Keywords: computational intelligence, regression, time series forecasting, cash management, data-based forecasting, daily cash flow.
JEL Classification: C45, C53, G21.

Abstract: The purpose of cash management is to optimize the distribution of cash. Effective cash management brings retail banks savings related to: dormant cash reduction; reduced replenishment costs; decreased cash preparation costs; reduced cash insurance costs. Optimization of cash distribution for retail banking in ATM and branch networks requires estimation of future cash demand/supply. This estimation determines overall cash management efficiency: accurate cash demand estimation reduces the bank's overall costs. To estimate future cash demand, cash flow forecasting must be performed, usually based on historical cash flow data of the cash point (ATM or branch).
Many factors that are uncertain and may change over time influence the cash supply/demand process of a cash point. These factors vary across cash points and are related to location, climate, holidays, celebration days and special events (such as salary days and sales at a nearby supermarket). Some factors affect cash demand periodically.
Periodic factors form several kinds of seasonality in the cash flow process: daily (related to intraday factors throughout the day), weekly (mostly related to weekend effects), monthly (related to payday) and yearly (related to climate seasons, tourist and student arrivals, and periodic celebration days such as New Year). Uncertain (aperiodic) factors are mostly related to celebration days that do not occur periodically (such as Easter) and to structural break factors that cause a long-term or permanent cash flow shift (a new shopping mall near the cash point, a shift of working hours); some may be temporary (reconstruction of a nearby building that restricts cash point reachability).
Those factors form a cash flow process that contains a linear or nonlinear trend, mixtures of various seasonal components (intraday, weekly, monthly, yearly), level shifts and heteroscedastic uncertainty. Historical data-based forecasting models therefore need to approximate the historical cash demand process as accurately as possible, properly evaluating these factors, and to forecast future cash flow based on the estimated empirical relationship.

Introduction

The aim of this research is to study how cash flow process factors affect cash flow forecasting accuracy in an ATM network, using computational intelligence (CI) methods as forecasting models for daily aggregated cash flow. For factor evaluation, 8 typical ATM cash withdrawal flows (each affected by different factors), selected from a real ATM network, are used with a historical period of 33 months.
Previous studies of Automatic Teller Machine (ATM) withdrawal cash flow (Rodrigues, Esteves 2010) show that this process is strongly affected by calendar factors (day-of-the-week, week-of-the-month, month-of-the-year and holidays). Those effects can be used as numerical input values with CI models.
A classical neural network regression approach using calendar effects (working day, weekday, holiday effect, salary day effect) as inputs was used by Kumar and Walia (2006) for cash demand process forecasting. However, cash demand varies in time, and using regression inputs alone, without incorporating recent time series information, is usually not enough. Simutis et al. (2007) applied a more advanced flexible neural network approach that incorporates both regression inputs (calendar effects) and time series inputs (such as the aggregated cash demand of the previous several days).
During the last decade, support vector machines (SVM) were considered for various computational intelligence tasks as an alternative to neural networks because of benefits such as decision stability, less overfitting and better generalization when less training (historical) data is available. However, Simutis, Dilijonas and Bastina (2008) show that the application of support vector machines to cash demand forecasting is not superior to neural networks and is even less accurate when a reasonably long historical data period is available for model training. As an alternative to the neural network approach, an interval type-2 fuzzy neural network (IT2FNN) was applied for cash demand forecasting (Darwish 2013). This type of model has both on-line structure and parameter learning abilities, which let the model adapt automatically to different cash flow processes.
CI models for forecasting are considered to be a more advanced approach. However, time series models are also used for cash flow process forecasting. Comparisons of classical econometric time series models show (Gurgul, Suder 2013) that SARIMA (seasonal autoregressive integrated moving average) models are the most accurate. It is shown (Wagner 2010) that the SARIMA model even outperforms a joint forecasting approach using vector time series models.
A comparison between time series probability density forecasting models (such as linear, autoregressive, structural and Markov-switching time series models) is made by Brentnall, Crowder and Hand (2010b), and the results show that the Markov-switching density forecasting model performs best.
Deeper investigation of cash demand process factors is also useful for forecasting. Random-effects models (Brentnall, Crowder & Hand 2008, 2010a) are used when individual client cash withdrawal patterns are modelled for ATM withdrawal forecasting. Laukaitis (2008) presented research on cash flow forecasting in which intraday cash flow time series are treated as random continuous functions projected onto a low-dimensional subspace, and a functional autoregressive model is applied as a predictor of cash flow and transaction intensity.

Forecasting models

This section presents the computational intelligence forecasting models that are applied for daily aggregated ATM cash flow forecasting.

Support vector regression model. Support vector regression (SVR) is the application of the support vector machine (SVM) model to regression problems. SVM was originally proposed by Cortes and Vapnik (1995) as a robust linear model for classification problems. The idea behind SVM is to map input space data vectors to the output space via a high-dimensional space called the feature space. This mapping is performed using the so-called kernel trick. By doing so, the linear nature of the SVM model can be applied to nonlinear function approximation. The nonlinear mapping requires the selection of a nonlinear kernel function. Gaussian kernel functions are usually used because they have few parameters. In this paper two types of SVR models are applied: 1) ν-support vector regression (ν-SVR) and 2) least squares support vector regression (LS-SVR).
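As a minimal sketch of the Gaussian kernel mentioned here, the function below evaluates the kernel for two input vectors and builds the kernel matrix for a small data set. The width parameter `sigma` and the form exp(-||x - z||² / (2σ²)) are assumptions for illustration; the paper does not state its parameterization, and libraries such as LIBSVM use an equivalent form with a different scaling constant.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 * sigma^2)).

    `sigma` is an assumed width parameter, not a value from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(rows, sigma=1.0):
    """Pairwise kernel values for a list of input vectors."""
    return [[gaussian_kernel(r, s, sigma) for s in rows] for r in rows]


# Hypothetical 1-dimensional inputs, for illustration only.
K = kernel_matrix([[0.0], [1.0], [3.0]], sigma=1.0)
```

The kernel equals 1 for identical vectors and decays towards 0 as the vectors move apart, so a single width parameter controls how local the resulting model is.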
■ ν-SVR. Let the input data vectors be $\mathbf{x}_i \in \mathbb{R}^n$ (the n-dimensional vector of the i-th observation) and the output (cash flow) data be $y_i \in \mathbb{R}^1$. The model optimization problem for the ν-SVR algorithm is formulated by the following equations (Chih-Chung & Chih-Jen 2011):

$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi},\, \boldsymbol{\xi}^*,\, \varepsilon} \quad \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\left(\nu\varepsilon + \frac{1}{l}\sum_{i=1}^{l}\left(\xi_i + \xi_i^*\right)\right)
$$

$$
\text{subject to} \quad
\begin{aligned}
\left(\mathbf{w}^T\phi(\mathbf{x}_i) + b\right) - y_i &\le \varepsilon + \xi_i, \\
y_i - \left(\mathbf{w}^T\phi(\mathbf{x}_i) + b\right) &\le \varepsilon + \xi_i^*, \\
\xi_i,\ \xi_i^* &\ge 0, \quad i = 1, \dots, l, \\
\varepsilon &\ge 0.
\end{aligned}
$$

Here $\phi(\mathbf{x}_i)$ is the mapping of the input space to the high-dimensional feature space (the space where linear regression is performed), defined implicitly by the (Gaussian) kernel function; $\mathbf{w}$ is the parameter vector of the hyperplane in the feature space; $b$ is the hyperplane bias parameter; $\xi_i^*$, $\xi_i$ are the upper and lower training errors (slack variables) with respect to the ε-insensitive tube; $C$ is a cost parameter that controls the trade-off between allowing training errors and forcing rigid margins; $\nu$ is a regularization parameter that controls the number of support vectors; $l$ is the number of data points (observations). Data points that lie on or outside the boundaries of the ε-insensitive tube are called support vectors. A graphical illustration of the ν-SVR model is depicted in Figure 1.

Figure 1. An illustration of ν-SVR approximation principle
Source: created by authors.

In this research the ν-SVR implementation from the LIBSVM library is used (see Chih-Chung & Chih-Jen 2011).

■ LS-SVR. The least squares support vector regression (LS-SVR) model is very similar to ν-SVR. The optimization problem is formulated by the following equations (Suykens et al. 2002):

$$
\min_{\mathbf{w},\, b,\, \mathbf{e}} \quad \frac{1}{2}\mathbf{w}^T\mathbf{w} + \frac{\gamma}{2}\sum_{i=1}^{l} e_i^2
$$

$$
\text{subject to} \quad \mathbf{w}^T\phi(\mathbf{x}_i) + b + e_i - y_i = 0, \quad i = 1, \dots, l.
$$

Here $e_i$ are the error variables and $\gamma$ is the regularization constant.
Unlike ν-SVR, LS-SVR does not use an ε-insensitive tube and is regularized only by the parameter γ, so it is not as sparse as the ν-SVR model (it has more support vectors). However, the least squares loss function brings other flexibility benefits for regression problems. As for the ν-SVR model, Gaussian kernel functions are used for the LS-SVR model. In this research the LS-SVR code from the LS-SVMlab toolbox is used (see Pelckmans et al. 2002).

Relevance vector regression model. Relevance vector regression (RVR) (Tipping 2001) is a model that has the same linear functional form as support vector regression:

$$
y(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{l} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0 .
$$

Here $K(\mathbf{x}, \mathbf{x}_i)$ is the kernel function and $\mathbf{w}$ is the model weight vector. RVR retains even fewer training vectors than ν-SVR (it is sparser); the retained vectors are called relevance vectors. Bayesian inference is used to determine the RVR model parameters and relevance vectors. RVR uses an EM-like (expectation-maximization) learning algorithm and, following the Bayesian methodology, places a priori distributions over the parameters. Gaussian kernels, as in both the LS-SVR and ν-SVR cases, are also used with RVR.
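The kernel models in this section share one prediction form: a weighted sum of kernel evaluations at training points plus a bias. The sketch below illustrates that form and one way to obtain the weights, using the LS-SVR case, whose equality-constrained problem is known to reduce to a single linear system in the bias `b` and the dual weights `alpha`. This is an illustrative plain-Python implementation under assumed settings (`gamma`, `sigma` and the toy data are hypothetical); it is not the LS-SVMlab code used in the paper, and ν-SVR and RVR obtain their weights by different procedures.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel with an assumed width parameter sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A * sol = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (M[i][n] - tail) / M[i][i]
    return sol


def lssvr_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n_obs = len(X)
    A = [[0.0] + [1.0] * n_obs]
    for i in range(n_obs):
        row = [1.0] + [gaussian_kernel(X[i], X[j], sigma) for j in range(n_obs)]
        row[i + 1] += 1.0 / gamma  # ridge term from the least squares loss
        A.append(row)
    sol = solve_linear(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, dual weights alpha


def kernel_predict(x, X, alpha, b, sigma=1.0):
    """Kernel expansion: b + sum_i alpha_i * K(x, x_i)."""
    return b + sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, X))


# Hypothetical toy data, not taken from the ATM data set.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 2.0, 5.0]
b, alpha = lssvr_fit(X, y, gamma=10.0, sigma=1.0)
y_hat = [kernel_predict(xi, X, alpha, b) for xi in X]
```

In this sketch every training point generally receives a nonzero weight, which reflects the lack of sparsity noted for LS-SVR; a sparse model would keep only a subset of the points in the sum.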
In this research the RVR implementation in the SparseBayes package for MATLAB, written by the RVR author himself (see Tipping 2001), is used.

Feed-forward neural network model. The feed-forward neural network (FFNN) is the most popular artificial neural network architecture for nonlinear function approximation (classification and regression) problems. The architecture was inspired by biological neural networks (brains) and models interconnected neurons that compute values from inputs by feeding information through the network. The FFNN model architecture is depicted in Figure 2.

Figure 2. An illustration of feed-forward neural network architecture
Source: created by authors.

In this research, a logarithmic sigmoid transfer function was used in the hidden layers and a linear transfer function in the output layer.
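A forward pass through such a network can be sketched as follows, for a single hidden layer with log-sigmoid units (taken here as 1 / (1 + e^(-a))) and a linear output unit. The layer sizes and all weight values are hypothetical and chosen only for illustration; in the paper the weights result from training, and the network size is not specified in this section.

```python
import math


def logsig(a):
    """Logarithmic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-a))


def ffnn_forward(x, hidden_w, hidden_b, out_w, out_b):
    """One hidden layer with logsig units, followed by a linear output unit."""
    hidden = [
        logsig(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(hidden_w, hidden_b)
    ]
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b


# Hypothetical weights for a 2-input, 2-hidden-unit network.
hidden_w = [[0.5, -1.0], [1.5, 0.25]]
hidden_b = [0.0, -0.5]
out_w = [2.0, -1.0]
out_b = 0.1
y_hat = ffnn_forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
```

The linear output unit leaves the prediction unbounded, which suits a regression target such as daily cash flow, while the bounded hidden units supply the nonlinearity.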
The output of the logarithmic sigmoid transfer function for an input variable $a$ is equal to:

$$
y_{\log} = \frac{1}{1 + e^{-a}} .
$$

In this work, the Levenberg–Marquardt (Hagan & Menhaj 1994) backpropagation (Rumelhart, Hinton & Williams 1986) training algorithm, as implemented in the MATLAB Neural Network Toolbox, was used for feed-forward neural network training.

Generalized regression neural network model. The generalized regression neural network (GRNN), first proposed by Specht (1991), is a special case of the radial basis function (RBF) neural network. GRNN does not require an iterative training procedure (such as the error backpropagation used with other neural network architectures). The GRNN training procedure requires only the specification of the RBF spread parameter. The model uses radial basis functions to cover the input space and approximates the target function as a weighted linear combination of them. The number of RBFs is equal to the number of observations (the number of days in the historical daily cash flow). One RBF is formed for each data point vector, which serves as the centre of that RBF. RBF transfer function values are calculated from the Euclidean distance between the centre and the input vector. In this research the GRNN implemented in the MATLAB Neural Network Toolbox is used.

Adaptive neuro-fuzzy inference system model. The adaptive neuro-fuzzy inference system (ANFIS) (Jang 1993) combines fuzzy inference system and neural network features: neural network training capabilities (backpropagation) with fuzzy input and output formation (Takagi–Sugeno fuzzy inference system). The architecture of an ANFIS model with two membership functions is depicted in Figure 3.
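Returning briefly to the GRNN model described above, its prediction step can be sketched in a few lines, since no iterative training is involved. The sketch assumes Gaussian radial basis functions of the form exp(-d² / (2·spread²)) and normalizes the weights so that the prediction is a weighted average of the stored targets, as in Specht's formulation; the exact spread scaling in the MATLAB toolbox may differ. The data values and the fallback for vanishing weights are illustrative assumptions.

```python
import math


def grnn_predict(x, train_X, train_y, spread=1.0):
    """GRNN prediction: RBF-weighted average of the stored training targets.

    One RBF is centred on every training vector; `spread` is the only
    parameter to specify.
    """
    weights = []
    for centre in train_X:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, centre))
        weights.append(math.exp(-sq_dist / (2.0 * spread ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Assumed fallback when x is far from every centre (numerical underflow).
        return sum(train_y) / len(train_y)
    return sum(w * t for w, t in zip(weights, train_y)) / total


# Hypothetical stored observations: input vector -> daily cash flow.
train_X = [[0.0], [2.0]]
train_y = [10.0, 20.0]
midpoint = grnn_predict([1.0], train_X, train_y, spread=1.0)
near_first = grnn_predict([0.0], train_X, train_y, spread=0.1)
```

A small spread makes the model reproduce the nearest stored observation, while a large spread smooths the prediction towards the overall mean of the targets.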
This type of model architecture has 5 layers: fuzzy layer (1), product layer (2), normalization layer (3), defuzzification layer (4) and summation layer (5). Figure 3. Illustration of ANFIS model architecture with two membership functions 1 layer 2 layer 3 layer 4 layer 5 layer 1w 1w 11fw X  F Y 2w 2w 22 fw A1 A2 B1 B2 For a 1st order of Sugeno fuzzy model, a typical IF-THEN rule set can be expressed as: In this work, Levenberg – Marquardt (Hagan & Menhaj 1994) backpropa- gation (Rumelhart, Hinton & Williams 1986) training algorithm was used for feed-forward neural network model training, which is implemented in MATLAB Neural Networks Toolbox. Generalized regression neural network model. Generalized Regression Neu- ral Network (GRNN) first proposed by Specht (1991) is special case of radial basis function (RBF) neural network. GRNN does not require an iterative train- ing procedure (error back propagation as with other neural network architec- tures). GRNN training procedure requires specification of RBF spread param- eter. It uses those functions to cover input space and approximates function as weighted linear combination of radial basis functions. Number of RBF func- tion is equal to number of observations (number of days in historical daily cash f low). Each RBF is formed for each data point vector that is a center of RBF. RBF transfer function values are calculated according to Euclidean distance from the central point to input vector. In this research GRNN implemented in MAT LAB Neural Network Toolbox is used. Evaluation of atm cash dEmand procEss factors… 217 Adaptive neuro-fuzzy inference system model. Adaptive neuro-fuzzy infer- ence system (ANFIS) (Jang 1993) combines fuzzy inference system and neural network features: neural network training capabilities (backpropagation) with fuzzy input and output formation (Takagi–Sugeno fuzzy inference system). An ar- chitecture of ANFIS model that has two membership functions is depicted in Fig- ure 3. 
This type of model architecture has 5 layers: fuzzy layer (1), product layer (2), normalization layer (3), defuzzification layer (4) and summation layer (5). Figure 3. Illustration of ANFIS model architecture with two membership functions input variable is equal to: . 1 1 log         ae y In this work, Levenberg – Marquardt (Hagan & Menhaj 1994) backpropagation (Rumelhart, Hinton & Williams 1986) training algorithm was used for feed-forward neural network model training, which is implemented in MATLAB Neural Networks Toolbox. Generalized regression neural network model. Generalized Regression Neural Network (GRNN) first proposed by Specht (1991) is special case of radial basis function (RBF) neural network. GRNN does not require an iterative training procedure (error back propagation as with other neural network architectures). GRNN training procedure requires specification of RBF spread parameter. It uses those functions to cover input space and approximates function as weighted linear combination of radial basis functions. Number of RBF function is equal to number of observations (number of days in historical daily cash flow). Each RBF is formed for each data point vector that is a center of RBF. RBF transfer function values are calculated according to Euclidean distance from the central point to input vector. In this research GRNN implemented in MATLAB Neural Network Toolbox is used. Adaptive neuro-fuzzy inference system model. Adaptive neuro-fuzzy inference system (ANFIS) (Jang 1993) combines fuzzy inference system and neural network features: neural network training capabilities (backpropagation) with fuzzy input and output formation (Takagi–Sugeno fuzzy inference system). An architecture of ANFIS model that has two membership functions is depicted in Figure 3. This type of model architecture has 5 layers: fuzzy layer (1), product layer (2), normalization layer (3), defuzzification layer (4) and summation layer (5). Figure 3. 
Illustration of ANFIS model architecture with two membership functions 1 layer 2 layer 3 layer 4 layer 5 layer 1w 1w 11fw X  F Y 2w 2w 22 fw A1 A2 B1 B2 For a 1st order of Sugeno fuzzy model, a typical IF-THEN rule set can be expressed as: S o u r c e : created by authors. For a 1st order of Sugeno fuzzy model, a typical IF-THEN rule set can be ex- pressed as: 1) IF x is A1 AND y is B1 THEN f1 = p1x + q1y + r1; 2) IF x is A2 AND y is B2 THEN f2 = p2x + q2y + r2. Further each of five layer functionality is shortly explained: ■ 1st layer. Forms output, which determines membership degree in each of membership functions (µA1, µA2, µB1, µB2): For a 1st order of Sugeno fuzzy model, a typical IF-THEN rule set can be expressed as: 1) IF x is A1 AND y is B1 THEN f1 = p1x + q1y + r1; 2) IF x is A2 AND y is B2 THEN f2 = p2x + q2y + r2. Further each of five layer functionality is shortly explained:  1st layer. Forms output, which determines membership degree in each of membership functions (µA1, µA2, µB1, µB2): ��,� = ������, � = �,�, ��,� = ��������, � = �,�.  2nd layer. In this layer each node is fixed and represents weight of particular rule. In this node AND operation is performed, which is product of inputs: ��,� = �� = ������ � ������, � = �,�.  3rd layer. Each node of this layer is also fixed and calculates normalized rule excitation degree: ��,� = ��� = �� �� � �� , � = �,�.  4th layer. This layer is not fixed as other and parameters (pi, qi, ri) are estimated during training process. Output of nodes are calculated as: ��,� = ����� = ������� � ��� � ���.  5th layer. This is an output layer, where output value is calculated as a sum of all inputs: ��,� = ������ � = ∑ ������∑ ��� . In this research, two types of ANFIS model training algorithms are used: classical gradient steepest descend backpropagation and hybrid training algorithm. Hybrid training combines gradient descend backpropagation and least squares methods. 
■ 2nd layer. Each node in this layer is fixed and represents the weight of a particular rule. The node performs the AND operation, which is the product of its inputs:

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2.$$
■ 3rd layer. Each node of this layer is also fixed and calculates the normalized rule excitation degree:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2.$$
■ 4th layer. Unlike layers 2 and 3, this layer is not fixed: its parameters (pi, qi, ri) are estimated during the training process. The node outputs are calculated as:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i\,(p_i x + q_i y + r_i).$$
■ 5th layer. This is the output layer, where the output value is calculated as the sum of all inputs:

$$O_{5,1} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}.$$
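The five layers above can be sketched as a single forward pass for a two-rule, first-order Sugeno model. This is only an illustration: the Gaussian membership function shape, the function names and all parameter values are assumptions (the text does not state the membership function type), and the sketch is not the MATLAB Fuzzy Logic Toolbox implementation used in the research.

```python
import math


def gauss_mf(v, center, sigma):
    """Gaussian membership function (assumed shape, for illustration only)."""
    return math.exp(-0.5 * ((v - center) / sigma) ** 2)


def anfis_forward(x, y, mf_a, mf_b, consequents):
    """Forward pass of a two-rule first-order Sugeno ANFIS.

    mf_a, mf_b  -- (center, sigma) pairs for A1, A2 and for B1, B2
    consequents -- (p_i, q_i, r_i) triples, one per rule
    """
    # Layer 1: membership degrees of x in A_i and of y in B_i.
    mu_a = [gauss_mf(x, c, s) for c, s in mf_a]
    mu_b = [gauss_mf(y, c, s) for c, s in mf_b]
    # Layer 2: rule weights w_i, AND implemented as a product.
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalized rule weights.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted first-order rule outputs w_bar_i * f_i.
    f = [p * x + q * y + r for p, q, r in consequents]
    weighted = [wb * fi for wb, fi in zip(w_bar, f)]
    # Layer 5: sum of all incoming signals.
    return sum(weighted)
```

In this sketch only the membership function parameters (layer 1) and the consequent parameters (layer 4) would be adjusted by training; layers 2, 3 and 5 contain no tunable parameters.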
In this research, two types of ANFIS model training algorithms are used: classical gradient (steepest descent) backpropagation and a hybrid training algorithm. Hybrid training combines gradient descent backpropagation and the least squares method: backpropagation is used to tune the input layer membership function parameters, while the least squares method is used for output function parameter tuning. For initialization of the input layer membership function parameters, the fuzzy c-means (FCM) clustering algorithm is used, which partitions the data of the input space into some number (c) of clusters and uses them to initialize the input membership functions. In this research the ANFIS model implemented in the MATLAB Fuzzy Logic Toolbox is used.

Extreme learning machine model. Extreme learning machines (ELM) (Huang, Zhu & Siew 2006) have the same architecture as single hidden layer feed-forward neural networks. Unlike conventional neural network architectures, ELM does not require backpropagation for parameter tuning. Instead, hidden node weights are chosen randomly and output weights are determined analytically. The main advantage of this type of learning is speed: training is many times faster than conventional iterative tuning (such as backpropagation).

Given N observations (xi, yi), the output of a single hidden layer neural network with M hidden nodes is modeled as:

$$y_i = \sum_{j=1}^{M} \beta_j\, g(w_j \cdot x_i + b_j).$$
Here, $x_i$ is the ith input vector; $w_j$ is the weight vector connecting the jth hidden node and the input nodes; $\beta_j$ is the weight scalar connecting the jth hidden node and the output node; $b_j$ is the bias parameter of the jth hidden node; and $g(\cdot)$ is the hidden node activation function.
In this research linear output nodes and sigmoid hidden nodes are used. The above equation can be written in vector form:

$$y = H\beta.$$
Here, $H$ is the $N \times M$ hidden layer output matrix with elements $H_{ij} = g(w_j \cdot x_i + b_j)$.
The solution obtained by applying ELM theory is simply estimated as:

$$\beta = H^{\dagger} y.$$
Here, $H^{\dagger} = (H^{\top}H)^{-1}H^{\top}$ is the Moore–Penrose generalized inverse (pseudoinverse) matrix. A MATLAB implementation of classical ELM, which is available online (see Huang, Zhu & Siew 2006), is used in this research.
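The ELM training procedure described above (random hidden weights, analytic output weights) can be sketched as follows. This is a minimal illustration, not the MATLAB implementation used in the research: the function names, the uniform range of the random weights and the seed are assumptions, and the output weights are obtained by solving the normal equations $(H^{\top}H)\beta = H^{\top}y$, which presumes that $H^{\top}H$ is invertible.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def hidden_matrix(X, W, b):
    """H[i][j] = g(w_j . x_i + b_j) with a sigmoid g."""
    return [[sigmoid(sum(wk * xk for wk, xk in zip(wj, xi)) + bj)
             for wj, bj in zip(W, b)] for xi in X]


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def elm_train(X, y, n_hidden, seed=0):
    """Draw random hidden parameters, then solve for beta analytically."""
    rng = random.Random(seed)
    n_inputs = len(X[0])
    # Assumed initialization range [-1, 1]; the text only says "randomly".
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
         for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = hidden_matrix(X, W, b)
    # Normal equations (H^T H) beta = H^T y, i.e. beta = (H^T H)^-1 H^T y.
    HtH = [[sum(row[j] * row[k] for row in H) for k in range(n_hidden)]
           for j in range(n_hidden)]
    Hty = [sum(row[j] * yi for row, yi in zip(H, y)) for j in range(n_hidden)]
    beta = solve(HtH, Hty)
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Linear output node: y_i = sum_j beta_j * H[i][j]."""
    H = hidden_matrix(X, W, b)
    return [sum(hj * bj for hj, bj in zip(row, beta)) for row in H]
```

No iterative tuning takes place: the only computation after the random draw is one linear solve, which is where the speed advantage comes from. When $H^{\top}H$ is close to singular, a pseudoinverse computed by singular value decomposition is a numerically safer choice than the normal equations used here.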
Experimental data

In this research, daily withdrawal data of 8 different typical ATMs is used, with a historical period of up to 990 days. Those 8 ATMs were selected from a large database and represent typical cash flow factors that occur in the cash flow process. A short description of each ATM follows:

■ ATM number 1 contains cash flow with a strong weekly seasonality factor;
■ ATM number 2 contains cash flow with strong yearly and weekly seasonality factors, where the yearly seasonality is smooth;
■ ATM number 3 contains cash flow with a strong monthly seasonality factor;
■ ATM number 4 contains cash flow with strong yearly and weekly seasonality factors, where the yearly seasonality is not smooth (temporary structural breaks);
■ ATM number 5 contains cash flow with weak mixed (weekly and monthly) seasonality, increasing trend and heteroscedasticity factors, together with a permanent structural break factor (a sudden break after which ATM cash flow decreases);
■ ATM number 6 contains cash flow with mixed (weekly and monthly) seasonality and permanent structural break factors (a sudden break after which ATM cash flow increases);
■ ATM number 7 contains cash flow with weekly seasonality and temporary structural break factors (a sudden break after which ATM cash flow decreases and, after some period, increases again to the normal cash flow level);
■ ATM number 8 contains cash flow with weekly seasonality, increasing trend and heteroscedasticity factors.

Cash flow plots together with autocorrelation functions are presented in the Appendix for every ATM.

Methodology

In order to evaluate forecasting accuracy a specific metric is needed. In this research the symmetric mean absolute percentage error (SMAPE) is used as the forecasting accuracy metric. It is calculated with the following formula:

$$\mathrm{SMAPE} = \frac{100}{N} \sum_{i=1}^{N} \frac{|\hat{y}_i - y_i|}{\frac{1}{2}\,(\hat{y}_i + y_i)},$$

where $\hat{y}_i$ is the forecasted cash flow value, $y_i$ is the real cash flow value and $N$ is the number of forecasted values.
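The SMAPE formula above translates directly into a few lines of code. The sketch below is illustrative: the function name is hypothetical, and raising an error when a forecast and the real value sum to zero is an assumption, since the formula is undefined in that case and the text does not say how such days are handled.

```python
def smape(forecast, actual):
    """Symmetric mean absolute percentage error, in percent."""
    if len(forecast) != len(actual) or not forecast:
        raise ValueError("forecast and actual must be non-empty and equally long")
    total = 0.0
    for f, a in zip(forecast, actual):
        denom = 0.5 * (f + a)
        if denom == 0:
            # Assumption: the text does not define SMAPE for this case.
            raise ValueError("SMAPE is undefined when forecast + actual == 0")
        total += abs(f - a) / denom
    return 100.0 * total / len(forecast)
```

For example, a forecast of 110 against a real withdrawal of 90 gives |110 - 90| / 100 = 0.2, that is a SMAPE of 20%.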
Forecasting is performed by training a model once with one part of the historical cash flow dataset, while testing (forecasting accuracy evaluation) is done with another part of the historical cash flow dataset. Four cases of training are performed for every ATM (Figure 4):
1) models are trained with a 2 year (or 800 days for some ATMs) historical period;
2) models are trained with a 1.5 year (or 400 days for some ATMs) historical period;
3) models are trained with a 1 year (or 200 days for some ATMs) historical period;
4) models are trained with a 0.5 year (or 100 days for some ATMs) historical period.
For testing, however, the same amount of data is used in every training case.
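The four training cases can be sketched as slices of one daily series that share a common test part. The layout is an assumption made for illustration: here the test part is taken from the end of the series and every training window ends immediately before it, which the text does not state explicitly. The function name and the window lengths in the usage note are likewise hypothetical.

```python
def split_training_cases(series, test_len, train_lens):
    """Return ({train_len: training window}, shared test part).

    Assumed layout: the test part is the last `test_len` days and each
    training window is the `train_len` days directly before it.
    """
    if test_len <= 0 or test_len + max(train_lens) > len(series):
        raise ValueError("series is too short for the requested split")
    test_start = len(series) - test_len
    test = series[test_start:]
    cases = {n: series[test_start - n:test_start] for n in train_lens}
    return cases, test
```

With a 990-day series, `split_training_cases(series, 190, (800, 400, 200, 100))` would give four training windows of decreasing length and one 190-day test part that is identical for all of them, so accuracy differences between cases come only from the training history length.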
Figure 4. Illustration of ATM cash flow data division
Source: created by authors.

In order to train a model for forecasting, input features that represent cash flow factors must be constructed. In this research 9 features are constructed as inputs for every forecasting model (i represents the forecasting day index):

■ 1st input. Month number (numbers 1–12);
■ 2nd input. Day of the month (numbers 1–31);
■ 3rd input. Day of the week (numbers 1–7);
■ 4th input. Withdrawal amount of day i-1 (previous day);
■ 5th input. Withdrawal amount of day i-7 (same weekday of the previous week);
■ 6th input. Withdrawal amount of day i-14 (same weekday two weeks before);
■ 7th input. Withdrawal amount of day i-21 (same weekday three weeks before);
■ 8th input. Withdrawal amount of day i-28 (same weekday four weeks before);
■ 9th input. Sum of withdrawals of the last 5 days (i-5 to i-1).

Those features were selected after experimental studies. In order to evaluate which features (factors) are best for each particular type of ATM cash flow forecasting, forecasting must be performed with all combinations of features, which by the binomial formula gives

$$\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1 = 2^{9} - 1 = 511$$

training cases. However, using knowledge from previous experimental studies, features 5 to 8 are considered as one set. The combination set therefore reduces from 9 to 6, and the number of combinations reduces from 511 to $2^{6} - 1 = 63$.

CI forecasting models have specific external parameters that need to be adjusted properly during the training phase. In order to statistically minimize CI forecasting model overfitting and underfitting during the training phase, a 10-fold cross-validation procedure is used with every forecasting model that needs parameter tuning. The parameters tuned during the training phase are used in the testing phase.
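The feature construction and the reduction from 511 to 63 candidate input sets described above can be sketched as follows. Function and variable names are hypothetical, and numbering the days of the week from Monday = 1 to Sunday = 7 is an assumption (the text only gives the range 1–7).

```python
from datetime import date, timedelta
from itertools import combinations

# Features 5-8 (weekly lags) are treated as one group, as in the text.
FEATURE_GROUPS = [(1,), (2,), (3,), (4,), (5, 6, 7, 8), (9,)]


def build_features(dates, withdrawals, i):
    """Return the 9 inputs for forecasting day i, keyed by input number."""
    if i < 28:
        raise ValueError("at least 28 days of history are needed")
    d = dates[i]
    return {
        1: d.month,                    # month number, 1-12
        2: d.day,                      # day of the month, 1-31
        3: d.isoweekday(),             # day of the week, Monday=1 (assumed)
        4: withdrawals[i - 1],         # previous day
        5: withdrawals[i - 7],         # same weekday, 1 week before
        6: withdrawals[i - 14],        # same weekday, 2 weeks before
        7: withdrawals[i - 21],        # same weekday, 3 weeks before
        8: withdrawals[i - 28],        # same weekday, 4 weeks before
        9: sum(withdrawals[i - 5:i]),  # sum of days i-5 .. i-1
    }


def candidate_input_sets(groups=FEATURE_GROUPS):
    """Enumerate every non-empty combination of feature groups."""
    sets = []
    for k in range(1, len(groups) + 1):
        for combo in combinations(groups, k):
            sets.append(tuple(sorted(n for g in combo for n in g)))
    return sets
```

Calling `candidate_input_sets()` with the six default groups yields the 63 input sets, while passing nine single-feature groups reproduces the full 511 combinations.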
Withdrawal amount of i-28 day (previous 4th week day); ■ 9th input. Sum of withdrawals of last 5 days (i-5 to i-1). Those features were selected after experimental studies. In order to evalu- ate what features (factors) are best for each particular type of ATM cash f low forecasting, one must perform forecasting with all combinations of features which using binomial formula is number of training cases. However, using knowledge from previous experimental studies features from 5 to 8 are con- sidered as one set. So the combination set reduces from 9 to 6 and so number of combinations reduces from 511 to 63. CI forecasting models have specific external parameters that need to be adjusted properly during training phase. In order to statistically minimize CI forecasting model overfitting and underfitting during training phase, a 10-fold cross-validation procedure is used with every forecasting model that needs pa- rameter tuning. Tuned parameters during training phase are used for testing phase. Gediminas Žylius222 Results This section presents forecasting accuracy results. Forecasting accuracy re- sults averaged over all 7 forecasting models are presented in Table 1. The first row in the table represents ATM numbers according to their type, which is described in section 3. Second to fifth rows show average SMAPE (averaged over all model forecasts) for particular training type when inputs that yield best average SMAPE amongst all forecasting models and training types (aver- aged over all seven models and all four training types) are selected. Sixth row show those best average input numbers according to their type as described in section 4. Seventh row show best averaged over all models SMAPE when best input set for particular training type (not for all training types) is selected. Eighth row show those best input numbers for particular training type. Other rows show same kind of results for other training types. 
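To make the input numbering used in Table 1 concrete, the following minimal Python sketch builds the nine inputs for one forecasting day and enumerates the 63 input sets obtained when inputs 5 to 8 are treated as one set. It is an illustration only, not the implementation used in the study: the names `FEATURE_GROUPS`, `build_features` and `feature_subsets` are hypothetical, and the day-of-the-week convention (1 = Monday) is an assumption.

```python
from datetime import date, timedelta
from itertools import combinations

# Six selectable groups; inputs 5-8 (weekly lags) are treated as one set.
# Group names are hypothetical labels chosen for this sketch.
FEATURE_GROUPS = {
    "month": (1,),
    "day_of_month": (2,),
    "day_of_week": (3,),
    "previous_day": (4,),
    "weekly_lags": (5, 6, 7, 8),
    "sum_last_5_days": (9,),
}


def build_features(dates, withdrawals, i):
    """Return the nine inputs (keyed 1..9) for forecasting day index i."""
    if i < 28:
        raise ValueError("at least 28 days of history are needed before day i")
    d = dates[i]
    return {
        1: d.month,                       # month number, 1-12
        2: d.day,                         # day of the month, 1-31
        3: d.isoweekday(),                # 1 = Monday ... 7 = Sunday (assumed)
        4: withdrawals[i - 1],            # previous day
        5: withdrawals[i - 7],            # same weekday, 1 week before
        6: withdrawals[i - 14],           # same weekday, 2 weeks before
        7: withdrawals[i - 21],           # same weekday, 3 weeks before
        8: withdrawals[i - 28],           # same weekday, 4 weeks before
        9: sum(withdrawals[i - 5:i]),     # sum over days i-5 .. i-1
    }


def feature_subsets(groups=FEATURE_GROUPS):
    """Enumerate every non-empty combination of groups as sorted input numbers."""
    names = list(groups)
    subsets = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            subsets.append(sorted(n for name in combo for n in groups[name]))
    return subsets


if __name__ == "__main__":
    # Synthetic toy series, used only to exercise the functions.
    start = date(2013, 1, 1)
    toy_dates = [start + timedelta(days=k) for k in range(40)]
    toy_withdrawals = [float(k) for k in range(40)]
    print(build_features(toy_dates, toy_withdrawals, 30))
    print(len(feature_subsets()))
```

Each list returned by `feature_subsets` corresponds to one candidate input set of the kind reported in the "Best Inputs" rows of Table 1, for example `[3, 5, 6, 7, 8, 9]`.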
Forecasting results averaged over all forecasting models, with the 63 input sets sorted by their average over all forecasting models and training cases, are depicted in the Appendix. An explanation of the forecasting results for each ATM follows:
■ ATM number 1: the worst accuracy is achieved with the shortest training history (0.5 year) for almost all input sets (see figure in Appendix). Training with a long history (2 years) does not improve average forecasting results, and the best results are achieved with 1.5-year training: a trade-off between overfitting because of too short a history and overfitting because of learning patterns that do not reoccur in the future. The best average input set consists of the weekly values of cash flow (1–4 weeks before, see Table 1). However, the day-of-the-week number (3rd input) was not selected in the best input set over all models; this shows that the cash flow is non-stationary and that updating the model with the recent history of the cash flow pattern improves forecasting even for strong weekly seasonality.
■ ATM number 2: as for ATM 1, the worst results are achieved with the shortest training history. Here a long history is more significant than for ATM 1, and the best results are achieved with 2-year training (for almost all input sets, see figure in Appendix). However, the best average input set does not include the month number or the day of the month. The day-of-the-week number and the 1–4 previous week cash flows are related to strong weekly seasonality, and the moving average (9th input) lets the model react more adaptively to recent cash flow trend variations caused by yearly seasonality (see Table 1).
■ ATM number 3: results are not as strongly affected by the length of the history period as in the ATM 2 case, but the worst results are again obtained with 0.5-year training. The month and day-of-the-week numbers were not included in the best input set; this shows that there was no significant weekly or yearly seasonality that could affect forecasting. The other included inputs are related to monthly periodicity and help to forecast the changing monthly pattern.
■ ATM number 4: surprisingly, as for ATM 2 with its yearly seasonality, the month number input was not included in the best input set, even though this type of ATM has a less smooth transition of the yearly pattern. This might be explained by the stronger impact of recent cash flow values (moving average, previous-day cash flow, or 1–4 previous week cash flows) on sudden changes in cash flow compared with the yearly-related regression inputs (month number, day of month): even though yearly seasonality is strong, it has a quasi-periodic structure, and the values of the last year only partially reoccur on the same day of the current year. However, as for ATM number 2, a long history is needed in order to forecast more accurately.
■ ATM number 5: interesting results are obtained for this kind of ATM. As expected, the most accurate results for most of the input sets (see figure in Appendix) are achieved when the shortest history (after the sudden structural break in the cash flow process) is used. This is because the training data after the structural break constitutes a relatively larger part of the training examples than in the longer training periods, so the model adapts to the recent break. Interestingly, this effect is only partial: the results are worse for 1.5-year training, but for 2-year training accuracy increases again. This is explained by the integrated part of the ATM cash flow (increasing trend): at the beginning of the history the model learns the low-level cash flow relationship, which is similar to that after the structural break. As expected for this kind of ATM with a structural break, inputs that encode the recent past are more effective than regression inputs.
■ ATM number 6: even though this ATM has a structural break like ATM number 5, the impact of the break is less obvious. This might be related to the direction of the structural break and the size of the level difference. An explanation of the forecasting results is more difficult. Training with the shortest history is not the best (and, judging by the figure in the Appendix, appears to be the worst): although this history contains no structural break, it is too short and leads to overfitting. The long history contains the structural break, which disturbs model training, so overfitting also takes place. The day-of-month input is included in the best input set because of monthly seasonality, and the 1–4 previous week inputs contribute to forecasting both the weekly and the monthly (because of 4 weeks) pattern. The moving average and previous-day cash flow inputs were not included, which might be because the structural break is too small. However, the 1–4 previous week cash flow inputs contain recent past information while also encoding the weekly and monthly seasonality patterns.
■ ATM number 7: because the temporary structural break was relatively short, 2-year training still performed best. 1-year training performed worst because the full temporary break was included and constituted a larger part of the training data than in the 2-year or 1.5-year training cases, which led to model overfitting. 0.5-year training did not include the temporary structural break, but did not perform best because the training history was too short. Because of weekly seasonality, the best input set included the weekly-related inputs.
■ ATM number 8: the figure in the Appendix shows that 2-year training is the worst case for most input sets. This means that if an ATM has a growing tendency, forgetting the past is beneficial, but too few training values will not train the model to maximum accuracy, so the training history must be chosen with the trade-off between short and long history in mind. The moving average input is selected into the best input set and helps with trend forecasting; the weekly-related inputs correspond to the weekly seasonality factors in the ATM cash flow process.

Table 1.
Forecasting results (in SMAPE, %) over all forecasting models for every ATM

| ATM Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 2 Year/800 Days Training (Best Avg. Inputs) | 40.64 | 59.84 | 66.30 | 55.00 | 83.27 | 39.30 | 50.94 | 43.04 |
| 1.5 Year/400 Days Training (Best Avg. Inputs) | 39.88 | 61.53 | 67.02 | 56.70 | 85.38 | 40.21 | 51.16 | 43.09 |
| 1 Year/200 Days Training (Best Avg. Inputs) | 40.55 | 64.66 | 70.05 | 57.35 | 80.89 | 39.76 | 51.80 | 42.60 |
| 0.5 Year/100 Days Training (Best Avg. Inputs) | 40.95 | 67.07 | 69.96 | 57.27 | 82.48 | 42.61 | 51.24 | 43.26 |
| Best Avg. Inputs | 5 6 7 8 | 3 5 6 7 8 9 | 2 4 5 6 7 8 9 | 3 4 9 | 5 6 7 8 9 | 2 3 5 6 7 8 | 3 5 6 7 8 | 3 5 6 7 8 9 |
| 2 Year/800 Days Training (Best 2 Year/800 Days Inputs) | 40.64 | 59.84 | 66.30 | 55.00 | 82.40 | 39.30 | 50.71 | 43.04 |
| Best 2 Year/800 Days Inputs | 5 6 7 8 | 3 5 6 7 8 9 | 2 4 5 6 7 8 9 | 3 4 9 | 9 | 2 3 5 6 7 8 | 2 3 5 6 7 8 | 3 5 6 7 8 9 |
| 1.5 Year/400 Days Training (Best 1.5 Year/400 Days Inputs) | 39.42 | 61.53 | 66.92 | 55.71 | 83.31 | 40.21 | 51.16 | 43.09 |
| Best 1.5 Year/400 Days Inputs | 3 5 6 7 8 | 3 5 6 7 8 9 | 2 4 5 6 7 8 | 3 9 | 9 | 2 3 5 6 7 8 | 3 5 6 7 8 | 3 5 6 7 8 9 |
| 1 Year/200 Days Training (Best 1 Year/200 Days Inputs) | 40.55 | 64.66 | 67.24 | 56.72 | 80.89 | 39.76 | 51.74 | 42.60 |
| Best 1 Year/200 Days Inputs | 5 6 7 8 | 3 5 6 7 8 9 | 1 2 4 5 6 7 8 | 1 3 4 5 6 7 8 9 | 5 6 7 8 9 | 2 3 5 6 7 8 | 2 3 4 5 6 7 8 | 3 5 6 7 8 9 |
| 0.5 Year/100 Days Training (Best 0.5 Year/100 Days Inputs) | 40.95 | 66.88 | 67.17 | 57.27 | 75.45 | 41.48 | 51.08 | 42.52 |
| Best 0.5 Year/100 Days Inputs | 5 6 7 8 | 3 4 5 6 7 8 9 | 2 3 4 | 3 4 9 | 1 2 3 | 2 4 5 6 7 8 | 2 3 5 6 7 8 | 3 4 5 6 7 8 |

Source: own studies.

Conclusions and future works

After obtaining forecasting results for various types of ATM cash flows, two important conclusions can be made:
■ Choosing the model training history is very important. Using a long history is useful if yearly seasonality factors are relatively important in the ATM cash flow process.
If the cash flow is relatively stationary (as for ATM number 1 and ATM number 3) and has strong monthly or weekly seasonality, using too much history will not increase forecasting accuracy. But if the ATM cash flow process has structural breaks or various trend components, the training history length must be selected carefully in order to avoid overfitting.
■ Proper input selection is even more important than training history period selection. Time-series inputs such as the previous-day cash flow value, the moving average and the 1–4 previous week cash flow values proved to be more efficient input features than regression inputs, because they contain time-varying information that encodes recent history. Calendar-effect inputs are therefore not as important as time-series feature inputs. Purely regression inputs are effective only when stationarity of the cash flow can be assumed. A combination of both may increase forecasting accuracy.

In future works an investigation of multiple-day-ahead forecasting will be performed, which is more practical for cash flow forecasting. However, multiple-day-ahead forecasting is a far more challenging research object with CI models, because of various factors related to error accumulation and conditional probability estimation. Also, automatic statistical methods that would be useful for ATM type identification for input and history period selection will be investigated.

References

Brentnall, A.R., Crowder, M.J., & Hand, D.J. (2008). A statistical model for the temporal pattern of individual automated teller machine withdrawals. Journal of the Royal Statistical Society: Series C (Applied Statistics). 57(1). 43–59. http://dx.doi.org/10.1111/j.1467-9876.2007.00599.x.
Brentnall, A.R., Crowder, M.J., & Hand, D.J. (2010a). Predicting the amount individuals withdraw at cash machines using a random effects multinomial model. Statistical Modeling. 10(2). 197–214. http://dx.doi.org/10.1177/1471082X0801000205.
Brentnall, A.R., Crowder, M.J., & Hand, D.J. (2010b). Predictive-sequential forecasting system development for cash machine stocking. International Journal of Forecasting. 26. 764–776.
Chih-Chung, C., & Chih-Jen, L. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2(27). 1–27. http://www.csie.ntu.edu.tw/~cjlin/libsvm (accessed: 29.03.2015).
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning. 20(3). 273–297. http://dx.doi.org/10.1007/BF00994018.
Darwish, S.M. (2013). A methodology to improve cash demand forecasting for ATM network. International Journal of Computer and Electrical Engineering. 5(4). 405–409. http://dx.doi.org/10.7763/IJCEE.2013.V5.741.
Ekinci, Y., Lu, J.C., & Duman, E. (2015). Optimization of ATM cash replenishment with group-demand forecasts. Expert Systems with Applications. 42(7). 3480–3490. http://dx.doi.org/10.1016/j.eswa.2014.12.011.
Gurgul, H., & Suder, M. (2013). Modeling of withdrawals from selected ATMs of the Euronet network. AGH Managerial Economics. 13. 65–82. http://dx.doi.org/10.7494/manage.2013.13.65.
Hagan, M.T., & Menhaj, M. (1994). Training feed-forward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks. 5(6). 989–993.
Huang, G.B., Zhu, Q.Y., & Siew, C.K. (2006). Extreme learning machine: theory and applications. Neurocomputing. 70. 489–501. http://www.ntu.edu.sg/home/egbhuang/elm_codes.html (accessed: 29.03.2015).
Jang, J.S.R. (1993). ANFIS: adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics. 23(3). 665–685. http://dx.doi.org/10.1109/21.256541.
Kumar, P., & Walia, E. (2006). Cash forecasting: An application of artificial neural networks in finance. International Journal of Computer Science & Applications. 3(1). 61–77.
Laukaitis, A. (2008). Functional data analysis for cash flow and transactions intensity continuous-time prediction using Hilbert-valued autoregressive processes. European Journal of Operational Research. 185(3). 1607–1614. http://dx.doi.org/10.1016/j.ejor.2006.08.030.
Pelckmans, K., Suykens, J.A.K., Van Gestel, T., De Brabanter, J., Lukas, L., Hamers, B., De Moor, B., & Vandewalle, J. (2002). LS-SVMlab: a Matlab/C toolbox for Least Squares Support Vector Machines. Internal Report 02-44, ESAT-SISTA, KU Leuven, Leuven, Belgium, viewed 14 May 2015. http://www.esat.kuleuven.be/sista/lssvmlab/old/lssvmlab_paper0.pdf (accessed: 29.03.2015).
Rodrigues, P., & Esteves, P. (2010). Calendar effects in daily ATM withdrawals. Economics Bulletin. 30(4). 2587–2597.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature. 323. 533–536.
Simutis, R., Dilijonas, D., & Bastina, L. (2008). Cash demand forecasting for ATM using neural networks and support vector regression algorithms. Proceedings of the twentieth EURO mini conference on continuous optimization and knowledge-based technologies (EurOPT-2008). Neringa. Lithuania. 416–421.
Simutis, R., Dilijonas, D., Bastina, L., & Friman, J. (2007). A flexible neural network for ATM cash demand forecasting. Proceedings of the sixth WSEAS international conference on computational intelligence, man-machine systems and cybernetics (CIMMACS 07). 162–165.
Specht, D.F. (1991). A general regression neural network. IEEE Transactions on Neural Networks. 2(6). 568–576.
Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., & Vandewalle, J. (2002). Least Squares Support Vector Machines. World Scientific. Singapore.
Tipping, M.E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research. 1. 211–244. http://www.miketipping.com/sparsebayes.htm (accessed: 29.03.2015).
Wagner, M. (2010). Forecasting daily demand in cash supply chain. American Journal of Economics and Business Administration. 2(4). 377–383. http://dx.doi.org/10.3844/ajebasp.2010.377.383.
Appendix

[Figures for ATM numbers 1–8. For each ATM three panels are shown: "ATM Number N Cash Flow" (x-axis: day sequence number), "Sample Autocorrelation Function" (x-axis: lag) and "ATM Number N Forecasting Results" (x-axis: sequence number of mean-sorted feature set), with one curve per training case: 2 year, 1.5 year, 1 year and 0.5 year training (labelled 800, 400, 200 and 100 days training for ATMs 5 and 6).]
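The sample autocorrelation function shown for each ATM in the Appendix can be computed along the following lines. This is a minimal sketch using the standard biased estimator, not the code used to produce the figures; the function name `sample_acf` and the toy series are assumptions made for illustration.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)      # lag-0 sum of squared deviations
    if c0 == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    # Toy daily series with a 7-day cycle (synthetic, for illustration only).
    weekly = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] * 20
    acf = sample_acf(weekly, 14)
    print([round(a, 3) for a in acf])
```

For a series with weekly seasonality the values at lags 7, 14, 21 and so on stand out from the neighbouring lags, which is the kind of pattern the weekly-lag inputs (5th to 8th) are meant to capture.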