Microsoft Word - cet-01.docx


 CHEMICAL ENGINEERING TRANSACTIONS  
 

VOL. 46, 2015 

A publication of 

 
The Italian Association 

of Chemical Engineering 
Online at www.aidic.it/cet 

Guest Editors: Peiyu Ren, Yancang Li, Huiping Song 
Copyright © 2015, AIDIC Servizi S.r.l., 
ISBN 978-88-95608-37-2; ISSN 2283-9216 

Tourist Arrivals Real-time Prediction Based on IOWA-Gauss 
Method 

Lin Chen, Maozhu Jin*, Yonghuan He 
Business school, Sichuan University, Chengdu, 610000, China. 
jinmaozhu@scu.edu.cn. 

Currently, the forecasts research focuses on tourism in a tourist trends and tourists influencing factors. 
Although the forecast for the inter-annual and seasonal tourists has a wealth of research results, there is less 
study of everyday and real-time tourist arrivals. This paper analyzes the tourists’ real-time arrival law of 
Jiuzhaigou, and then the use of hierarchical clustering and Gaussian fitting mathematical methods for data 
processing, study the changes of day arrivals of tourist by segment analysis, and proposed a new model for 
the prediction of tourists in real-time arrivals. We take Jiuzhaigou Valley as an example to analysis, and 
experimental results show that the forecast method is effective. 

1. Introduction  

The rapid development of the tourism industry has promoted local economic development, at the same time, 
over-exploitation and the too large tourism scale brings enormous pressure to ecological and environmental 
protection. Balance visitors scale, control of temporal and spatial distribution of tourists has become an 
important content of scenic area visitor management in peak travel period. However, tourists distribution are 
affected by season, time, weather, tourist type and other factors, so there is a big uncertainty. There are 
various methods about forecasting tourist arrival, for example, Cho (2003)find that the artificial neural 
networks to forecast tourist arrivals perform well in comparison to the exponential smoothing ;Gil-Alana 
(2005)forecast international monthly arrivals by using seasonal univariate long-memory processes ;Chu 
(2008) uses fractionally integrated ARMA models to forecast tourism arrivals; S Chen (2010) apply ANFIS 
model to forecast the tourist arrivals to Taiwan; H Song (2011) forecast quarterly tourist arrivals by using a 
new model ,the TVP-STSM. in addition, they also include ARIMA (Cho, 2003; Cang, 2011; Wan & Wang, 
2013), GARCH (Bollerslev,1986; Kim & Wong, 2006; Coshall, 2009), SSA (Beneki & Eeckels, 2012), but we 
cannot find any paper adopting a model to forecast tourist arrivals of real-time dynamic in the scenic area. 
This is need to study real-time dynamic prediction of tourist number in the scenic area by Spatiotemporal 
analysis of tourism carrying capacity. So, the purpose of this paper is to fill this gap, this paper combine 
GAUSS algorithm with IOWA and build a scenic area real-time prediction model, segmentationly analyse the 
change of visitors on the scenic area, which helps to improve the prediction method of the visitor number. 

2. Model construction 

This paper collects data through field investigation. Firstly process the data, find a general law of tourist 
arrivals. That is, the total number is gradually increasing, and the increasing speed is gradually slowing down, 
finally the number tends to be stable. Next cluster data by using hierarchical clustering method according to 
size and predict by using the Gaussian fitting algorithm. Then modify prediction model using weight calculation 
based on the improved IOWA operator. Obtain the final prediction results. Finally, analyse the prediction 
results. Figure 1 is the research model of this paper. 
 

DOI: 10.3303/CET1546071

 
Please cite this article as: Chen L., Jin M.Z., He Y.H., 2015, Tourist arrivals real-time prediction based on iowa-gauss method, Chemical 
Engineering Transactions, 46, 421-426  DOI:10.3303/CET1546071  

421


Figure 1：The research model 

3. Empirical study 

3.1 Data preprocessing 
Observing the number of tourists who arrive scenic entrance at each time, we find it is a nonlinear non-
stationary time series. However, if we integrally process the number of people of each moment, i.e accumulate 
the number of people at each time, Let  denote time (minute),  denote the number of tourists who arrive 
scenic entrance at each time,  denote the total number of tourists arrivals, then: 

 
3.2 Data clustering according to size 
The basic principle of Hierarchical clustering is: firstly, classify a certain number of samples (or variables) into 
their own classes, then classify two classes which have the closest properties into a new class, calculate the 
distance between the different classes under the new class, combine two classes which have the closest 
properties, repeat this process until all the samples (or variables) are combined into one class. This paper 
conduct hierarchical clustering based on number scale, hoping that the scale distance of data in a class is as 
close as possible, make the average distance of all items in the combined class is smallest .So we define 

 .That is Within-groups linkage method. 

3.3 Prediction model based on Gaussian fitting algorithm  

The days historical data is divided into a number of scales. This paper fits the curve according to different 
scales, gets a series of prediction models which have same structures, different parameters. In the actual 

fitting process, we do not require  strictly through all the point , but require the fitting error

 in point ix  is the smallest according to certain criteria. Often use the least squares approximation 
searching the best fitting curve method.  

This paper uses the gaussian function as a basic function of the curve. Namely, set  as a gaussian 
function system, where each gaussian function is determined by three parameters: peak height A, peak 
position B and peak width C. The entire gaussian function system is written as: 

 
3.4 Weight calculation forecasting based on improved IOWA operator 
In 2.3, we calculated the number of people prediction model under several scales. However, the size is not 
entirely consistent with the scale of the number of people arrival every day. When given the total arrival 
number on a day, we hope to be able to predict the number of each moment arrival. Therefore, appropriate 
weights can be given to the known sizes to obtain the prediction value of each actual size. 
The concrete steps are given for solving attribute weights according to this thought as follows: 

Step1: Calculate deviation distance  between discrete scale i and target scale j . , 

where  represents the size of scale i . 

Step 2: Calculate the weight.  

t
tN

tS

1

, 1, 2,...
m

t t
i

S N t m
=

= =

,
d ( , ) m in

i i j j

i j i j
Z C Z C

C C Z Z
∈ ∈

= −

( )y f x= ( , )i ix y

( ) iD f x y= −

( )y F x=

2

1
( ) ex p

n
i

i
i i

x B
F x A

C=

  −
 = ∗ −  
   



ijd , 1, 2, ...,ij i jd v v i m= − =

iv

2

2

1

, 1, 2, ...,iji m
ij

i

d
i m

d
ω

=

= =


422


Step 3: Sort order of the weight and the scale . 

Step 4: Calculate the prediction equation , where  represents the prediction equation for 

scale . 

4. Empirical analysis 

In order to carry out performance evaluation on the proposed prediction model, we conduct a number of 
empirical studies on data based on real-life scenarios to predict the number of visitor arrival on each day and 
each time. 

4.1 Data Sources 
Taking jiuzhaigou scenic area as an example, we have collected data in real-time daily visitor arrivals from 
May 2012 to August 2012(Collected from RFID, in minutes).Data collected from 7:00 until 13:00, altogether 
720 minutes. Tourists scale is seasonal and tourists scale is similar when the date is close. In order to take all 
scales into account as much as possible in the forecasting process, it is inappropriate to only let the previous 
data as the training data. So randomly select several days as the training data, assess model parameters, and 
the rest is used to test the accuracy of the model. 

4.2 Scale hierarchical clustering 
Conduct hierarchical clustering on the number of days of training data, stratified results are as the following 
table (20 layers). 

Table 1: Hierarchical clustering results 

layer 1 2 3 4 5 6 7 8 9 10 
date 5.10 

5.14 
5.21 
 

5.11 
5.13 
5.15 
… 

5.12 
5.20 
5.23 
… 

5.18 
5.26 
6.13 
… 

5.19 
5.25 
6.10 
… 

5.28 
5.29 
5.31 
… 

6.15 
6.16 
6.22 
… 

6.17 
7.5 

6.23 
7.11 
8.19 
… 

6.29 
6.30 

scale 11000 11800 12800 14187 13600 10000 16300 15600 21231 15000 
layer 11 12 13 14 15 16 17 18 19 20 
date 7.6 

7.7 
7.8 
… 

7.10 
7.19 
7.23 
… 

7.12 
7.16 
7.17 
…. 

7.13 
7.27 
8.6 

7.14 
7.15 
7.20 
… 

7.18 
7.30 
7.31 
… 

7.29 
8.14 
8.18 

8.3 
8.7 
8.8 
 

8.4 
8.17 

8.5 
8.10 
8.11 

scale 19000 20200 22000 25300 23500 22900 24200 26300 27700 28500 
 
The scale of formation classification is arithmetic mean of the all day data in a class, arrival number at each 
time under the new scale is also arithmetic mean of the data at each time, changing from 1×104 to 2.9×104 
spans in scale.  
The line graph of different sizes is as the following figure. The abscissa represents time (in minutes), the 
ordinate represents the total number of the arrival at each time. From lower to upper curve represents scale is 
increasing in turn. We find there is a following regularity: 
 

Figure 2: Visitor arrival curve 

The graph curve of different sizes exist a high degree of similarity and consistent trend. And this curve shows 
that the growth rate of the number of visitor arrivals increases along with time increased at the beginning. After 
a growth period, value of the growth rate gradually slow down and close to a certain limit.  

4.3 Segmented gaussian fitting 
Of common fitting equation, gaussian fitting have the highest fitting accuracy to this study. Therefore, the 
following data of different sizes were gaussian fitted respectively. After testing, 8 gaussian fitting errors is the 

( )1 2 1 2, ,..., , ...
T

m mω ω ω ω ω ω ω= > > > ( )1 2 1 2, ,..., , ...
T

j j mj j j mjd d d d d d d= < < <

1
, 1, 2,...,

m

j i i
i

f f i mω
=

= =
if

i

423


smallest in the gaussian fitting process. So we choose eight formula gaussian fitting as fitting function. 
Equation is expressed as follows:

 
Describe it in detail by using 1.18×105scale as an example. Figure 4 are the fitting curve and the residual 
curve and the residual is large, so we consider fragmenting to reduce the fitting residuals. 
 

Figure 3: The fitting and residual curve                                  Figure 4: The segmented curve 

In order to ensure the reasonableness of segmentation, we conduct first, second derivative on tourist arrivals 
growth curve. Shown in Figure 2 Introduce tourist destination tourists growth "speed" and "acceleration" 
concepts and its related variables. Divide it into three stages, and the segmentation points are 110 and 180. 
Use the Matlab software to fit, we can obtain fit equation. The parameters of the three segments of gaussian 
fits are as follows: 

Table 2: The three segments parameters of gauss equation 

Coefficients (with 95% confidence bounds) 
 The first segment (0-110) The second segment (110-180) The third segment (180-720) 

A1 667.3 155.4 1.17E+04 
B1 114.3 181.3 787.2 
C1 10.04 31.96 2056 
A2 -19.35 -7.944 20.7 
B2 105 163.2 227.9 
C2 0.6545 5.696 7.678 
…..    
A8 189.6 1.29E+04 265.3 
B8 105.3 190.8 374.5 
C8 3.899 82.77 245.4 

 
The first segment Goodness of fit: SSE: 2.215e+004; R-square: 0.9999; Adjusted R-square: 0.9999; RMSE: 
16.24. 
The second segment Goodness of fit: SSE: 4511; R-square: 1; Adjusted R-square: 1; RMSE: 10.13. 
The third segment Goodness of fit: SSE: 2699; R-square: 0.9998; Adjusted R-square: 0.9998; RMSE: 2.291. 
It shows that Gaussian fitting can pass the conformance testing and have high fitting precision to this study. 
Repeat the above experiment,20 parameters values of discrete scale equation can be obtained. Here is not to 
list. 

4.4 Calculation of weights 
The closer the scale is, the closer the curve graph is. To reduce the interference by the too large distance of 
the scales to the scale to be predicted, we conduct a secondary clustering, and cluster different scales into 
four types. The result is as the following table. 

Table 3: Secondary clustering results 

Classification Scale               
1 10000 11000 11800 
2 12800 13600 14187 15000 15600 16300 
3 19000 20200 21231 22000 22900 23500  

24200  
4 25300 26300 27700 28500 
 
This paper choose data from May 10, June 13, July 22, August 8 to predict the above four classifications 
respectively to test the validity of model prediction. 

2 2
1 1 1 2 2 2

x
* ( (( ) / ) ) * ( (( ) / ) )

F

A exp x B C A exp x B C− − + − −
（ ）=

424


from the historical data can be known, the tourist number on May 10, June 13, July 22 and August 8 are 
respectively 1.1044×104, 1.4301×104, 2.2126×104 and 2.6288×104 (unit: person) the four-day Predicting 
scales are respectively recorded as  , according to improved IOWA operator, we can get： 

(1)  ( 321  and 、 fff  respectively represent the gaussian fitting function 
formulas of 1.0000×104, 1.1000×104 and 1.1800×104 in scale) 

(2)  

(3)  (4)

 
4.5 Analysis of experimental results 
The following is the experimental results, the blue line is the accumulated value each time of the actual data, 
the green line is the prediction data. (a), (b), (c), (d) in the figure 5 respectively represent the real and 
predicted data on May 10, June 13, July 22, August 8;(a), (b), (c), (d) in the figure 6 respectively represent 
every minute error curve on May 10, June 13, July 22, August 8. 
 

(a)                                                                 (b) 

                
(c)                                                                         (d) 

Figure 5: Real data and predicted data 

            
(a)                                                           (d) 

              
(c)                                                            (d) 

Figure 6: Every minute error curve 

From Figure 6 (a), (b), (d) can be seen: every minute errors on May 10, July 22 and August 8 are respectively 
in 20 or less; in the (c) ,despite three errors per minute value on June 13 are other than 20, but the overall 
trend is in 20 or less. It shows the validity and reliability of the model our proposed. 

5. Conclusion and prospect 

This paper proposes a new research question about the tourists arrival prediction, and enunciates the 
necessity and importance of this research question from a more detailed time scale. i.e studies the number of 
tourists who reach the scenic area at each time per day. We findThe graph curve of different sizes exists a 
high degree of similarity and consistent trend and this curve shows that the growth rate of the number of 

1 2 3 4F F F F、 、 、

1 1 2 3=0.00116 +0.65524 +0.34359F f f f

2 4 5 6 7 8 9= 0.00021 + 0.31264 + 0.66492 + 0.00021 + 0.31264 + 0.0002F f f f f f f

3 10 11 12 13 14 15 16=0.00077 +0.03932 0.18215 +0.47983 +0.21122 +0.09270 +0.02942F f f f f f f f+

4 17 18 19 20=0.25356 +0.62228 +0.12414 +0.00002F f f f f

425


visitors arrival increase along with time increment at the beginning. After a growth period, value of the growth 
rate gradually slow down and close to a certain limit. In order to study this problem deeply, on the base of 
stage prediction theoretical model based on tourism theory and clustering theory, this paper build a phased 
Gaussian fitting and weights combined forecasting model to predict tourists arrival, and analyse and test the 
model using Jiuzhaigou scenic area as an example. Example verification shows that the proposed new model 
has a good predictive accuracy. 
 However, we do not take the errors after fitting into account in the research process. The error can be used to 
correct in the future to improve accuracy. Then due that we solve a new problem to forecast real-time tourist 
number in the scenic area. The proposed prediction model for this new research question does not conduct 
compared prediction with other model in tourist arrivals filed. This prediction model and prediction method will 
be improved in future studies. 

Acknowledgements 

This work was supported by the Major International Joint Research Program of the National Natural Science 
Foundation of China (Grant no. 71020107027), and the National Natural Science Foundation of China (Grant 
no. 71001075) and Humanities and Social Sciences project of The Ministry of education of China (Grant 
no.12YJC630023). 

References 

Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 
31(3), 307-327. DOI: 10.1016/0304-4076(86)90063-1 

Broomhead, D. S., & King, G. P., 1986, Extracting qualitative dynamics from experimental data. Physica D: 
Nonlinear Phenomena, 20(2-3), 217-236. DOI: 10.1016/0167-2789(86)90031-X 

Cang, S., 2011. A non-linear tourism demand forecast combination model. Tourism Economics, 17(1), 5e20. 
DOI: 10.5367/te.2011.0031 

Chen M. S., Ying L. C., Pan M. C., 2010, Forecasting tourist arrivals by using the adaptive network-based 
fuzzy inference system [J]. Expert Systems with Applications, 37(2): 1185-1191. DOI: 
10.1016/j.eswa.2009.06.032  

Chu, F. L., 2008, A fractionally integrated autoregressive moving average approach to forecasting tourism 
demand. Tourism Management, 29(1), 79-88. DOI: 10.1016/j.tourman.2007.04.003 

Coshall, J. T., 2009, Combining volatility and smoothing forecasts of UK demand for international tourism. 
Tourism Management, 30(4), 495-511. DOI: 10.1016/j.tourman.2008.10.010 

De Gooijer, J., Hyndman, R., 2006. 25 years of time series forecasting. International Journal of Forecasting, 
22 (3), 443–473. DOI: 10.1016/j.ijforecast.2006.01.001 

Gil-Alana L. A., 2005, Modelling international monthly arrivals using seasonal univariate long-memory 
processes[J]. Tourism Management, 26(6): 867-878. DOI: 10.1016/j.tourman.2004.05.003 

Kim, S. S., & Wong, K. K., 2006, Effects of news shock on inbound tourist demand volatility in Korea. Journal 
of Travel Research, 44(4), 457-466. DOI: 10.1177/0047287505282946 

Song H., Li G., 2008, Tourism demand modelling and forecasting—A review of recent research [J]. Tourism 
Management, 29(2): 203-220. DOI: 10.1016/j.tourman.2007.07.016 

Wan S. K., Wang S. H., Woo C. K., 2013, Aggregate vs. disaggregate forecast: case of Hong Kong. Annals of 
Tourism Research, 42, 434-438. DOI: 10.1016/j.annals.2013.03.002 

 
426