Microsoft Word - 00_tresc.docx DYNAMIC ECONOMETRIC MODELS Vol. 9 – Nicolaus Copernicus University – Toruń – 2009 Roman Huptas Cracow University of Economics Intraday Seasonality in Analysis of UHF Financial Data: Models and Their Empirical Verification A b s t r a c t. The aim of this paper is to outline the typical characteristics of the ultra-high- frequency financial data and to present estimation methods of intraday seasonality of trading activity. Ultra-high-frequency financial data (transactions data or tick-by-tick data) is defined to be a full record of transactions and their associated characteristics. We consider two nonparame- tric estimation methods: cubic splines and a Nadaraya-Watson kernel estimator of regression. Both approaches are compared empirically and applied to financial data of stocks traded at the Warsaw Stock Exchange. K e y w o r d s: financial UHF data, intraday seasonality, diurnal pattern, cubic splines, kernel estimation. 1. Introduction The last dozen or so years has witnessed mounting global interest in analy- ses of the microstructure of financial markets. Research into the microstructure of financial markets centres around explaining the process behind the shaping of the price of financial instruments and analyses of individual trade events. The impact of various transaction factors and mechanisms on the way in which in- struments prices are shaped was captured by the so-called theoretical micro- structure models. A review of those models and numerous issues collectively referred to as market microstructure effects was incorporated into O’Hara (1995) and Dacorogna et al. (2001) (cf. Doman, Doman, 2004; Bień, 2006). Analysis of financial market processes and empirical verification of hy- potheses arising from theoretical microstructure models were made possible by the newly-gained access to transactional databases. These databases became the source of specific financial time series referred to as ultra-high-frequency or tick-by-tick data. New modelling tools for financial time series anticipate specific qualities of transaction data. They include, above all, asynchronous distributions of obser- Roman Huptas 130 vations relative to time units and discrete price changes. Additionally, the indi- vidual events of the transaction process manifest themselves with varying fre- quency from one time period to another. Consequently, there may be – from one day to the next – certain repeat pattern of intensity with which transactions are concluded. In pertinent literature, this pattern is referred to as intraday seasonal- ity of durations, with durations or waiting times standing for the time spans between trade events. However, before one can use econometric models in analyses of ultra-high-frequency time series, it is essential to eliminate intraday seasonality, which strongly manifests itself in the time series. In estimating intraday seasonality use is mostly made of selected nonparametric statistical methods. The aim of this paper is to outline the typical characteristics of UHF finan- cial data and further to present modelling and estimation methods of intraday seasonality of transaction activities. Within the framework of these methods, two nonparametric approaches will be presented: cubic splines interpolation and kernel estimations of regression functions. Both approaches will be verified and compared empirically on the basis of data extracted from the Polish share mar- ket. 2. Characteristics of UHF Financial Data Several years ago a new term became operative – „ultra-high frequency data”, also known as „tick-by-tick data” or „transaction data”. These are time series composed of trade event features to which the exact time of their appear- ance was assigned. Thus observations are recorded asynchronously on time units. UHF financial data have a few characteristic qualities, which do not manifest themselves at lower frequencies. The characteristic features of time series of transaction data include: non- synchronous distribution of observations over time units, discrete transaction price changes, appearance of a number of transactions in the same single sec- ond, bid-ask bounce of transaction prices and intraday seasonality i.e. transac- tions reveal a daily periodic pattern. The most important quality of UHF financial data is the nonsynchronous i.e. erratic distribution of observations over time units. Those data can, for instance, be aggregated so that they correspond to equal sequential time units (multiples of minutes, hours, days) and then – in analyses – enable use of a whole range of the GARCH models. On the other hand, such aggregation of transaction data and their analysis as observations made and selected at equal intervals leads to a loss of information furnished by the transaction process itself. Transactions or changes in the share price do not happen at equal intervals. Consequently, the durations between transactions involving shares of a given company may pro- vide relevant information as to the intensity of their trading. Thus the assump- tion that changes in prices or transactions are equidistant in terms of time may Intraday Seasonality in Analysis of UHF Financial Data: Models … 131 cause us to draw false conclusions. The problem of nonsynchronous trading and relevant examples are dealt with more extensively in Tsay (2002, p. 207) (cf. Doman, Doman, 2004; Osińska, 2006). An alternative way of analysis of financial data distributed asynchronously over time units involves using the so-called transaction-time models (ACD, UHF-GARCH models etc). With those models, raw data are used for analysis. Owing to that, information inherent in the duration of the time between selected trade events can also come into focus. Duration analyses may furnish informa- tion on the microstructure of the financial market, affording a more accurate insight into various market interdependencies. In pertinent literature, the most frequently modelled durations between trade events are trade durations, price durations, volume durations (cf. Engle, Russell, 1997, p. 1149). 3. Intraday Seasonality of Durations Under normal economic conditions, transactions reveal a daily seasonality factor. It appears that the number of transactions is higher immediately after the opening of business than prior to the close of the session (when the time gaps between transactions are the shortest), and markedly smaller during midday hours, i.e. in the middle of the session (so- called “lunchtime effect” when dura- tions between transactions are also the longest). Thus, there exists a certain repetitious pattern of transaction intensity for each day. This is termed „intraday seasonality of durations”. Consequently, Engle and Russell (1997) recommend decomposition of duration into a deterministic component )( itφ depending on moment it of the commencement of a given duration, and a stochastic compo- nent ix̂ , which is free from the seasonality effect and which models process dynamics. Pertinent literature (cf. Engle, Russell, 1997) recommends that data be transformed as follows: , )( i i i t x x φ = ) (1) where: 1−−= iii ttx - duration between transactions at time it and 1−it , ix̂ - dura- tion purged of the seasonality effect, )( itφ - multiplied factor of intraday seasonality at time it . The seasonal factor )( itφ is construed as the average duration of each time unit during which we made data observations (most often denoting the average duration for each second). The diagram which illustrates intraday seasonality pattern, also known as diurnal pattern or time-of-day function mostly has the shape of the letter U turned upside down. In numerous situations researchers lack adequate information to fully spec- ify a parametric function of intraday seasonality. Despite the fact that intraday Roman Huptas 132 cyclicality is not the key issue of investigation, it still cannot be ignored, but much rather needs to be included in analyses. Thus, in order to estimate the time-of-day function use can be made of selected nonparametric statistical methods such as splines, Fourier series, neural networks, wavelet analyses or kernel methods. In most works on duration modelling use is made of cubic splines or kernel estimations. In pertinent literature the time-of the day function is determined most com- monly by means of splines. This is a method which allows smoothing of aver- age durations between events in subsequent time periods. Firstly, all durations during all the subsequent hours of the sessions on each day are averaged. Then a cubic spline with knots on every full hour of the session is determined. Knots correspond to previously determined average durations. This version of ap- proximation of the daily period factor was presented in the paper (Engle, Rus- sell, 1997). With a view to ensuring enhanced elasticity, the authors added a knot on the half-hour of the last hour of the session to capture the fast growing trade activity prior to the close of the stock market. A slightly different ap- proach to cubic spline approximation can be observed in paper (Bauwens, Giot, 2000, p. 135). The authors note that the intraday seasonality factor may vary from one day of the week to the other, i.e. the shape of the periodic factor for Monday can be distinct from that for Tuesday etc. Consequently, the estimation of the intraday seasonality function was conducted separately for each day of the week to allow for possible seasonality within the week. In the first step, durations for the subsequent half-hours of the session were averaged separately for each day of the week and then the parameters of cubic splines were esti- mated for knots at full hours and half hours. An alternative and second most commonly practised method of estimation of intraday seasonality function is the kernel estimation method. The intraday seasonality pattern is estimated as the Nadaraya-Watson kernel estimator of regression of raw durations on the time of the day (cf. Bauwens, Veredas, 2004, p. 398): ,)( 1 1 ∑ ∑ = = ⎟⎟ ⎠ ⎞ ⎜⎜ ⎝ ⎛ − ⎟⎟ ⎠ ⎞ ⎜⎜ ⎝ ⎛ − = n i n i n i n i i h tt K h tt Kx tφ where: t - number of seconds since the midnight of each day (or since the start of a session), ix - durations corresponding to moments it ( ix is a de- pendent variable), it - number of seconds since the midnight of each day (or since the start of a session) until the moment of a given transaction, K - kernel function, nh - bandwidth, s - standard deviation of sample it , n - number of observations. Intraday Seasonality in Analysis of UHF Financial Data: Models … 133 As far as the kernel function is concerned, the paper (Bauwens, Veredas, 2004, p. 398) makes use of the quartic kernel (with optimal bandwidth of 5/178,2 −sn ) which has the following shape: , 1||,0 1||,)1( 16 15 )( 22 ⎪⎩ ⎪ ⎨ ⎧ > ≤−= xdla xdlaxxK and in paper (Bauwens, Giot, 2002, p. 13) use is made of the gamma kernel function. In the case of paper (Bauwens, Veredas, 2004, p. 398) the estimation of the intraday seasonality function is made separately for each day of the week to incorporate possible seasonality arising from the transaction repetition patterns also over a week-long period. 4. Empirical Example The empirical verification of the methods presented above was carried out on the basis of time series involving trades in the shares of three companies listed in the WIG20 index: Telekomunikacja Polska S.A. (TPSA)/Polish Tele- com/, Agora S.A. (Agora) and CEZ S.A. (CEZ) over a period between 22 March 2009 and 25 June 2009. The analysis covers transactions closed during the continuous quotation phase. On the basis of such time series, durations be- tween each transaction were determined. Additionally, the time lags between the close of the session and the opening of next day’s trading were removed. Tabel 1. Descriptive statistics of transaction durations CEZ Agora TPSA Number of observations 13919 19183 65166 Mean 98.930 84.840 25.110 Standard deviation (SD) 224.720 176.560 43.640 Dispersion index ( =Mean/SD) 2.270 2.080 1.740 Minimum 1 1 1 Maximum 4196 4003 833 ACF(1) 0.220 0.225 0.212 ACF(2) 0.157 0.176 0.168 Q(5) 1805.430 2799.010 8948.820 Q(10) 2569.120 4111.970 13294.120 Q(15) 3056.570 5179.310 16546.610 Q(20) 3298.350 5866.790 19483.450 Note: ACF(k) – the value of the k-th order autocorrelation coefficient, Q(k) – the value of the Ljung-Box Q-statistic of k-th order, descriptive statistics in seconds. Roman Huptas 134 The basic descriptive statistics of transaction durations for the shares in question are illustrated in Table 1. In our example, we witness three companies experiencing different trading activity patterns. The majority of the transactions involved TPSA, for which the average duration between transactions is 25 sec- onds. CEZ reported the fewest transactions and the average transaction time is 99 seconds. Agora, in turn, is a company of average liquidity and the average duration is around 85 seconds. In an analysis of the features of the distribution of durations our attention is momentarily attracted to marked overdispersion, i.e. the standard deviation exceeds the mean. The dispersion indexes (the ratio standard deviation to mean) are generally very high, which may imply great dynamics of the series in question. It is worth noting that the greater the fre- quency of trades, the lower the value of the dispersion index. The values of the Q Ljung-Box statistics in Table 1 formally test the null hypothesis whereby there is no autocorrelation of durations respectively for the fifth, the tenth, the fifteenth and the twentieth order. Clearly, on the basis of the determined test statistics, the null hypothesis whereby there is no autocorrela- tion is definitely rejected for all three companies. Duration autocorrelation is thus extremely strong. Figure 1. Autocorrelation functions of transaction durations for CEZ, Agora and TPSA Graphs representing the autocorrelation function of the durations for the companies in question are to be found in Figure 1. Regardless of the company, the first values of the autocorrelation function are surprisingly low and stand at around 0.22. For CEZ shares the ACF function fairly soon shrinks to zero for the first several dozen delays, only to level off. As far as TPSA shares are con- cerned, the autocorrelation function takes much more time to decrease, ap- proximately at hyperbolic speed (rate), which is typical of long memory proc- esses. This evidences high „stability” (persistence) of the process. Moreover, the high values of lower order autocorrelations indicate stronger clustering of transaction activities. The dynamics of duration processes and the effect of clus- tering of transaction activities may be noted in figure 2 containing graphs of the series of the first 5000 observations. ACF - CEZ -0,05 0,00 0,05 0,10 0,15 0,20 0,25 0 100 200 300 400 500 Lags ACF - Agora -0,05 0,00 0,05 0,10 0,15 0,20 0,25 0 100 200 300 400 500 Lags ACF - T PSA -0,05 0,00 0,05 0,10 0,15 0,20 0,25 0 100 200 300 400 500 Lags Intraday Seasonality in Analysis of UHF Financial Data: Models … 135 Figure 2. Plots of trade durations for Agora and TPSA – first 5000 observations The existence of very powerful autocorrelation of durations may result from the midday seasonality of transactional activity. With this in mind, patterns of intraday seasonality were estimated by means of four methods, and further veri- fied empirically to determine whether the method selected translates into effec- tive elimination of high autocorrelation of the series under research. The follow- ing approaches, described extensively in point 3, were applied: 1. Nadaraya-Watson estimator of regression of the duration on the time of the day, determined separately for each day of the week (NW_days); 2. Nadaraya-Watson estimator of regression of the duration on the time of the day, ignoring possible seasonality over a week-long period (NW); 3. cubic splines with knots on every full hour and every half-hour of the ses- sion, determined separately for each day of the week (CS_days); 4. cubic splines with knots on every full hour and every half-hour of the ses- sion, ignoring possible seasonality over a week-long period (CS). In the event of kernel estimators, use was made of the quartic kernel with the optimal bandwidth of 5/178,2 −sn . Next, after the intraday seasonality factor was estimated, durations purged of the seasonality effect were determined pur- suant to formula (1). All algorithms and procedures were implemented using the GAUSS software. Figure 3. Intraday seasonality patterns for CEZ, Agora and TPSA Figure 3 depicts the shape of intraday seasonality patterns (NW and CS methods) for the three companies under analysis, without regard for the effect that the different days of the week have. Figure 4 shows plots of the estimated time-of-day functions for subsequent days of the week for CEZ, Agora and 0 2000 0 2000 4000 S e c o Observations Trade durations - Agora 0 1000 -1000 1000 3000 5000 S e c o Observations Trade durations - TPSA -20180 9S e c o Hours Intraday seasonality - CEZ N W C S -20180 9S e c o Hours Intraday seasonality - Agora N W C S 0 9S e c o Hours Intraday seasonality - TPSA N W C S Roman Huptas 136 TPSA respectively, obtained by means of the kernel estimator (NW) and cubic splines (CS). Figure 4. Intraday seasonality patterns for the days of the week for CEZ, Agora, TPSA The graphs presenting the estimated time-of-day functions have the shape of the letter U turned upside down and reveal unequivocally that durations are subject to daily seasonality. The durations between transactions are markedly shorter after the opening and before the close of the session than at midday. The extent of trading activity between 12:00 a.m. and 2:00 p.m. is noticeably less, due, amongst others, to the lunchtime effect. It should be noted that in the case of companies listed on the American and West European Stock markets, the effect is less manifest than in the case of their Australian counterpart, where the effect is more pronounced, i.e. the hump in the graph is manifestly spikier (cf. (Bauwens, Giot, 2001; Hautsch 2004)). Similarly, trading activity at the open- ing of the session is more intense as traders begin to accommodate the informa- tion of the night before (macroeconomic data, etc). Trading activity at the close of trade can be explained in terms of some investors’ attempt to close their open positions. It is worth noting that intraday seasonality will vary from one day of the week to the next (Figure 4). It seems that regardless of the type of company, trading activity is most intense on Tuesdays and Wednesdays than on any of the other days. Duration statistics purged of intraday seasonality by means of the four above-named methods were included in Table 2. Deseasonalisation of the data 20220 9S e c o Hours Intraday seasonality (NW) The days of the week - Agora Mo n Tue 20220 9 S e c o Hours Intraday seasonality (CS) The days of the week - Agora Mo n Tue -200 300 9 S e c o Hours Intraday seasonality (NW) The days of the week - CEZ Mo n Tue 0500 9S e c o Hours Intraday seasonality (CS) The days of the week - CEZ Mo n Tue -40 60 9 S e c o Hours Intraday seasonality (NW) The days of the week - TPSA Mo n Tue 0100 9S e c Hours Intraday seasonality (CS) The days of the week - TPSA Mo n Tue Intraday Seasonality in Analysis of UHF Financial Data: Models … 137 partly reduced the autocorrelation of transaction durations. From the point of view of effective elimination of seasonality impact on autocorrelation “meas- ured” in terms of the values of the Ljung-Box test statistics, the NW_days (Ta- ble 2) appears to be the most effective method. In the case of TPSA and Agora it was definitely the most successful, i.e. the values of test statistics are the low- est of all four methods used. On the other hand, in the case of CEZ company, it ranked number two, with the CS_days approach ranked the highest. Tabel 2. Descriptive statistics of adjusted transaction durations after deseasonalisation by means of the four methods Stock Method Mean SD Disp. Index ACF(1) Q(5) Q(10) Q(15) Q(20) CEZ NW_days 0.990 2.320 2.340 0.205 1360.150 1897.080 2101.860 2240.300 NW 0.990 2.200 2.220 0.215 1537.830 2111.380 2475.990 2676.680 CS_days 0.980 2.280 2.320 0.190 1290.340 1782.540 1972.950 2121.000 CS 0.970 2.130 2.190 0.209 1535.590 2137.890 2530.380 2759.580 Agora NW_days 0.990 1.980 2.000 0.215 2518.120 3521.420 4333.470 4886.400 NW 0.990 1.980 2.000 0.211 2675.970 3719.670 4600.870 5169.360 CS_days 0.990 1.950 1.970 0.209 2599.490 3632.710 4520.890 5097.310 CS 0.990 1.950 1.970 0.215 2553.890 3604.400 4480.420 5078.150 TPSA NW_days 0.990 1.660 1.670 0.195 7357.560 10568.660 12843.060 14864.480 NW 0.990 1.680 1.700 0.197 7614.770 10964.070 13401.270 15598.530 CS_days 0.990 1.670 1.690 0.197 7527.590 10874.810 13294.540 15400.190 CS 0.990 1.680 1.700 0.200 7793.400 11248.550 13802.000 16085.570 Note: SD – standard deviation; ACF(k) – the value of the k-th order autocorrelation coefficient, Q(k) – the value of the Ljung-Box Q-statistic of k-th order, descriptive statistics in seconds. An analysis of the values of Ljung-Box statistics implies that the NW_days and CS_days approaches should be used. Consequently, in eliminating intraday seasonality, the “day of the week” effect i.e. possible seasonality arising from variations in trading activity over the entire week should be taken into account. Regardless of the deseasonalisation method used, the values of Ljung-Box test statistics for the companies in question dropped by approximately 15%-25%, but still continued to be very high. Thus, the null hypothesis implying lack of autocorrelation continues to be rejected on each reasonable level of signifi- cance. This bears witness to the fact that the dynamics of transaction durations are influenced by factors other than the purely deterministic seasonality effect, which in turn is due to the structure of the share market. 5. Summary Based on the results of empirical data, the application of the kernel estima- tor of regression separately for each day of the week appeared to be the most effective method of elimination of intraday seasonality impact on the autocorre- lation of transaction durations. It is noteworthy, though, that the results for all analytical methods used are highly similar. In the case of splines, their prelimi- Roman Huptas 138 nary averaging of durations between events for the subsequent full or half-hours may become something of a drawback. The extent of data aggregation and the elasticity of the estimated function can be reduced by increasing the number of knots used in the spline. So it appears that the inclusion of intraday seasonality models in the base models is a natural step, and wins over earlier data filtrations and testing if the use of a two-step or one-step approaches will have identical impact on the quality of the estimators determined. References Bauwens, L., Giot, P. (2000), The Logarithmic ACD Model: An Application to the Bid-ask Quote Process of Three NYSE Socks, Annales d’Économie et de Statistique, 60, 117–149. Bauwens, L., Giot, P. (2001), Econometric Modelling of Stock Market Intraday Activity, Kluwer Academic Publishers, Boston. Bauwens, L., Giot, P. (2002), Asymmetric ACD Models: Introducing Price Information in ACD Models, CORE Discussion Paper 9844. Bauwens, L., Veredas, D. (2004), The Stochastic Conditional Duration Model: A Latent Variable Model for the Analysis of Financial Durations, Journal of Econometrics, 119, 381–412. Bień, K. (2006), Model ACD – podstawowa specyfikacja i przykład zastosowania (ACD Model – Basic Specification and Example of Application), Przegląd Statystyczny (Statistical Sur- vey), t.53, z. 3, 83-97. Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., Pictet, O. V. (2001), An Introduction to High-Frequency Finance, Academic Press, San Diego. Doman, M., Doman, R. (2004), Ekonometryczne modelowanie dynamiki polskiego rynku finan- sowego (Econometric Modelling of Dynamics of Polish Financial Market), Wydawnictwo AE w Poznaniu, Poznań. Engle, R. F., Russell, J. R. (1997), Autoregressive Conditional Duration: A New Model for Irre- gularly Spaced Transaction Data, Econometrica, 66, 1127–1162. Hautsch N. (2004), Modelling Irregularly Spaced Financial Data, Springer-Verlag, Berlin, Hei- delberg. O’Hara, M. (1995), Market Microstructure Theory, Blackwell Inc., Oxford. Osińska, M. (2006), Ekonometria finansowa (Financial Econometrics), PWE, Warszawa. Tsay, R.S. (2002), Analysis of Financial Time Series, Wiley Series in Probability and Statistics, John Wiley& Sons, New York. Wewnątrzdzienna sezonowość w analizie danych finansowych UHF: modele i ich empiryczna weryfikacja Z a r y s t r e ś c i. Celem artykułu jest krótkie przedstawienie cech charakterystycznych dla danych finansowych UHF oraz prezentacja metod modelowania i szacowania wewnątrzdziennej sezonowości aktywności transakcyjnej. W ramach tych metod są przedstawione dwa podejścia nieparametryczne: interpolacja za pomocą kubicznych funkcji sklejanych oraz estymacja jądrowa funkcji regresji. Oba prezentowane podejścia są zweryfikowane i porównane empirycznie na podstawie danych z polskiego rynku akcji. S ł o w a k l u c z o w e: dane finansowe UHF, wewnątrzdzienna sezonowość, funkcje sklejane, estymator Nadaraya-Watsona.