INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL Online ISSN 1841-9844, ISSN-L 1841-9836, Volume: 17, Issue: 6, Month: December, Year: 2022 Article Number: 4952, https://doi.org/10.15837/ijccc.2022.6.4952 CCC Publications

A financial time series data mining method with different time granularity based on trend division

Haining Yang, Xuedong Gao, Lei Han, Wei Cui

Haining Yang*
Department of Management Science and Engineering, School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
*Corresponding author: yanghaining@apiins.com

Xuedong Gao, Lei Han
Department of Management Science and Engineering, School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
yanghaining@apiins.com, gaoxuedong@manage.ustb.edu.cn

Wei Cui
School of Economics and Management, China University of Geosciences (Beijing), Beijing 100083, China
cuiw@cugb.edu.cn

Abstract

Stock research is an important field of finance and of time series research, and stock data analysis is a typical financial time series problem. Many methods are applied to financial time series, including model building, data mining, heuristic algorithms, machine learning, and deep learning; VAR, ARIMA, and related methods are widely used in practice. ARIMA and its combined variants perform well on small data sets, but they suffer from overfitting and have difficulty handling large data sets and data at different time granularities. This paper takes the decision table transformation of financial time series data as its research object. It proposes a trend division method for financial time series at different time granularities and, on this basis, an extraction method for trend extreme points. A stock time series decision table is then constructed from the extreme point information combined with stock technical indicators, and the decision table is validated with a support vector machine. The research shows that the trend division method at different time granularities can transform extreme point information into a decision table that does not suffer from overfitting in practical application. It is an effective time series processing method and provides a new approach for future research on time series of different granularities.

Keywords: Time series, Trend division, Time granularity, ARIMA, Support vector machine.

1 Introduction

Time series analysis is an important research field of traditional statistics. After decades of development, it has established its own theoretical foundation. The research object of traditional time series analysis is generally dynamic data with randomness, and the research methods focus on the construction of global models. At present, time series research concentrates on econometric methods and data mining methods. In the practical analysis of time series data, however, it is often necessary to analyze local characteristics, such as finding frequent patterns or comparing different sequences (or the same sequence over different time periods) for similarity. A time series records the state values of an observed target at each time point, forming complex high-dimensional data. It can be divided into two categories.
One is the simple time series, which is stationary and linear; the other is the complex time series, which is non-stationary, nonlinear, and multidimensional. The process of analyzing current observations of a time series and inferring the value of the observed object at a future time point is called time series prediction. Time is an essential feature of time series data: it serves both as an index and as a basis for division, and it has the character of both an independent and a dependent variable. Time series data also exhibit multiple time granularities. Taking stock data as an example, there are data at granularities of 1 minute, 5 minutes, 15 minutes, 30 minutes, 1 hour, day, week, month, quarter, year, and so on. The relationships between these granularities are not fully determined, which opens a new research direction for time series prediction. Stock prices, trading volume indicators, and the many indicators derived from them can reflect the internal dynamics of stock prices; making full use of these indicator data often provides useful guidance for decisions on individual stock trends, selections, and trading time points. It is therefore worthwhile to study multi-granularity time series forecasting methods based on trend division combined with a variety of indicators, to establish a reasonable and effective stock price forecasting model, to enrich and develop the theory and methods of stock time series research, and to provide a reliable basis for investors' decision-making. To sum up, this paper processes time series of different time granularities according to trend division, extracts extreme points from the series, builds a decision table, and mines stock time series using that table, aiming to provide a new method for processing financial time series.

2 Related work

2.1 Research on univariate time series

In econometrics, univariate time series modeling has become a very important topic in economic management research [1, 2, 3]. Time series analysis is a statistical method that reveals the dynamic structure and regular characteristics of a system from the fluctuations of dynamic data. The core idea is: given a finite length of observed time series data, establish a data model that reflects the dynamic dependencies contained in the series, and use it to predict the system's future trend. An important feature in the development of time series models is the assumption of statistical equilibrium, one instance of which is the stationarity assumption. A stationary time series can usually be described effectively by its mean, variance, and autocorrelation function. In general, the main approaches to data research are model construction [4, 5] and data analysis; we review the main methods below. Financial time series are an important branch of time series research with many data patterns, and the trend and price of stocks are a major research hotspot. At present, research on stock time series focuses on fitting the series itself and making predictions from the fit. Under this paradigm, predictions are often effective within a certain time range, but generalization is poor.
For example, the autoregressive integrated moving average model (ARIMA) can deal with non-stationary time series, but because the stationarization step masks the complex and diverse fluctuation patterns in the series, it cannot deliver accurate predictions for complex and changeable time series. The discrete Fourier transform (DFT) is simple to compute and can concentrate most of the energy of the signal into a few coefficients, with relatively low time complexity; however, it discards the high-frequency components of the signal during data truncation, smooths out local maxima and minima, omits information, and is unsuitable for non-stationary sequences [6]. The discrete wavelet transform (DWT) can capture local characteristics in the time and frequency domains simultaneously; at the same compression ratio, DWT retains local details better and provides more accurate approximations than the DFT. On the other hand, the DWT neither reduces the relative reconstruction error nor improves the accuracy of similarity queries, and experimental results show little practical difference between the DWT and the DFT [7].

In the 1970s, ARIMA (the autoregressive integrated moving average model) was proposed by the American statistician G. E. P. Box and the British statistician G. M. Jenkins [8]; time series prediction with this model is also called the Box-Jenkins method. The ARIMA(p, d, q) family includes the autoregressive (AR) process, the moving average (MA) process, the autoregressive moving average (ARMA) process, and the ARIMA process, where p is the autoregressive order, d is the number of differences needed to make the series stationary, and q is the moving average order. The stationarity of an ARMA(p, q) process depends on its autoregressive part, while its invertibility depends on its moving average part. Nonstationarity takes many forms, but economic time series often exhibit linear homogeneous nonstationarity. If a random process contains d unit roots, it can be transformed into a stationary autoregressive moving average process by differencing d times; such a process is an integrated autoregressive moving average process, written ARIMA(p, d, q). These analysis methods, however, require linear models, and to overcome this limitation nonlinear time series models became a research hotspot.

Granger and Andersen [9] proposed the bilinear time series model, which exploits the residual information generated during prediction; its concise structure, few parameters, and strong adaptability give it wide application value in nonlinear time series analysis. With the development of interdisciplinary research, neural networks, wavelet decomposition, Fourier transforms, and fractal theory have also been used for time series data mining and prediction. From the existing literature and the mathematical expressions of the above models, two observations emerge:
1. The premise of time series analysis is that the past behavior of the variables extends into the future; that is, the system does not change abruptly by leaps and bounds but evolves gradually.

2. In time series analysis, one should first test the stationarity of the series, select an appropriate model after suitable data processing (such as differencing) and diagnostic testing, and then estimate the model parameters.

The prediction of stock prices and trends revolves around several themes. One is technical indicators and feature engineering. Hossain, E., et al. used a belief-rule-based expert system (BRBES) for technical analysis, combined with the concept of Bollinger Bands, to predict stock prices over the next five days, and studied the effects of different deep learning methods [10]; Ji, G., et al. studied the application of feature engineering in stock prediction, proposed improved technical indicators based on wavelet denoising together with a new two-stage adaptive feature selection method, and adopted a random forest model for prediction [11]; Li, G. Z., et al. proposed a multi-indicator feature selection method for stock price prediction based on the Pearson correlation coefficient (PCC) and the broad learning system (BLS) [12].

Stock forecasting methods also focus on deep learning and fusion models. Deng, C. R., et al. developed a hybrid stock price index prediction framework using long short-term memory (LSTM) and multivariate empirical mode decomposition (MEMD), which can capture the intrinsic characteristics of the complex dynamics of stock index time series [13]; Gao, R. Z., et al. proposed a deep learning method combined with a genetic algorithm to predict a target stock market index [14]; Gao, Z., et al. proposed a prediction algorithm integrating multiple support vector regression (SVR) models, combining the predictions of the individual models with suitable weights to improve accuracy [15]; Gupta, U., et al. proposed a new data augmentation method in the GRU-based StockNet model to overcome overfitting [16]; He, Q. Q., et al. proposed a new instance-based deep transfer learning model with an attention mechanism [17]; Kanwal, A., et al. proposed a hybrid deep learning (DL) prediction model that combines a deep neural network, long short-term memory, and a one-dimensional convolutional neural network (CNN) [18]; Kumar, R., et al. proposed a three-stage fusion model to process time series data and improve the accuracy of stock market prediction [19]; Li, R. R., et al. proposed a multi-scale modeling strategy based on machine learning methods and econometric models [20].

Some new methods have also been applied to stock time series prediction. Liang, M. X., et al. used sequential pattern mining to obtain candlestick patterns from multidimensional candlestick data, calculated the correlation between different patterns and the corresponding future trends, and proposed a new sequence similarity for matching candlestick chart sequences against known patterns [21]; Wang, C. J., et al. used the Transformer deep learning architecture to predict stock market indices; through its encoder-decoder architecture and multi-head attention mechanism, their research shows that the Transformer can better describe the underlying dynamics of the stock market [22].
2.2 Literature review and analysis

The above review shows that time series has become an increasingly popular research field, especially with the rise of deep learning, which partially solves the problem of dependencies across time scales; time series research has become an independent research direction. Traditional time series research, however, attempts to explain or characterize a series with one or several models and to predict from them. Under this paradigm, prediction is often effective within a certain time range, but generalization is poor (effective only on the training data, poor on new data), which does not meet the requirements of the big data era. Machine learning methods depend on manual feature selection. Deep learning algorithms achieve good results but remain black boxes with weak interpretability; they are also prone to a butterfly effect, where a small change can significantly alter the model's output, making it easy for decision makers to draw wrong conclusions. Some scholars have therefore begun to combine deep learning with traditional methods to achieve better results.

3 Methodology

3.1 Comparative analysis of multiple time granularity prediction based on the ARIMA model

As a traditional time series prediction model, ARIMA is applied in the following general steps:
1. Stationarity test. The ADF unit root test checks whether the series is stationary; if not, differencing is applied.
2. White noise test. Check whether the series is purely random.
3. Model order determination. The ACF and PACF functions are used to determine the order of the model and fix the final ARIMA specification.
4. Model prediction. The fitted ARIMA model is used to predict the series.

Among the many indexes for evaluating model performance, this paper selects the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE).

This study selects Growth Enterprise Market (GEM) index (399006) data at 1-minute, 5-minute, 15-minute, 30-minute, 60-minute, and daily intervals, covering August 19, 2019 to August 19, 2022. The data statistics are shown in Table 1:

Table 1: Basic data
Data source: Wind database
1-minute interval data: 176,176 records
5-minute interval data: 34,944 records
15-minute interval data: 11,648 records
30-minute interval data: 5,816 records
60-minute interval data: 2,912 records
Daily interval data: 728 records

In this paper, a model is built for each granularity, and the performance of the models is analyzed statistically.

(1) ARIMA model of daily data

This paper uses Python to implement the ARIMA model, holding out the last 10 observations of the data set as a test set to judge the generalization ability of the model. First, the stationarity of the sequence is tested.
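In code, this first step might look like the following minimal sketch (our illustration, assuming pandas and statsmodels; `close` is assumed to be the daily closing-price series loaded from the data set):

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def adf_report(series: pd.Series):
    """Run the ADF unit root test; return the t statistic, the p value,
    and the 1%/5%/10% critical values."""
    t_stat, p_value, _, _, crit_values, _ = adfuller(series.dropna())
    return t_stat, p_value, crit_values

# test the series after first- and second-order differencing
for d in (1, 2):
    t_stat, p_value, crit = adf_report(close.diff(d))
    print(f"order {d}: t={t_stat:.3f}, p={p_value:.4g}, crit={crit}")
```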
Table 2 shows the results of the ADF unit root test:

Table 2: ADF unit root test of daily interval data
Differencing order | t statistic | p value | Critical value 1% | 5% | 10%
1 | -26.887 | 0 | -3.439 | -2.865 | -2.569
2 | -16.557 | 2.256 | -3.439 | -2.865 | -2.568

The white noise test on the stationary sequence after first-order differencing gives a p value of 2.4656815e-05, significantly less than 0.1, showing that the sequence is not purely random and can be used for model construction. The sequence after first-order differencing is shown in Figure 1:

Figure 1: First-order differenced daily interval data

Based on the first-order differenced data, the ACF and PACF functions are used to determine the model order. The ACF and PACF plots are shown in Figures 2 and 3:

Figure 2: Daily interval data ACF
Figure 3: Daily interval data PACF

Examining the ACF and PACF plots in Figures 2 and 3 and using the BIC criterion as an objective standard for order selection, the BIC attains its minimum at p = q = 0, where the model achieves the best fit; that is, the fitted model of the original series is ARIMA(0,1,0). This also shows that the stock time series follows a random walk model, in which the current value is related only to the previous value. The fitted ARIMA model is used for fitting and prediction, with the fitting results shown in Figure 4:

Figure 4: Fitting results of the daily interval model

The model's predictions on the test set are shown in Table 3:

Table 3: Forecast values of the daily interval model
Time | True value | Predicted value | Absolute error
2022/8/8 | 2675.686 | 2682.774 | 7.0876
2022/8/9 | 2694.8 | 2674.883 | 19.9173
2022/8/10 | 2658.585 | 2693.935 | 35.3507
2022/8/11 | 2721.493 | 2657.838 | 63.6549
2022/8/12 | 2690.831 | 2720.541 | 29.7095
2022/8/15 | 2718.593 | 2689.979 | 28.6138
2022/8/16 | 2731.393 | 2717.651 | 13.7426
2022/8/17 | 2777.91 | 2730.409 | 47.5004
2022/8/18 | 2775.821 | 2776.774 | 0.9534
2022/8/19 | 2734.218 | 2774.692 | 40.4743

Model performance indicators are shown in Table 4:

Table 4: Performance indicators of the daily interval model
Data set | MAE | MSE | RMSE | MAPE
Model training set | 37.8757 | 3270.24 | 57.186 | 0.0145
Model test set | 33.962 | 1153.419 | 33.962 | 0.0105

Table 4 shows that the performance of the model on the training set is good and basically meets requirements, but the prediction on the test set is not ideal: the prediction error is large, reflecting the poor generalization ability of the model.
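For reference, the fit-and-forecast procedure for one granularity can be sketched as follows (our illustration, not the authors' code, assuming statsmodels >= 0.13 and scikit-learn; the order (0,1,0) is the one selected by BIC above, `close` is again the closing-price series, and the Ljung-Box lag of 6 is our illustrative choice):

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox
from sklearn.metrics import mean_absolute_error, mean_squared_error

def fit_and_forecast(close, order=(0, 1, 0), n_test=10):
    """Fit an ARIMA model on all but the last n_test points,
    then forecast the held-out test set and report error metrics."""
    train, test = close[:-n_test], close[-n_test:]
    # Ljung-Box white noise test on the differenced training series
    lb_p = acorr_ljungbox(train.diff().dropna(), lags=[6])["lb_pvalue"].iloc[0]
    model = ARIMA(train, order=order).fit()
    pred = model.forecast(steps=n_test)
    mae = mean_absolute_error(test, pred)
    mse = mean_squared_error(test, pred)
    return pred, {"lb_p": lb_p, "MAE": mae, "MSE": mse, "RMSE": mse ** 0.5}
```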
Since the modeling process for the other granularities is consistent with that for the daily data, the research process is not repeated here; the performance statistics of the ARIMA models at different granularities are shown in Table 5:

Table 5: Performance indicators of ARIMA models with different granularity
Data set | MAE | MSE | RMSE | MAPE
1-minute interval model training set | - | - | - | -
1-minute interval model test set | - | - | - | -
5-minute interval model training set | 4.1735 | 79.4054 | 8.9109 | 0.0015
5-minute interval model test set | 3.0347 | 12.84771 | 3.5843 | 0.0011
15-minute interval model training set | 7.5064 | 171.1889 | 13.0839 | 0.0028
15-minute interval model test set | 6.956 | 62.6869 | 7.9175 | 0.0025
30-minute interval model training set | 11.0193 | 354.5312 | 18.8289 | 0.0041
30-minute interval model test set | 7.342 | 83.2282 | 9.1229 | 0.0026
60-minute interval model training set | 16.3431 | 749.5902 | 27.3786 | 0.0062
60-minute interval model test set | 9.1569 | 150.6016 | 12.2719 | 0.0033
Note: "-" indicates a data volume too large for the computer to process.

According to these statistics, as the granularity becomes finer the amount of data increases significantly, and the MAE, MSE, and RMSE of the training set models all trend downward. However, compared with the training set, the prediction on the test set is poor, reflecting overfitting of the model, and overfitting becomes more pronounced as the amount of data increases. This again reflects that traditional time series research tries to explain or characterize a series with one or several models and to predict accordingly; under this paradigm prediction is effective only within a certain time range, generalization is poor, and multi-granularity time series are difficult to handle effectively.

3.2 Establishment of a multi-granularity time series decision table based on trend division

In 2011, Preis et al. [23] proposed a time renormalization method to analyze the switching dynamics of the German futures index (FDAX). Their research shows that switching points in the German futures market index tend to coincide with large volatility, high trading volume, and densely packed trading intervals. First, for the price P(t) at time t, to judge whether P(t) is a turning point we must compare it with the prices within a window of size S before and after t, that is, from P(t−S) to P(t+S). If no price in the range P(t−S) to P(t+S) is higher than P(t), then P(t) is defined as a local maximum price Pmax(S). Likewise, if no price from P(t−S) to P(t+S) is lower than P(t), then P(t) is defined as a local minimum price Pmin(S); the turning points mentioned earlier are exactly these maxima Pmax(S) and minima Pmin(S). Figure 5 gives a schematic diagram of price change over time: in the stock price trend chart, a window of size S is selected, and whether P(t) is a minimum or maximum is judged over the range P(t−S) to P(t+S). In the figure, the blue line marks Pmax(S) and the red line marks Pmin(S).
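This windowed definition can be expressed directly in code; the following is a minimal sketch of the definition above (our own illustration, not the authors' implementation; `p` is assumed to be a NumPy array of prices):

```python
import numpy as np

def turning_points(p: np.ndarray, S: int):
    """Indices t where p[t] is the maximum (or minimum) of the
    window p[t-S : t+S+1], i.e. no price in the window beats it."""
    p_max, p_min = [], []
    for t in range(S, len(p) - S):
        window = p[t - S : t + S + 1]
        if p[t] >= window.max():
            p_max.append(t)   # local maximum Pmax(S)
        elif p[t] <= window.min():
            p_min.append(t)   # local minimum Pmin(S)
    return p_max, p_min
```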
Figure 5: Price and trend turning points

Next, the concept of trend is developed from the turning points. As shown in Figure 5, the price movement from Pmin,1(S) to Pmax,1(S) shows an upward trend, followed by a downward trend to Pmin,2(S). In this section, we regard the price change of the index over the period as consisting of several upward and downward trends. Within a certain range each trend contains an extreme point, and these local extreme points mark the trend reversals of the time series; identifying them has important research value.

3.3 Extraction of time series trend extreme points based on time series value recursion

Time series data grow with time: the longer a series exists, the larger it becomes. When analyzing such data it is very important to find points that represent its characteristics and have reference value, because the data can then be compressed, with a small number of points showing the trend of the whole large data set. Trend division of a time series should first consider its characteristics. Taking stock prices as an example, stock information is generally reported at granularities of one minute, five minutes, fifteen minutes, thirty minutes, days, weeks, months, and years, and the trend divisions at different granularities often differ greatly. Existing trend division methods generally revolve around a few key choices, such as the segmentation range, sliding window, segmentation points, and threshold determination. These choices fill trend division with uncertainty: although results can be gradually optimized over repeated experiments, no structured trend division method has formed, stability is hard to guarantee in use, and in subsequent steps such as prediction, different parameter settings lead to overfitting. Dow theory points out that the primary trend, the intermediate trend, and the short-term trend are the three main manifestations of stock price movement, and that newly formed bottom and top prices are the main basis for judging trend formation.

According to the characteristics of time series, this paper proposes a new trend division method. The main idea is that a trend must contain extreme points: a local maximum together with the local minima before and after it constitutes an upward trend followed by a downward one. Therefore, finding the extreme points of the series is the crucial step in trend recognition. This paper uses Python programs to find the extreme points of a time series and then identify its rising and falling trends.
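Before the formal listing in Table 6 below, the idea can be shown as a minimal runnable sketch (our own reconstruction: each point is compared with its neighbours m steps away, and an endpoint with its single available neighbour). It reproduces the worked example given later in this subsection:

```python
def find_extrema(s, m=1):
    """Return index lists K (local maxima) and H (local minima) of series s,
    comparing each point with its neighbours m steps away; endpoints are
    classified against their single available neighbour."""
    k, h = [], []
    n = len(s)
    for i in range(n):
        left = s[i - m] if i - m >= 0 else None
        right = s[i + m] if i + m < n else None
        if left is None and right is not None:          # left endpoint
            (k if s[i] > right else h).append(i)
        elif right is None and left is not None:        # right endpoint
            (k if s[i] > left else h).append(i)
        elif left is not None and right is not None:
            if s[i] > left and s[i] > right:            # local maximum
                k.append(i)
            elif s[i] < left and s[i] < right:          # local minimum
                h.append(i)
    return k, h

# the worked example from this subsection, with step size 1
s = [1, 6, 30, 20, 15, 7, 13, 9, 2, 6, 0, -7, -15, -6, 4,
     13, 8, 15, 8, 12, 3, 0, 20, 7, 3, 9]
k, h = find_extrema(s, m=1)
print([s[i] for i in k])  # [30, 13, 6, 13, 15, 12, 20, 9]
print([s[i] for i in h])  # [1, 7, 2, -15, 8, 8, 0, 3]
```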
The specific algorithm is shown in Table 6:

Table 6: Extraction of time series trend extreme points based on time series value recursion
Algorithm:
Input: original time series s = {s(1), s(2), ..., s(n)}, step m
Output: index set K of maximum points and index set H of minimum points of s
(1) for i = 1, 2, ..., n do
(2)   if i = 1 and s(1) > s(1+m), add 1 to K
(3)   else if i = 1 and s(1) < s(1+m), add 1 to H
(4)   else if s(i−m) < s(i) < s(i+m) or s(i−m) > s(i) > s(i+m), continue (monotone, not an extremum)
(5)   else if s(i−m) < s(i) and s(i) > s(i+m), add i to K (local maximum)
(6)   else if s(i−m) > s(i) and s(i) < s(i+m), add i to H (local minimum)
(7) end for
(8) collate and output the index sets K and H
(The last point i = n is classified against s(n−m) alone, in the same way as the first point.)

In order to verify the feasibility of the algorithm, a group of time series data is randomly selected for extreme point extraction. Taking the series [1, 6, 30, 20, 15, 7, 13, 9, 2, 6, 0, -7, -15, -6, 4, 13, 8, 15, 8, 12, 3, 0, 20, 7, 3, 9] as an example and applying the proposed method with step size 1, the extreme points are identified; the visualization is shown in Figure 6.

Figure 6: Visualization of extreme point extraction on a random time series

The extracted extreme points are: maximum points [30, 13, 6, 13, 15, 12, 20, 9]; minimum points [1, 7, 2, -15, 8, 8, 0, 3]. The results agree with expectations, showing that the proposed method is effective.

A decision table, also known as a judgment table, is suited to describing situations with many judgment conditions whose combinations form a variety of decision schemes. A decision table is composed of state variables (condition items) and decision variables (action items); each state variable and decision variable together constitute a rule. The decision table can lay out complex problems according to the various possible situations. The state variables can be understood as inputs and the decision variables as outputs, which closely resembles the setup of machine learning in data mining, so decision tables can be widely used in classification, clustering, association rule mining, and other fields.

1. Determination of decision variables

The above algorithm is applied to the time series data sets. With the step size set to 1, the local maxima and minima of the trends at different time granularities are obtained; a maximum point is marked "o" and a minimum point "+". Adopting the idea of binary classification, the local maxima and minima serve as the decision variable of the information table. For a stock time series data set, however, the only directly observable information is the real-time price and trading volume. If only price and volume are included, the decision table carries too little information to support data processing and effective prediction of the decision variable. We therefore enrich the information table by selecting more indicators related to the decision variable.

2. Determination of the factors in the decision table

In current stock market research, some scholars work from the perspective of enterprise management and finance, forming the fundamental analysis approach, while others work from the perspective of stock price and chart structure, forming the technical analysis approach.
Stock technical analysis takes derived data as its research object: for example, it examines charts of stock prices and trading volume and predicts future trends from historical and current performance. Technical indicators provide data support for investors who analyze stocks from intraday data. Therefore, adding the established empirical indicators of technical analysis to the decision table greatly enriches its content and makes it usable for data mining. In technical analysis, the most important factors are price, volume, time, and space.

• The most basic element of technical analysis is price, one of the key elements of the stock market. The intrinsic value of a stock is subject to change, and its fluctuation is jointly affected by the operating conditions of the issuer and the behavior of the market. When significant positive or negative news about the issuer emerges, it is directly reflected in the rise and fall of the stock price.

• Another important element is trading volume, an important factor in the formation of stock prices; it is the standard for confirming price patterns. In the stock market, only buying and selling move prices. When the price is low and there is heavy buying, the price is generally expected to rise; when the price is relatively high and there is heavy selling, the price is generally expected to fall. Investors therefore regard bottom volume and top volume as signs that the price may reverse.

• Time means considering the trajectory of the stock price from the temporal perspective: the trend of the stock price is cyclical. Two aspects matter, the price range and the trend; time here refers to how long the price remains within a specific range or trend. A trend is difficult to change significantly in the short term, but it is not fixed forever; as time passes, new trends emerge.

• Space refers to the size and range of price changes, i.e., the range within which the price operates. Every stock has a high point and a low point, determined by the company's operations and market factors; the price band between the high and low points at which the price reverses is the size of the stock's operating space.

Common technical indicators and their classification are shown in Table 7.
Table 7: Important technical indicators of stock time series
Category | Indicator name and abbreviation | Key parameters
Trend indicators | MA simple moving average: MA5, MA10 | 5, 10
Trend indicators | MACD exponential smoothed moving average: DIFF, DEA, MACD | 26, 12, 9
Trend indicators | MTM momentum index: MTM | 6, 6
Trend indicators | DMI directional movement index: DMI | 14, 6
Trend indicators | EXPMA exponential moving average: EXPMA | 12, 50
Counter-trend indicators | BIAS deviation rate: BIAS | 12
Counter-trend indicators | KDJ stochastic index: K, D, J | 9, 3, 3
Counter-trend indicators | ROC rate of change: ROC | 12, 6
Counter-trend indicators | RSI relative strength index: RSI | 6
Pressure/support indicators | BOLL Bollinger Bands | 26, 2
Volume-price indicators | SOBV on-balance volume: SOBV | 1
Volume-price indicators | WVAD Williams variable accumulation distribution: WVAD | 24, 6
Volume indicators | VSTD standard deviation of trading volume: VSTD | 10
Volume indicators | Turnover rate: TURN | 1
Swing indicators | RC rate of change index: RC | 8
Fluctuation indicators | STD standard deviation: STD | 26

3. Establishment of the time series information table based on extreme point extraction

Following the above analysis, the information table of extreme points based on trend division includes the following:

• The decision variable is given by the algorithm proposed in this paper.
• The state variables include the basic data and the technical indicators described above, yielding the time series information table shown in Table 8.

Table 8: Time series information table
Each row is indexed by time (minute) and contains: state variables from basic data (opening price, highest price, lowest price, closing price, trading volume, turnover); state variables from technical indicators (MA, MACD, BIAS, KDJ, RSI, BOLL); and the decision variable (+ or -).

The decision table at each time granularity contains 12 state variables, comprising 6 basic data items and 6 technical indicator items: opening price, highest price, lowest price, closing price, trading volume, turnover, MA, MACD, BIAS, KDJ, RSI, and BOLL. The extracted extreme points serve as the decision variable. Table 9 shows the number of extreme points obtained at different time granularities.

Table 9: Extreme points at different granularities
Data set | Number of extreme points
5-minute interval data | 16,911
15-minute interval data | 5,707
30-minute interval data | 2,798
60-minute interval data | 1,405
Daily interval data | 381

Building on the analysis of the ARIMA model, this section has proposed a multi-granularity time series decision table conversion method based on trend division. Using time series value recursion, the extreme points of the series are extracted and taken as the decision variable, and the decision table is constructed by combining the relevant technical indicators and other data of the stock time series. Transforming time series data into a decision table is the basis of time series classification modeling, so this section is central to the paper and lays the groundwork for the application that follows.

4 Validation of the time series decision table based on support vector machine

This experiment uses the SVC method of the sklearn SVM module in Python to implement the support vector machine model, with the kernel set to "linear". The data set splitting utility in sklearn is used to divide the data into training and test sets at a ratio of 7:3.
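The validation setup can be sketched as follows (our own minimal illustration, not the authors' code; the column names, the `bars` data frame, and the `find_extrema` helper from Section 3.3 are assumptions, and only a few of the Table 7 indicators are computed for brevity):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

def build_decision_table(bars: pd.DataFrame, k, h):
    """Rows = extreme points; state variables = basic data plus a few
    indicators; decision variable = 1 (maximum) or 0 (minimum)."""
    df = bars.copy()   # assumes open/high/low/close/volume/turnover columns
    df["MA5"] = df["close"].rolling(5).mean()
    df["MA10"] = df["close"].rolling(10).mean()
    ma12 = df["close"].rolling(12).mean()
    df["BIAS12"] = (df["close"] - ma12) / ma12
    table = df.iloc[list(k) + list(h)].copy()
    table["decision"] = [1] * len(k) + [0] * len(h)
    return table.dropna()   # drop early rows whose indicators are undefined

# hypothetical usage with one granularity's bar data:
# k, h = find_extrema(bars["close"].tolist(), m=1)    # Section 3.3 sketch
# table = build_decision_table(bars, k, h)
# X, y = table.drop(columns="decision"), table["decision"]
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
# clf = SVC(kernel="linear").fit(X_tr, y_tr)
# y_hat = clf.predict(X_te)
# print(accuracy_score(y_te, y_hat), confusion_matrix(y_te, y_hat))
```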
Table 10 shows the accuracy of the support vector machine models:

Table 10: Accuracy of support vector machine models with different granularity
Data set | Accuracy
5-minute interval data set | 77.82%
15-minute interval data set | 79.07%
30-minute interval data set | 76.66%
60-minute interval data set | 76.30%

Table 11 shows the confusion matrices of the support vector machine models at different granularities (rows: true class; columns: predicted class 0, 1):

Table 11: Confusion matrices of support vector machine models with different granularity
5-minute interval SVM: true 0: 1980, 560; true 1: 565, 1969
15-minute interval SVM: true 0: 685, 192; true 1: 167, 669
30-minute interval SVM: true 0: 320, 89; true 1: 107, 324
60-minute interval SVM: true 0: 164, 54; true 1: 46, 158

The accuracy of the SVM models at different granularities after removing extreme points with short step sizes is as follows:

Table 12: SVM models with short-step extreme points removed
Data set | Accuracy, step 1 removed | Accuracy, steps 1-2 removed | Accuracy, steps 1-3 removed
5-minute interval data set | 88.58% | 91.03% | 93.64%
15-minute interval data set | 88.48% | 90.25% | 93.12%
30-minute interval data set | 89.91% | 90.02% | 92.31%
60-minute interval data set | 86.87% | 88.44% | 91.45%

The corresponding confusion matrices (rows: true class; columns: predicted class 0, 1) are shown in the tables below:

Table 13: 5-minute data set
Step 1 removed: true 0: 1489, 191; true 1: 196, 1515
Steps 1-2 removed: true 0: 1160, 120; true 1: 111, 1177
Steps 1-3 removed: true 0: 908, 63; true 1: 69, 1028

Table 14: 15-minute data set
Step 1 removed: true 0: 498, 66; true 1: 65, 509
Steps 1-2 removed: true 0: 401, 41; true 1: 44, 381
Steps 1-3 removed: true 0: 341, 28; true 1: 21, 307

Table 15: 30-minute data set
Step 1 removed: true 0: 254, 24; true 1: 33, 254
Steps 1-2 removed: true 0: 199, 26; true 1: 17, 186
Steps 1-3 removed: true 0: 138, 12; true 1: 14, 172

Table 16: 60-minute data set
Step 1 removed: true 0: 133, 20; true 1: 17, 112
Steps 1-2 removed: true 0: 92, 11; true 1: 14, 97
Steps 1-3 removed: true 0: 88, 8; true 1: 7, 70

The above results show that the multi-granularity time series decision table based on trend division, together with standard classification methods, can effectively predict the trend points of stock time series. The prediction accuracy is high, and no overfitting occurs. Better prediction results can be obtained by adjusting the tolerance error and filtering the extreme points; the specific mechanism will be explored in future research.

5 Conclusions

This paper has proposed a new time series trend division method that uses a tolerance error (or step adjustment) to determine the trend division of a time series. The method processes a time series into two modes, rise and fall.
Based on the trend division method for time series of different granularities, the series is divided into rising and falling trends, which avoids the complexity of thresholds, segment lengths, and multiple trend types; it takes the simplest route to trend identification, and the method is simple with low time complexity. On the other hand, with time granularity as the main measure, a nested structure of time series at different granularities based on trend division is obtained. This structure fully expresses the structural characteristics of the time series and provides a good basis for further study. A trend-based extreme point extraction algorithm is proposed that extracts extreme points at different time granularities. With the extreme points in hand, the high (1) and low (0) extreme points are used as decision variables, and the other information at the extreme points (such as mean values, volume, etc.) is extracted to establish time series decision tables that divide the series at different time granularities based on trends. On this basis, the decision table was validated with the support vector machine method, with good results. The proposed multi-granularity time series decision table accurately characterizes the data features of the time series and can serve complex time series mining methods. The decision table data are balanced, avoiding the class-imbalance phenomenon of general decision problems; mining accuracy is higher, and prediction accuracy is significantly improved compared with regression and econometric models on the original time series. The approach provides a new method for the study of time series problems and has strong theoretical significance and application value.

References

[1] Duplakova D, Teliskova M, Duplak J, et al. Determination of Optimal Production Process Using Scheduling and Simulation Software[J]. International Journal of Simulation Modelling, 2018, 17(4): 609-622.

[2] Janekova J, Fabianova J, Fabian M. Assessment of Economic Efficiency and Risk of the Project Using Simulation[J]. International Journal of Simulation Modelling, 2019, 18(2).

[3] Junior W S, Montevechi J, Miranda R, et al. Economic Lot-size Using Machine Learning, Parallelism, Metaheuristic and Simulation[J]. International Journal of Simulation Modelling, 2019, 18(2).

[4] Xu W, Sun H Y, Awaga A L, Yan Y, Cui Y J. Optimization approaches for solving production scheduling problem: A brief overview and a case study for hybrid flow shop using genetic algorithms[J]. Advances in Production Engineering & Management, 2022, 17(1): 45-56.

[5] Zhang L, Yan Y, Xu W, Sun J, Zhang Y. Carbon Emission Calculation and Influencing Factor Analysis Based on Industrial Big Data in the "Double Carbon" Era[J]. Computational Intelligence and Neuroscience, 2022, Article ID 2815940, 12 pages.

[6] Agrawal R, Faloutsos C, Swami A. Efficient similarity search in sequence databases[C]. In D. Lomet, editor, Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms (FODO), pages 69-84, Chicago, Illinois, 1993. Springer Verlag.

[7] Burrus C S, Gopinath R A, Guo H. Introduction to Wavelets and Wavelet Transforms: A Primer[M]. Prentice Hall, 1998.

[8] Box G E P, Jenkins G M, Reinsel G C. Time Series Analysis: Forecasting and Control[M]. Wiley, 2013.

[9] Granger C W J, Andersen A P. An Introduction to Bilinear Time Series Models[M].
Göttingen: Vandenhoeck und Ruprecht, 1978.

[10] Hossain E, Hossain M S, Zander P O, et al. Machine learning with Belief Rule-Based Expert Systems to predict stock price movements[J]. Expert Systems with Applications, 2022, 206(13).

[11] Ji G, Yu J M, Hu K, et al. An adaptive feature selection schema using improved technical indicators for predicting stock price movements[J]. Expert Systems with Applications, 2022, 200(12).

[12] Li G Z, Zhang A N, Zhang Q Z, et al. Pearson Correlation Coefficient-Based Performance Enhancement of Broad Learning System for Stock Price Prediction[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2022, 69(5): 2413-2417.

[13] Deng C R, et al. Multi-step-ahead stock price index forecasting using long short-term memory model with multivariate empirical mode decomposition[J]. Information Sciences, 2022, 607: 297-321.

[14] Gao R Z, et al. Forecasting the overnight return direction of stock market index combining global market indices: A multiple-branch deep learning approach[J]. Expert Systems with Applications, 2022, 194: 18.

[15] Gao Z, et al. Financial sequence prediction based on swarm intelligence algorithms and internet of things[J]. Journal of Supercomputing: 21.

[16] Gupta U, et al. StockNet-GRU based stock index prediction[J]. Expert Systems with Applications, 2022, 207: 16.

[17] He Q Q, Siu S W I, Si Y W. Instance-based deep transfer learning with attention for stock movement prediction[J]. Applied Intelligence, 22.

[18] Kanwal A, Lau M F, Ng S P H, et al. BiCuDNNLSTM-1dCNN: A hybrid deep learning-based predictive model for stock price prediction[J]. Expert Systems with Applications, 2022, 202(15).

[19] Kumar R, Kumar P, Kumar Y. Three stage fusion for effective time series forecasting using Bi-LSTM-ARIMA and improved DE-ABC algorithm[J]. Neural Computing and Applications, 17.

[20] Li R R, Han T, Song X. Stock price index forecasting using a multiscale modelling strategy based on frequency components analysis and intelligent optimization[J]. Applied Soft Computing, 2022, 124(15).

[21] Liang M X, Wu S C, Wang X L, et al. A stock time series forecasting approach incorporating candlestick patterns and sequence similarity[J]. Expert Systems with Applications, 2022, 205(26).

[22] Wang C J, Chen Y Y, Zhang S Q, et al. Stock market index prediction using deep Transformer model[J]. Expert Systems with Applications, 2022, 208(10).

[23] Preis T, Schneider J J, Stanley H E. Switching processes in financial markets[J]. Proceedings of the National Academy of Sciences of the United States of America, 2011, 108: 7674-7678.

Copyright ©2021 by the authors. Licensee Agora University, Oradea, Romania. This is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License. Journal's webpage: http://univagora.ro/jour/index.php/ijccc/

This journal is a member of, and subscribes to the principles of, the Committee on Publication Ethics (COPE). https://publicationethics.org/members/international-journal-computers-communications-and-control

Cite this paper as: Yang, H.; Gao, X.; Han, L.; Cui, W. (2022). A financial time series data mining method with different time granularity based on trend division, International Journal of Computers Communications & Control, 17(6), 4952, 2022.