HUNGARIAN JOURNAL OF INDUSTRIAL CHEMISTRY VESZPRÉM Vol. 35. pp. 85-93 (2007)

APPLICATION OF EXPLORATORY DATA ANALYSIS TO HISTORICAL PROCESS DATA OF POLYETHYLENE PRODUCTION

J. ABONYI
University of Pannonia, Dept. of Process Engineering, H-8201 Veszprém, P.O. Box 158, HUNGARY

In modern chemical process systems huge amounts of data are recorded. These data have the potential to provide information for product and process design, monitoring and control. This paper presents a brief survey of simple Exploratory Data Analysis procedures that have been found to be useful in the qualitative analysis of historical process data. The presented box plots and quantile-quantile plots are applied to an industrial polyethylene plant to analyse different productions of a given product and to explore the relationships between different operating and product quality variables.

Keywords: Exploratory data analysis, Box plot, Quantile-quantile plot

Introduction

The major aims of monitoring plant performance are the reduction of off-specification production, the identification of important process disturbances and the early warning of process malfunctions or plant faults [1]. In modern production systems huge amounts of process operational data are recorded by distributed control systems (DCS). These data have the potential to provide information for product and process design, monitoring and control [2]. Process monitoring based on multivariate statistical analysis of process data has recently been investigated by a number of researchers [3]. The aim of these approaches is to reduce the dimensionality of the correlated process data by projecting them down onto a lower dimensional latent variable space where the operation can be easily visualized. These approaches use the techniques of principal component analysis (PCA) or projection to latent structures (PLS). Besides process performance monitoring, these tools can be used for system identification [3], ensuring consistent production [4] and product design [1].

For these classical data analysis approaches, the collection of the data is followed by the imposition of a model, and the analysis, estimation and testing that follow are focused on the parameters of that model. Most operational process data may be characterised as historical in the sense that they were not collected on the basis of experiments designed to test specific statistical hypotheses. Consequently, the resulting databases are likely to contain unexpected features (e.g. outliers from various sources, unexpected correlations between variables, etc.). This observation is important for two reasons. First, these data anomalies can completely negate the results obtained by standard analysis procedures, particularly those based on squared-error criteria (a large class that includes many SPC and chemometrics techniques, such as PCA). Secondly, and sometimes more importantly, an understanding of these data anomalies may lead to extremely valuable insights [5]. Pearson suggested using Exploratory Data Analysis (EDA) tools for both of these reasons [5]. In Exploratory Data Analysis (EDA), the data collection is not followed by the imposition of a model; rather, it is followed immediately by analysis with the goal of inferring what model would be appropriate.
EDA is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to maximize insight into a data set, uncover underlying structure, extract important variables, detect outliers and anomalies, test underlying assumptions, develop parsimonious models, and determine optimal factor settings. The seminal work in EDA was written by Tukey [6]. Over the years the field has benefited from other noteworthy publications such as Data Analysis and Regression by Mosteller and Tukey [7] and the book of Velleman and Hoaglin [8]. Most EDA techniques are graphical in nature, with a few quantitative techniques [9]. The reason for the heavy reliance on graphics is that, by its very nature, the main role of EDA is to explore open-mindedly, and graphics gives the analyst unparalleled power to do so, enticing the data to reveal its structural secrets and offering new, often unsuspected, insight into the data. In combination with the natural pattern-recognition capabilities that we all possess, graphics provides unparalleled power to carry this out. The particular graphical techniques employed in EDA are often quite simple, consisting of various techniques of:
1. Plotting the raw data (such as data traces and histograms).
2. Plotting simple statistics such as mean plots, standard deviation plots and box plots.
3. Positioning such plots so as to maximize our natural pattern-recognition abilities, for example by using multiple plots per page.

The aim of this paper is to present an application-relevant survey of some Exploratory Data Analysis procedures that have been found to be particularly useful in the qualitative analysis of historical databases of production systems. The rest of this paper is organised as follows. The next section describes the problem used throughout the paper to illustrate the presented exploratory data analysis approach. Section 3 describes the box plot and shows its application in the comparison of different productions of a given product. Section 4 proposes the quantile-quantile plot for exploring the differences between productions and for analysing the relationships among different operating variables. The examples illustrate that the proposed EDA-based tools are useful for identifying similar behaviour of operating and product quality variables.

Problem Description

Formulated products (plastics, polymer composites) are generally produced from many ingredients, and a large number of interactions between the components and the processing conditions all have an effect on the final product quality. If these effects are detected, significant economic benefits can be realized. This consideration led to the "Optimization of Operating Processes" project of the VIKKK Research Center at the University of Veszprém, supported by the largest Hungarian polymer production company (TVK Ltd., www.tvk.hu). The aim of the project is to work out a methodology for the data-driven improvement of processes. Hence, in this paper the monitoring of a medium- and high-density polyethylene (MDPE, HDPE) plant of TVK Ltd. in Hungary is considered. HDPE is a versatile plastic used for household goods, packaging, car parts and pipe. A brief explanation of the Phillips-license-based low-pressure catalytic process is provided in the following. Fig. 1 represents the Phillips Petroleum Co. suspension ethylene polymerization process.
The polymer particles are suspended in an inert hydrocarbon. The melting point of high-density polyethylene is approximately 135 °C; therefore, slurry polymerization takes place at a temperature below 135 °C, and the polymer formed is in the solid state. The Phillips process takes place at a temperature between 85 and 110 °C. The catalyst and the inert solvent are introduced into the loop reactor where ethylene and an α-olefin (1-hexene) are circulating. The inert solvent (isobutane) is used to dissipate heat, as the reaction is highly exothermic. A cooling jacket is also used to dissipate heat. The reactor consists of a folded loop containing four long runs of pipe 1 m in diameter, connected by short horizontal lengths of 5 m. The slurry of HDPE and catalyst particles circulates through the loop at a velocity between 5 and 12 m/s. The high velocity is needed because at lower velocities the slurry would deposit on the walls of the reactor, causing fouling. The concentration of polymer products in the slurry is 25-40% by weight. Ethylene, α-olefin comonomer (if used), an inert solvent, and catalyst components are continuously charged into the reactor at a total pressure of 450 psig. The polymer is concentrated in settling legs to about 60-70% by weight slurry and continuously removed. The solvent is recovered by hot flashing and distillation. The polymer is dried and pelletized. The conversion of ethylene to polyethylene is very high (95-98%), eliminating ethylene recovery. The molecular weight of high-density polyethylene is mainly determined by the type of the catalyst and the temperature of the catalyst activation [10]. The main properties of the polymer products (e.g. melt index (MI) and density) are controlled by the reactor temperature and by the monomer, comonomer and chain-transfer agent concentrations.

Figure 1: Scheme of the Phillips loop reactor process [10]

An interesting problem with the process is that about ten product grades have to be produced according to market demand. Hence, there is a clear need to minimize the time of changeover, because off-specification product may be produced during transitions. The difficulty of the problem comes from the fact that there are more than ten process variables to consider. Measurements of the process variables zk are available every 15 seconds: zk,1 the reactor temperature (T), zk,2 the ethylene concentration (C2) and zk,3 the hexene concentration (C6) in the loop reactor, zk,4 the ratio of the hexene and ethylene inlet flowrates (C6/C2in), zk,5 the flowrate of the isobutane solvent (C4), zk,6 the hydrogen concentration (H2), zk,7 the density of the slurry in the reactor (roz), zk,8 the polymer production intensity (PE), and zk,9 the inlet flowrate of the catalyst (KAT). The product quality yk is only determined later, in another process. The interval between product samples is between half an hour and five hours. The yk,1 melt index (MI) and the yk,2 density of the polymer powder (ro) are monitored by off-line laboratory analysis after drying of the polymer, which causes a one-hour time delay.
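To make the structure of the data set concrete, the following minimal sketch (in Python, with hypothetical tag and file names that are not taken from the plant's systems) shows one way the process variables zk and the laboratory quality variables yk described above could be organized for the analysis:

```python
import pandas as pd

# z_k: process variables logged by the DCS every 15 s (tag names are illustrative only)
PROCESS_TAGS = {
    "T":       "reactor temperature",
    "C2":      "ethylene concentration in the loop reactor",
    "C6":      "hexene concentration in the loop reactor",
    "C6/C2in": "ratio of hexene and ethylene inlet flowrates",
    "C4":      "flowrate of the isobutane solvent",
    "H2":      "hydrogen concentration",
    "roz":     "density of the slurry in the reactor",
    "PE":      "polymer production intensity",
    "KAT":     "inlet flowrate of the catalyst",
}

# y_k: product quality variables sampled every 0.5-5 h by the laboratory
QUALITY_TAGS = {
    "MI": "melt index",
    "ro": "density of the polymer powder",
}

# Hypothetical CSV extracts; the real file and column names at the plant differ.
z = pd.read_csv("process_data.csv", index_col="timestamp", parse_dates=["timestamp"])
y = pd.read_csv("lab_quality.csv", index_col="timestamp", parse_dates=["timestamp"])
```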
Since it would be useful to know whether the product is good before testing it, monitoring the process helps in the early detection of poor-quality product. There are other reasons why monitoring the process is advantageous. Only a few properties of the product are measured, and sometimes these are not sufficient to define the product quality entirely. For example, if only rheological properties of the polymer are measured (melt index), any variation in end-use performance that arises from variation of the chemical structure (branching, composition, etc.) will not be captured by following only these product properties. In such cases the process data may contain more information about events with special causes that may affect the product quality [11].

The modelling and monitoring of processes from data involve solving the problems of data gathering, pre-processing, model architecture selection, identification or adaptation, and model validation. The process data analyzed in this paper have been collected over three months of operation. The data have been extracted from the distributed control system (DCS) of the process. An SQL server has been installed to store and merge these data with the product quality database. According to the data warehousing methodology, the application-relevant data have been extracted from this SQL database. As one of the objectives is to infer the values of product quality from process data obtained at different operating regions, a set of transition-free data is used that covers the whole range of specifications of the quality properties and the process variables over all the possible operating regions. The data were pre-processed by normalization performed on the single variables. The aim of the following sections is to present exploratory data analysis tools that can be applied to the previously presented problem.

Box plot of operating and quality variables

Suppose that X is a real-valued random variable for the experiment. In our research work the analysis of process and product quality variables is considered; hence, the variables are X ∈ {zk, yk}.

Figure 2: Example of the change of a process variable (T)

An example of the behaviour of a process variable is given in Fig. 2, where the change of the dimensionless reactor temperature is shown. At our industrial partner's request, most of the data shown in this paper are normalized. The (cumulative) distribution function of X is the function F given by F(x) = P(X ≤ x), i.e. the function giving, for every value x, the probability that the random variable X is less than or equal to x. For a discrete random variable, the cumulative distribution function is found by summing up the probabilities. For a continuous random variable, the cumulative distribution function is the integral of its probability density function. Suppose that p ∈ [0, 1]. A value x such that F(x−) = P(X < x) ≤ p and F(x) = P(X ≤ x) ≥ p is called a quantile of order p for the distribution. Roughly speaking, a quantile of order p is a value where the cumulative distribution crosses p. Hence, by a quantile we mean the value below which a given fraction (or percent) of points fall. That is, the 0.25 (or 25%) quantile is the point below which 25% of the data fall and above which 75% fall. Note that there is an inverse relation of sorts between the quantiles and the cumulative distribution values.
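As an illustration, the empirical distribution function and the quantiles defined above can be computed directly from a finite sample. The following minimal sketch uses a synthetic, normalized temperature trace in place of the confidential plant data, so all numerical values are illustrative only:

```python
import numpy as np

def empirical_cdf(sample):
    """Return sorted values xs and F with F[i] = P(X <= xs[i]) estimated from the sample."""
    xs = np.sort(np.asarray(sample, dtype=float))
    F = np.arange(1, xs.size + 1) / xs.size
    return xs, F

rng = np.random.default_rng(0)
T = np.tanh(rng.normal(0.0, 0.5, size=2000))        # synthetic stand-in for the normalized T trace

xs, F = empirical_cdf(T)
q25, q50, q75 = np.quantile(T, [0.25, 0.5, 0.75])   # quantiles of order 0.25, 0.5 and 0.75, as in Fig. 3
print(f"q0.25 = {q25:.3f}, q0.5 = {q50:.3f}, q0.75 = {q75:.3f}")
```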
A quantile of order 1/2 is called a median of the distribution. When there is only one median, it is frequently used as a measure of the center of the distribution. A quantile of order 1/4 is called a first quartile and the quantile of order 3/4 a third quartile; a median is a second quartile. Assuming uniqueness, let q0.25, q0.5, and q0.75 denote the first (lower), second, and third (upper) quartiles of X. Note that the interval from q0.25 to q0.75 covers the middle half of the distribution; thus the interquartile range, defined as IQR = q0.75 − q0.25, is sometimes used as a measure of the spread of the distribution with respect to the median. Let q0 and q1 denote the minimum and maximum values of X, respectively (assuming that these are finite). The five parameters q0, q0.25, q0.5, q0.75, q1 are often referred to as the five-number summary. Together, these parameters give a great deal of information about the distribution in terms of its center, spread, and skewness. An example of a cumulative distribution function and its quantiles is given in Fig. 3, where the distribution of the reactor temperature shown in Fig. 2 is depicted.

Figure 3: Example of a cumulative distribution function of a process variable (T); the q0.25, q0.5, and q0.75 quantiles are also depicted

Tukey's five-number summary is often displayed as a box plot. Box plots are an excellent tool for conveying location and variation information in data sets, particularly for detecting and illustrating location and variation changes between different groups of data [9,12]. A box plot consists of a line extending from the minimum value q0 to the maximum value q1, with a rectangular box from q0.25 to q0.75, and tick marks at q0, the median q0.5, and q1. Hence, the lower and upper lines of the "box" are the 25th and 75th percentiles of the sample. The distance between the top and bottom of the box is the interquartile range. The line in the middle of the box is the sample median. If the median is not centered in the box, that is an indication of skewness. Thus the box represents the body (middle 50%) of the data. There is a useful variation of the box plot that more specifically identifies outliers. To create this variation:
1. Calculate the median and the lower and upper quartiles.
2. Plot a symbol at the median and draw a box between the lower and upper quartiles.
3. Calculate the interquartile range (the difference between the upper and lower quartiles) and call it IQ.
4. Calculate the following points: L1 = q0.25 − 1.5 IQ, L2 = q0.25 − 3 IQ, U1 = q0.75 + 1.5 IQ, U2 = q0.75 + 3 IQ.
5. The line from the lower quartile to the minimum is now drawn from the lower quartile to the smallest point that is greater than L1. Likewise, the line from the upper quartile to the maximum is now drawn to the largest point smaller than U1.
6. Points between L1 and L2 or between U1 and U2 are drawn individually.

The "whiskers" are the lines extending above and below the box. They show the extent of the rest of the sample (unless there are outliers). Assuming no outliers, the maximum of the sample is the top of the upper whisker and the minimum of the sample is the bottom of the lower whisker. By default, an outlier is a value that is more than 1.5 times the interquartile range away from the top or bottom of the box.
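The steps above translate directly into code. The following sketch (again with synthetic data standing in for the confidential melt index measurements, so the numbers are illustrative) computes the five-number summary and the fences, and draws side-by-side box plots in the spirit of Fig. 5:

```python
import numpy as np
import matplotlib.pyplot as plt

def five_number_summary(x):
    """q0, q0.25, q0.5, q0.75, q1 of a sample."""
    x = np.asarray(x, dtype=float)
    return (x.min(), *np.quantile(x, [0.25, 0.5, 0.75]), x.max())

def fences(x):
    """Outer and inner fences L2, L1, U1, U2 (3 IQ and 1.5 IQ) used to flag outliers."""
    q25, q75 = np.quantile(x, [0.25, 0.75])
    iq = q75 - q25
    return q25 - 3 * iq, q25 - 1.5 * iq, q75 + 1.5 * iq, q75 + 3 * iq

# Synthetic melt index samples of five productions of the same product (illustrative only).
rng = np.random.default_rng(1)
mi_runs = [rng.normal(0.5, 0.05 + 0.02 * i, size=60) for i in range(5)]

for run in mi_runs:
    print(five_number_summary(run), fences(run))

plt.boxplot(mi_runs, whis=1.5)   # whiskers end at the last points inside the inner fences
plt.ylabel("MI [-]")
plt.show()
```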
The plotted outlier points may be the result of a data entry error, a poor measurement, or a change in the system that generated the data. An example of a box plot is given in Fig. 4, where the distribution of the reactor temperature given in Fig. 2 is shown. A single box plot can be drawn for one batch of data with no distinct groups. Alternatively, multiple box plots can be drawn together to compare multiple data sets or to compare groups in a single data set. Such a comparison is given in Fig. 5, where the melt index (MI) distributions of five different productions of the same product are shown. Hence, the box plot reveals whether a factor has a significant effect on the response with respect to either location or variation, and it is also an effective tool for summarizing large quantities of information.

Figure 4: Box plot of the reactor temperature (T) shown in Fig. 2

Figure 5: Melt index (MI) of five different productions of the same product

Quantile-quantile plot of process and product quality variables

When there are two data samples, it is often desirable to know whether the assumption of a common distribution is justified. If so, then location and scale estimators can pool both data sets to obtain estimates of the common location and scale. If the two samples do differ, it is also useful to gain some understanding of the differences. The quantile-quantile (q-q) plot can provide more insight into the nature of the difference than analytical methods such as the chi-square and Kolmogorov-Smirnov two-sample tests [5,9]. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. Both axes are in units of their respective data sets; that is, the actual quantile level is not plotted. For a given point on the q-q plot, we know that the quantile level is the same for both points, but not what that quantile level actually is. If the data sets have the same size, the q-q plot is essentially a plot of sorted data set A against sorted data set B. If the data sets are not of equal size, the quantiles are usually picked to correspond to the sorted values of the smaller data set, and the quantiles of the larger data set are then interpolated. A diagonal reference line is also plotted. If the two sets come from populations with the same distribution, the points should fall approximately along this reference line. The greater the departure from this reference line, the greater the evidence for the conclusion that the two data sets have come from populations with different distributions. If the two data sets come from populations whose distributions differ only by a shift in location, the points should lie along a straight line that is displaced either up or down from the diagonal reference line. The q-q plot is similar to a probability plot, where the quantiles of one of the data samples are replaced by the quantiles of a theoretical distribution. The q-q plot can be used to answer the following questions: Do two data sets come from populations with a common distribution? Do two data sets have common location and scale? Do two data sets have similar distributional shapes? Do two data sets have similar tail behaviour?
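As an illustration of the construction described above, the following sketch (with synthetic temperature samples in place of plant data) sorts the smaller sample, interpolates the quantiles of the larger sample at the same quantile levels, and adds the diagonal reference line:

```python
import numpy as np
import matplotlib.pyplot as plt

def qq_plot(a, b, ax=None):
    """Quantile-quantile plot of two samples of (possibly) unequal size."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    n = min(a.size, b.size)
    p = (np.arange(1, n + 1) - 0.5) / n        # quantile levels of the smaller sample
    qa, qb = np.quantile(a, p), np.quantile(b, p)
    ax = ax or plt.gca()
    ax.plot(qa, qb, "o", markersize=3)
    lo, hi = min(qa.min(), qb.min()), max(qa.max(), qb.max())
    ax.plot([lo, hi], [lo, hi], "k--")         # diagonal reference line
    return ax

# Synthetic temperature samples from two productions (illustrative values only).
rng = np.random.default_rng(2)
T1 = rng.normal(90.0, 0.30, size=400)
T2 = rng.normal(90.1, 0.35, size=250)
ax = qq_plot(T1, T2)
ax.set_xlabel("production 1, T")
ax.set_ylabel("production 2, T")
plt.show()
```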
Such questions arise in the qualitative analysis of historical databases of production systems. First, an example comparing two different productions of the same product is given in the first column of Fig. 6.

Figure 6: Examples of quantile-quantile plots of process variable distributions related to two different productions (first column: same product, second column: different products)

This plot shows that the distributions of the temperature are different, while the distributions of the concentrations are much more similar to each other. The difference is much bigger if the temperatures related to the productions of two different products are compared (see the second column of Fig. 6). The difference between productions of the same product and productions of different products is even more characteristic if we compare the distributions of the quality properties (see Figs 7 and 8). This small application example suggests that quantile-quantile plots can be effectively used to compare different productions. Another type of application is given in Figs 9 and 10, where the similarities between the distributions of different process and quality variables are analysed. This analysis can be extremely useful for detecting dependencies between the operating parameters of the process. Based on the application of the proposed tools and the analysis of the presented figures, several rules have been extracted. Most of these rules were found to be useful by our industrial partner, since the extracted knowledge and the resulting plots can be effectively used to summarise trends of the process variables and to estimate the quality of the products.

Figure 7: Examples of quantile-quantile plots of quality variable (melt index, MI, and polymer density, ro) distributions related to two different productions of the same product

Figure 8: Examples of quantile-quantile plots of quality variable (melt index, MI, and polymer density, ro) distributions related to two productions of different products

Figure 9: Example of a quantile-quantile plot of a process and a quality variable distribution related to the same production of a product

Figure 10: Quantile-quantile plots of process variable distributions related to the same production of a product

The behaviour of the control algorithm of the advanced model-based control system can also be identified from the analysis of these plots. For example, since the density of the slurry in the reactor (roz) is controlled by the hexene concentration (C6), these two variables have similar distributions, as shown in Fig. 10. Furthermore, the ratio of C6 and C2 is also controlled, which makes the behaviour of these two variables similar as well. Because of these relations, the C2 and roz distributions also become similar. The quantile-quantile plot of the production rate (PE) and the ethylene concentration (C2) is also close to a straight line, because a higher ethylene concentration results in a higher reaction rate.
The previously presented rules are only illustrative, but they show that the proposed tools can be effectively used to detect relationships between process and product quality variables and to compare different productions.

Conclusions

The paper presented a brief survey of simple Exploratory Data Analysis procedures that have been found to be particularly useful in the qualitative analysis of historical databases of production systems. It has been shown that the box plot is an important EDA tool for determining whether a factor has a significant effect on the response with respect to the quality of a given production. To analyse the relationships between different productions, different products, and different operating variables, quantile-quantile plots have been proposed.

ACKNOWLEDGEMENTS

The authors would like to acknowledge the support of the Cooperative Research Center (VIKKK) (project KKK-II-1A) and funding from the Hungarian Ministry of Education (FKFP-0073/2001). János Abonyi is grateful for the financial support of the János Bolyai Research Fellowship of the Hungarian Academy of Sciences and OTKA (Hungarian National Research Foundation), No. T037600. The support of our industrial partners at TVK Ltd., especially Miklós Németh, Lóránt Bálint and dr. Gábor Nagy, is gratefully acknowledged.

REFERENCES

1. LAKSHMINARAYANAN S., FUJII H., GROSMAN B., DASSAU E., LEWIN D. R.: New product design via analysis of historical databases, Computers and Chemical Engineering, 24 (2000) 671-676
2. YAMASHITA Y.: Supervised learning for the analysis of process operational data, Computers and Chemical Engineering, 24 (2000) 471-474
3. MACGREGOR J. F., KOURTI T.: Statistical process control of multivariate processes, Control Engineering Practice, 3(3) (1995) 403-414
4. MARTIN E. B., MORRIS A. J., PAPAZOGLOU M. C., KIPARISSIDES C.: Batch process monitoring for consistent production, Computers and Chemical Engineering, 20 (1996) S599-S605
5. PEARSON R. K.: Exploring process data, Journal of Process Control, 11 (2001) 179-194
6. TUKEY J.: Exploratory Data Analysis, Addison-Wesley, (1977)
7. MOSTELLER F., TUKEY J.: Data Analysis and Regression, Addison-Wesley, (1977)
8. VELLEMAN P., HOAGLIN D.: The ABC's of EDA: Applications, Basics, and Computing of Exploratory Data Analysis, Duxbury, (1981)
9. MILITKÝ J., MELOUN M.: Some graphical aids for univariate exploratory data analysis, Analytica Chimica Acta, 277(2) (1993) 215-221
10. NAGY G.: The polyethylene, Magyar Kémikusok Lapja (MKL), 52(5) (1997) 233-242, in Hungarian
11. JEACKLE C. M., MACGREGOR J. F.: Product design through multivariate statistical analysis of process data, AIChE Journal, 44(5) (1998) 1105-1118
12. CHAMBERS J., CLEVELAND W., KLEINER B., TUKEY P.: Graphical Methods for Data Analysis, Wadsworth, (1983)