Engineering, Technology & Applied Science Research Vol. 8, No. 5, 2018, 3310-3315 3310  
  

www.etasr.com Tsipouras: Uterine EMG Signals Spectral Analysis for Pre-Term Birth Prediction 
 

Uterine EMG Signals Spectral Analysis for Pre-Term 
Birth Prediction 

 
Markos G. Tsipouras 
Department of Informatics and Telecommunications Engineering,  

University of Western Macedonia 
Kozani, GR47100, Greece 

mtsipouras@uowm.gr 
 

Abstract—A methodology for prediction of pre-term births is 
presented in this paper. The methodology is based on the analysis 
of EHG signals and data mining techniques. Initially, spectral 
and non-linear characteristics of the EHG are extracted, forming 
a pattern that is used to train a classifier to discriminate between 
term and pre-term cases. The method has been tested using a 
benchmark EHG database, and the obtained results indicate its 
effectiveness in accurate pre-term/term labour prediction.  

Keywords-preterm delivery; electrohysterogram; EHG signal 
processing; uterine electromyogram; prediction  

I. INTRODUCTION  
According to the World Health Organization (WHO), 15 million
babies are delivered before the 37th week of gestation every
year, which is more than 1 in 10 births [1]. Premature birth
is the leading cause of death in children under 5 years, while
premature babies may develop lifelong disabilities, including
learning disabilities and visual and hearing problems [2]. In
addition, premature births have an enormous economic impact
on health care systems, estimated at £3 billion in England
and Wales [3], and costs remain significantly higher for
pre-term born children in the first 5 [4] and 10 [5] years of life.
Thus, developing techniques for premature birth prediction is
an important task that has gained attention in the past decades,
since accurate prediction of premature labour can help to
significantly limit premature births. Premature birth prediction
has mainly been based on calculating risk factors, largely
associated with the medical condition of the mother, including
diabetes, hypertension, smoking, abnormalities of the uterus and
others [6]. An alternative approach that has gained significant
interest is the analysis of the electrohysterogram (EHG), which
is the electromyogram (EMG) of the uterus and thus records the
muscular activity of the uterus during gestation. Being
non-invasive (recorded on the abdominal surface) and simple to
obtain, the EHG has proven to be a significant monitoring tool
during pregnancy and labour [7].

Moving a step forward, several studies have successfully
used the EHG signal to forecast pre-term labour [8-14]. These
approaches mainly focus on EHG signal processing, by
extracting several time, frequency and/or non-linear
characteristics from the signal, and then applying statistical
analysis [8, 12] or classification techniques [9-11, 13-14] to
discriminate between pre-term and term cases. In this study, a
methodology for EHG signal analysis for predicting premature
births is presented. The methodology focuses on the spectral and
non-linear characteristics of the EHG signal, while the
prediction is made using intelligent techniques. A benchmark
EHG database has been used for the development and
validation of the methodology, and the obtained results indicate
its effectiveness in accurate pre-term labour prediction.

II. RELATED WORK 

A. Database 
The Term-Pre-term ElectroHysteroGram DataBase
(TPEHG DB) [8] is a collection of EHG records (uterine EMG
records), obtained at the University Medical Centre Ljubljana
from 1997 to 2005. The records were obtained around the 22nd
week of gestation or around the 32nd week of gestation, during
regular check-ups. The database contains 300 records from an
equal number of pregnancies, with 262 records obtained from
pregnancies with on-term delivery (pregnancy duration of more
than 37 weeks) and 38 records obtained from pregnancies which
ended prematurely (pregnancy duration of 37 weeks or less).
Also, 162 records were obtained before the 26th week of
gestation (143 term and 19 pre-term), while 138 were obtained
during or after the 26th week of gestation (119 term and 19
pre-term). For each recording, 4 electrodes were used: E1, placed
3.5 cm to the left and 3.5 cm above the navel; E2, placed 3.5 cm
to the right and 3.5 cm above the navel; E3, placed 3.5 cm to the
right and 3.5 cm below the navel; E4, placed 3.5 cm to the left
and 3.5 cm below the navel. The differences in the electrical
potentials of the electrodes were recorded, producing 3
channels: S1=E2–E1; S2=E2–E3; S3=E4–E3. The sampling
frequency was 20 Hz, with each record having a duration of 30
minutes and a 16-bit resolution over a ±2.5 millivolt range.
An example of an EHG recording (all channels) is illustrated in
Figure 1.
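The recording parameters above fix the size and amplitude resolution of
each channel. The short sketch below works through that arithmetic. It
assumes uniform quantization over the full ±2.5 mV range, which the text
does not state explicitly, and all variable names are illustrative.

```python
# Recording parameters as stated in the database description.
FS_HZ = 20            # sampling frequency
DURATION_MIN = 30     # record duration
ADC_BITS = 16         # resolution
RANGE_MV = 2.5        # amplitude range is +/- RANGE_MV

# Number of samples in one channel of one record.
samples_per_channel = FS_HZ * DURATION_MIN * 60

# Highest frequency that can be represented at this sampling rate.
nyquist_hz = FS_HZ / 2

# Size of one quantization step in microvolts
# (assumption: uniform quantization over the full range).
lsb_microvolt = (2 * RANGE_MV * 1000) / (2 ** ADC_BITS)

print(samples_per_channel)       # 36000
print(nyquist_hz)                # 10.0
print(round(lsb_microvolt, 4))   # 0.0763
```

With a Nyquist frequency of 10 Hz, the 0–5 Hz range in which uterine
activity is reported lies well within the representable band.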

B. Related works presented in the literature 
Authors in [8] presented a comparison of various linear and
non-linear features extracted from EHG to separate term and
pre-term records. Their analysis included filtering of the signals
in several different frequency bands, being 0.08–4 Hz, 0.3–4
Hz and 0.3–3 Hz, and extraction of several features, such as
root mean square, median frequency of the signal power
spectrum, autocorrelation zero-crossing, maximal Lyapunov
exponent, correlation dimension and sample entropy. Student’s
t-test was used to evaluate the features for their ability to
separate term and pre-term groups.

 
Fig. 1.  EHG recording: S1, S2 and S3 channels. 

Authors in [9] focused on the analysis of channel S3 of
the EHG signals, filtered in the 0.34–1 Hz band. Root mean
square, peak frequency, median frequency and sample entropy
features were extracted from the signals and several different
classification algorithms were employed: linear
discriminant analysis (LDA), quadratic discriminant analysis
(QDA), uncorrelated normal density, polynomial, logistic, k
nearest neighbours (kNN), decision tree (DT), Parzen and support
vector machines (SVM). The dataset was oversampled using
the Synthetic Minority Over-Sampling Technique (SMOTE) to
generate 262 pre-term records from the 38 already available
records. Monte Carlo cross-validation (MCCV) (80% holdout,
100 iterations) and N-fold cross-validation (with N=5)
were applied. Authors in [10] employed neural
networks (NNs) for the same problem. In this case the signals
were filtered in the 0.34–1 Hz band, and root mean square,
median frequency, peak frequency and sample entropy features
were extracted. The dataset was oversampled using the same
technique as in [9] (SMOTE) to generate 262 pre-term signals
from the 38 already available. Classification was performed
using six different NN classifiers (backpropagation
feed-forward NN, Levenberg–Marquardt feed-forward NN,
automatic NN, radial basis function NN, random NN,
perceptron linear classifier). MCCV (80% holdout, 30
iterations) and 5-fold cross-validation were
employed to validate the obtained results. Following the same
methodology as [9, 10], authors in [11] analysed channel S3 of
the EHG signals, filtered in the 0.34–1 Hz band, using the same
features. Again, several classifiers were tested, including the
self-organized network inspired by the immune algorithm (SONIA)
and fuzzy SONIA. The pre-term class was oversampled using the
min and max technique to produce 262 pre-term records.
Validation was performed using 60% of the data for training,
25% for validation and 15% for testing (30 iterations).

Authors in [12] presented a methodology for classification
of EHG records using median frequencies of power spectra and
sample entropy. In their study, filtering of the signal using
several different frequency bands was tested (0.08–4 Hz;
0.3–4 Hz; 0.3–3 Hz; 0.34–1 Hz). The median frequency
(the frequency value for which the sums of the power above and
below it are equal) of the mean power spectrum was estimated by
the adaptive autoregressive (AAR) method with the recursive
least squares (RLS) algorithm. Also, the sample entropy of each
signal was calculated. Furthermore, clinical information
available for the EHG records (age, parity, abortions, weight,
hypertension, diabetes, placental position, bleeding in the first
and second trimester, funnelling, smoking) was employed. The
tested classifiers included kNN, LDA, QDA, SVM and DT, while
SMOTE was employed to generate synthetic data so as to
balance the two classes. Authors in [13] extracted several time
and spectral features from the signal (integrated EMG, mean
absolute value, simple square integral value, wavelet length,
log detector, root mean square value, variance, difference
absolute standard deviation value, maximum fractal length,
average amplitude change, peak frequency, median frequency).
Then, feature analysis was performed based on statistical
significance, several linear discriminant analysis techniques
and Gram-Schmidt analysis, resulting in the four most
discriminating features, which were used as input to a
classifier. Several artificial NN classifiers were tested,
including back-propagation feed-forward NN,
Levenberg–Marquardt feed-forward NN, perceptron linear
classifier, radial basis function NN, random NN, voted
perceptron and discriminative restricted Boltzmann machine. As
in [9], the dataset was oversampled using SMOTE to generate 262
pre-term records (from the existing 38 records). Cross-validation
was applied using the MCCV technique (80% holdout, 30
iterations) and the 5-fold technique.

Authors in [14] presented a methodology for prediction of
pre-term delivery based on empirical mode decomposition
(EMD) combined with wavelet packet decomposition (WPD)
of the EHG signals. The signals were filtered with cut-off
frequencies of 0.3–3 Hz, and EMD was performed up to 11
levels to obtain the intrinsic mode functions, which were
further analysed with 6-level WPD. From the coefficients
obtained after the analysis, eight features were extracted
(fractal dimension, fuzzy entropy, interquartile range, mean
absolute deviation, mean energy, mean Teager-Kaiser energy,
sample entropy, standard deviation) and the particle swarm
optimization (PSO) method was applied for feature selection.
The adaptive synthetic sampling approach (ADASYN) was applied
to increase the number of pre-term signals from 38 to 244.
Several different classifiers were tested, such as linear
discriminant analysis, quadratic discriminant analysis, decision
tree, kNN, radial basis functions and SVM, while 5-fold and
10-fold cross-validation techniques were employed.

III. METHODOLOGY 
In this study, the EHG signals are analysed with respect to
their spectral characteristics. Initially, a set of filters is used
to filter all channels of each recording into specific frequency
sub-bands, and then the fractional energy of each frequency
sub-band is calculated. Also, non-linear characteristics are
extracted from the signals. The extracted features are used to
train a random forest classifier to discriminate term and
pre-term recordings according to their spectral behaviour. The
flowchart of the proposed methodology is presented in Figure 2.

 
Fig. 2.  Flowchart of the proposed methodology. 

A. EHG records 
The EHG records used in this study are obtained from the
TPEHG DB. All available records (300) are included in the
study, and from each record, all three channels (S1, S2 and S3)
are analysed. The unfiltered signals are used.

B. Signal Processing 
Initially, each channel is filtered with a high-pass filter with a
cut-off frequency of 0.3 Hz and subsequently with a low-pass
filter with a cut-off frequency of 4 Hz. Then, to assess the
spectral characteristics of each EHG record, a set of filters is
used to filter each channel into a specific frequency sub-band.
Several researchers have reported that the uterine electrical
activity occurs in the 0–5 Hz frequency band [8], with most of
it lying below 1 Hz [9]. Furthermore, typical respiration rates
are below 0.33 Hz. Based on the above, the frequency sub-bands
used for this study are 0.33–0.65 Hz, 0.65–1 Hz, 1–2 Hz, 2–3 Hz
and 3–4 Hz, selected so as: (i) to exclude respiration rates,
(ii) to additionally examine the energy distribution over the
0.33–1 Hz frequency band, and (iii) to include information from
frequency bands above 1 Hz. The 0.33–1 Hz band has been the sole
focus of several studies [9-11, 13-14]; however, examining this
band in greater detail (i.e. by examining its sub-bands) has not
been attempted in the literature before.
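The filtering stage can be sketched as follows. This is a minimal
illustration and not the author's implementation: the text does not
state the filter type or order, so zero-phase fourth-order Butterworth
filters from SciPy are assumed here, the function names `preprocess`
and `split_into_sub_bands` are hypothetical, and the demo signal is
synthetic rather than an EHG record.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_HZ = 20.0  # sampling frequency of the TPEHG DB records
SUB_BANDS_HZ = [(0.33, 0.65), (0.65, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]


def preprocess(x, fs=FS_HZ, order=4):
    """High-pass at 0.3 Hz, then low-pass at 4 Hz.

    Assumption: Butterworth filters applied forward and backward
    (zero phase); the paper does not specify the filter design.
    """
    hp = butter(order, 0.3, btype="highpass", fs=fs, output="sos")
    lp = butter(order, 4.0, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(lp, sosfiltfilt(hp, x))


def split_into_sub_bands(x, fs=FS_HZ, order=4):
    """Return an array of shape (5, n_samples), one row per sub-band."""
    rows = []
    for low, high in SUB_BANDS_HZ:
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        rows.append(sosfiltfilt(sos, x))
    return np.stack(rows)


# Synthetic demo: a strong 0.5 Hz component plus a weaker 2.5 Hz component.
t = np.arange(0, 300, 1 / FS_HZ)
demo = np.sin(2 * np.pi * 0.5 * t) + 0.5 * np.sin(2 * np.pi * 2.5 * t)

bands = split_into_sub_bands(preprocess(demo))
band_energy = np.sum(bands ** 2, axis=1)  # energy as sum of squared samples
```

In this demo most of the energy ends up in the first sub-band
(0.33–0.65 Hz) and a smaller share in the 2–3 Hz sub-band, as expected
from the two sinusoids.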

C. Features Extraction 

1) Fractional Energy  
After filtering all channels into the specific frequency
sub-bands, the energy of each of the filtered signals is
calculated for all channels. Based on the energy values, the
fractional energy of each frequency sub-band is calculated:

$$ FE_{ch}^{sb} = \frac{E_{ch}^{sb}}{\sum_{k} E_{ch}^{k}} \qquad (1) $$

where $ch$ is the channel (with $ch = 1, 2, 3$), $sb$ is the
frequency sub-band (with $sb$ = 0.3 Hz – 0.65 Hz, 0.65 Hz – 1 Hz,
1 Hz – 2 Hz, 2 Hz – 3 Hz, 3 Hz – 4 Hz), the sum runs over all
sub-bands $k$, $E_{ch}^{sb}$ is the energy of channel $ch$ after
filtering in the $sb$ frequency sub-band and $FE_{ch}^{sb}$ is the
fractional energy of channel $ch$ after filtering in the $sb$
sub-band.
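A small worked example of Equation (1) is given below. It assumes that
the energy of a filtered signal is the sum of its squared samples (the
text does not define the energy explicitly); the function name and the
toy signals are illustrative only.

```python
def fractional_energies(sub_band_signals):
    """Fractional energy of each sub-band for one channel.

    sub_band_signals: one sequence of samples per sub-band.
    Assumption: energy = sum of squared samples.
    """
    energies = [sum(v * v for v in signal) for signal in sub_band_signals]
    total = sum(energies)
    return [e / total for e in energies]


# Toy sub-band signals with energies 4, 8, 2, 1 and 1 (total 16).
toy = [
    [1.0, -1.0, 1.0, -1.0],
    [2.0, 0.0, -2.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
fe = fractional_energies(toy)
print(fe)  # [0.25, 0.5, 0.125, 0.0625, 0.0625]
```

By construction the five fractional energies of a channel sum to 1, so
they can be read as a coarse, normalized power distribution.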

2) Spectral Entropy 
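Spectral entropy (SpEn) is defined as the Shannon entropy
of the power spectrum density. Thus, it is calculated for each
channel of the EHG signal as follows:

$$ SpEn_{ch} = -\sum_{sb=1}^{N} FE_{ch}^{sb} \cdot \log FE_{ch}^{sb} \qquad (2) $$

where $SpEn_{ch}$ is the spectral entropy of channel $ch$,
$FE_{ch}^{sb}$ is the fractional energy of channel $ch$ in
sub-band $sb$ and $N$ is the number of frequency sub-bands
($N = 5$).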
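The behaviour of Equation (2) can be illustrated with two extreme
cases. The sketch below assumes the natural logarithm (the base is not
stated in the text) and the usual convention that a zero fractional
energy contributes nothing to the sum; the function name is
illustrative.

```python
import math


def spectral_entropy(fractional_energy):
    """Shannon entropy of the fractional energies of one channel.

    Assumptions: natural logarithm; terms with zero energy are skipped.
    """
    return -sum(p * math.log(p) for p in fractional_energy if p > 0)


# Energy spread evenly over the 5 sub-bands: maximum entropy, ln(5).
uniform = spectral_entropy([0.2, 0.2, 0.2, 0.2, 0.2])

# All energy in a single sub-band: minimum entropy, 0.
peaked = spectral_entropy([1.0, 0.0, 0.0, 0.0, 0.0])
```

With five sub-bands the value therefore lies between 0 and ln(5),
approximately 1.609.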
3) Approximate Entropy 

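Approximate Entropy (ApEn) can detect changes in the
underlying signal behaviour, which are not directly related to
peaks or amplitude variations [15]. $ApEn_{ch}^{sb}$ is
calculated for each sub-band $sb$ of each channel $ch$:

$$ ApEn(m, r) = \frac{1}{n-m+1}\sum_{i=1}^{n-m+1} \log C_i^{m}(r) - \frac{1}{n-m}\sum_{i=1}^{n-m} \log C_i^{m+1}(r) \qquad (3) $$

$$ C_i^{m}(r) = \frac{1}{n-m+1}\sum_{j=1}^{n-m+1} \Theta\left(r - \left\| x_i^{m} - x_j^{m} \right\|\right) \qquad (4) $$

where $x$ is the EHG signal with $n$ samples,
$x_i^{m} = [x(i), \ldots, x(i+m-1)]$, $\Theta(\cdot)$ is the
Heaviside step function, $\|\cdot\|$ is the Euclidean norm and
$r$ and $m$ are parameters defined as $r = 0.15 \cdot \sigma_x$
(with $\sigma_x$ the standard deviation of $x$) and $m = 2$,
chosen based on the results of previous studies [15] for good
statistical validity of ApEn.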
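A direct, unoptimized sketch of the ApEn computation is shown below. It
follows the standard definition from [15] with self-matches included,
uses the Euclidean norm named in the text, and assumes that the
tolerance is 0.15 times the standard deviation of the signal with an
embedding dimension of 2. The function name and the test signals are
illustrative, and the quadratic cost makes this form slow for full
36000-sample records.

```python
import numpy as np


def approximate_entropy(x, m=2, r_factor=0.15):
    """Approximate entropy with tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * np.std(x)

    def phi(dim):
        # Row i of `templates` holds x[i], ..., x[i + dim - 1].
        templates = np.lib.stride_tricks.sliding_window_view(x, dim)
        count = n - dim + 1
        log_c = np.empty(count)
        for i in range(count):
            dist = np.linalg.norm(templates - templates[i], axis=1)
            # Fraction of templates within distance r (self-match included,
            # so the fraction is never zero and the logarithm is finite).
            log_c[i] = np.log(np.count_nonzero(dist <= r) / count)
        return log_c.mean()

    return phi(m) - phi(m + 1)


# A regular signal should yield a lower ApEn than an irregular one.
rng = np.random.default_rng(0)
samples = np.arange(500)
apen_sine = approximate_entropy(np.sin(2 * np.pi * samples / 50))
apen_noise = approximate_entropy(rng.standard_normal(500))
```

The comparison between a sinusoid and white noise reflects the usual
reading of ApEn as a regularity measure: lower values indicate a more
predictable signal.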

4) Features Vector 
For each channel, five fractional energies, one spectral entropy
value and five ApEn values are calculated. Furthermore, a binary
feature representing the week of recording is used, being 0
for records obtained before the 26th week of gestation and 1 for
records obtained during or after the 26th week of gestation.
Thus, the feature vector for each record includes 12 features
when a single channel is used (5 + 1 + 5 + 1), and 34 features
when all channels are used (3 × 11 + 1).
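The feature counts can be checked with a short sketch that assembles
the vector from per-channel values. The ordering of the features and
the function name are assumptions made for illustration; the text only
specifies which features are included.

```python
def build_feature_vector(frac_energy, spec_entropy, ap_entropy, late_recording):
    """Concatenate per-channel features and append the week-of-recording flag.

    frac_energy:    per channel, a list of 5 fractional energies
    spec_entropy:   per channel, one spectral entropy value
    ap_entropy:     per channel, a list of 5 ApEn values
    late_recording: True if recorded during or after the 26th week
    """
    features = []
    for fe, se, ae in zip(frac_energy, spec_entropy, ap_entropy):
        features.extend(fe)      # 5 values
        features.append(se)      # 1 value
        features.extend(ae)      # 5 values
    features.append(1 if late_recording else 0)
    return features


# Placeholder values, used only to count features.
one_channel = build_feature_vector([[0.2] * 5], [1.6], [[0.5] * 5], False)
all_channels = build_feature_vector([[0.2] * 5] * 3, [1.6] * 3, [[0.5] * 5] * 3, True)
print(len(one_channel), len(all_channels))  # 12 34
```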

D. Classification 
The database contains 300 records, with 262 term and 38
pre-term records. Since this imbalance between the number of
data in the two classes has a major impact on classification
algorithms, all researchers employing classification have used
data balancing techniques (SMOTE, ADASYN, min&max),
mainly by creating artificial data from the existing ones to
boost the number of data in the minority class. To address this
issue in this study, (i) the number of pre-term data is increased
by including each pre-term record twice, so that the final number
of data used in the classification is 262 term and 76 pre-term,
and (ii) an appropriate classifier is selected.

Classification is performed using the Random Forest (RF)
classifier [16], which is an ensemble learning technique based
on the construction of multiple DTs using sub-sets of the initial
dataset. The RF is constructed with 100 DTs. The selection of
this classifier is based on the basic characteristic of the
dataset, i.e. being unbalanced, since RF has been reported to be
appropriate for handling unbalanced datasets, compared with many
other well-known classifiers (such as neural networks) [17].
Furthermore, RF performs two feature selection steps, an initial
random selection in the feature bagging step and a subsequent
selection of the feature with the highest normalized information
gain for each tree node; thus it can cope with the large number
of extracted features (34 when all channels are used) without the
need for an additional feature selection technique. The 10-fold
stratified cross-validation technique has been employed in the
classification process of this study. Thus, the dataset has been
divided into 10 equally sized subsets; in each round, nine of
them are used to train the classifier and the remaining one is
used for testing. Special care has been taken so as not to have
data from the same subject in both the training and test phases.
Thus, both copies of each pre-term record (since each pre-term
record is included twice) are placed in the same fold.

IV. RESULTS 
In order to assess the spectral characteristics of each
channel of the EHG signals, the features from each channel are
initially used individually to predict the term/pre-term
pregnancy outcome. The obtained results are reported in terms of
sensitivity, positive predictive value (PPV) and classification
accuracy, and they are presented in Table I. Among the individual
channels, S3 obtained the best results; however, the use of
features from all 3 channels has a beneficial effect on the
classification results.

TABLE I.  RESULTS  

Channel   Class      Sensitivity (%)   PPV (%)   Classification Accuracy (%)
S1        Term       95.00             98.80     95.24
          Pre-term   96.05             84.88
S2        Term       95.04             99.20     95.56
          Pre-term   97.37             85.06
S3        Term       95.80             99.21     96.15
          Pre-term   97.37             87.06
all       Term       98.09             99.23     97.93
          Pre-term   97.37             93.67
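The relation between the reported measures can be illustrated with a
short sketch. The confusion counts used below are not reported in the
paper; they are inferred here as the counts that reproduce the "all"
row of Table I for a dataset of 76 pre-term and 262 term entries, with
pre-term taken as the positive class.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard measures for a two-class problem (positive = pre-term)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Inferred counts: 74 of 76 pre-term and 257 of 262 term entries correct.
metrics = binary_metrics(tp=74, fn=2, fp=5, tn=257)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
# sensitivity: 97.37%
# specificity: 98.09%
# ppv: 93.67%
# npv: 99.23%
# accuracy: 97.93%
```

Note that the sensitivity of the term class equals the specificity with
respect to the pre-term class, and the PPV of the term class equals the
NPV with respect to the pre-term class.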

 
V. DISCUSSION 
In the proposed methodology for EHG analysis and
term/pre-term classification, the results obtained from the
features extracted from each channel separately are compared
with those obtained from their combination. The obtained
results indicate that analysing features extracted from all
channels (S1, S2 and S3) has a positive impact: although the
PPV of the term class and the sensitivity of the pre-term class
are almost unaffected, the sensitivity of the term class
increased by more than 2%, while the PPV of the pre-term class
increased by more than 6%. The same applies to the
classification accuracy, where the value for features from all 3
channels is 97.93%, with the best result for an individual
channel being 96.15% (obtained for channel S3). Most of the works
presented in the literature are based on the analysis of only
the S3 channel [9-11, 13], since it has been reported to be the
most informative [8]. This is confirmed by the findings of this
study, since channel S3 produced the best results among the
individual channels.

However, the use of features from all three channels in the
analysis leads to more accurate prediction of pre-term births
(97.93% against 96.15% classification accuracy); this has not
been reported in the literature before. Furthermore, the
0.33–1 Hz frequency band has been the focus of several studies
[9-11, 13-14], without however attempting to examine the energy
distribution within this band. Also, frequency bands above 1 Hz
have been excluded from these studies, although there has been
evidence that they contain important information [12]. In this
work the frequency sub-bands have been selected so as to lift
both of these limitations by: (i) examining in greater detail
the most informative band (0.33–1 Hz), and (ii) including several
frequency sub-bands above 1 Hz in order to access information in
higher frequencies. A comparison with similar methods from the
literature is presented in Table II.

All methods have been developed and tested using the same
database, the TPEHG DB, but with different classification
datasets, since different data balancing techniques
have been employed. Most researchers employed classification
techniques [9-11, 13-14], while statistical analysis of the
extracted features has also been presented [8, 12]. The proposed
methodology compares well with other approaches presented in
the literature, obtaining 97.4% sensitivity, 98.1% specificity,
93.7% PPV, 99.2% negative predictive value (NPV) and 97.9%
classification accuracy. These results are the best reported so
far, improving on the sensitivity reported in [14] by 2.3% and
on the respective classification accuracy by 1.6%. The
differences are higher with other previously reported results (up
to 8.4% for sensitivity, 19.1% for specificity, 3.7% for PPV,
9.2% for NPV and 10.9% for classification accuracy).
However, it should be noted that although the TPEHG DB was
employed in all cases, the classification datasets differ
significantly, since a large number of synthetic data are
generated in several cases [9-14], while the size of the
classification dataset also varies. The issue of balance in the
dataset has been reported by all researchers employing
classification techniques [9-11, 13-14], with data balancing
techniques being employed in all cases. SMOTE has been employed
in several works [9-10, 12-13] to generate additional pre-term
data; 262 pre-term cases are generated from the 38 already
available. Min&max was employed in [11] to produce 262 pre-term
records, while in [14] ADASYN was applied to increase the number
of pre-term signals from 38 to 244. In all these cases, the
number of generated data is many times larger than the number of
existing data (for the pre-term class), thus significantly
disturbing the term/pre-term ratio of the initial dataset, which
is 12.67% pre-term and 87.33% term, closely related to the actual
reported percentage [1]. Furthermore, the obtained results
regarding the pre-term class are mainly (or solely) based on
synthetic data.

In this study, no synthetic data are generated or employed
in the classification. Instead, to address the unbalanced dataset
issue, all pre-term data are included twice in the dataset
(resulting in 76 pre-term records). Thus, there are no
artificially generated data and the obtained results concern only
real cases. Also, the term/pre-term ratio has been kept at values
close to the actual recorded ratio (22.48% pre-term and 77.51%
term records in the classification dataset). Furthermore, the
folds have been selected so that this repetition does not disturb
the validation procedure by placing the same record in both the
training and test phases; each pair of repeated pre-term data is
included in a single fold.
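The effect of each balancing strategy on the class ratio can be checked
with a few lines of arithmetic. The counts are those stated in the
text; the variable names are illustrative.

```python
TERM = 262
PRETERM = 38

# Share of pre-term entries in the original database.
original = PRETERM / (TERM + PRETERM)

# This study: each pre-term record is included twice (76 entries).
duplicated = (2 * PRETERM) / (TERM + 2 * PRETERM)

# Oversampling to 262 pre-term entries (as reported for SMOTE and min&max).
oversampled_262 = 262 / (TERM + 262)

# Oversampling to 244 pre-term entries (as reported for ADASYN).
oversampled_244 = 244 / (TERM + 244)

for label, share in [("original", original), ("duplicated", duplicated),
                     ("262 pre-term", oversampled_262),
                     ("244 pre-term", oversampled_244)]:
    print(f"{label}: {100 * share:.1f}% pre-term")
# original: 12.7% pre-term
# duplicated: 22.5% pre-term
# 262 pre-term: 50.0% pre-term
# 244 pre-term: 48.2% pre-term
```

Duplication therefore roughly doubles the pre-term share, whereas the
synthetic oversampling schemes bring the two classes close to parity.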

TABLE II.  COMPARATIVE STUDY OF METHODS PRESENTED IN THE LITERATURE, USING THE TPEHG DB  

Reference: [8]
  Channel / frequency: All / 0.08-4 Hz, 0.3-4 Hz, 0.3-3 Hz
  Analysis / features: Linear and non-linear / root mean square,
    median frequency of the signal power spectrum, autocorrelation
    zero-crossing, maximal Lyapunov exponent, correlation dimension
    and sample entropy
  Analysis / classification: Student’s t-test
  Data balancing: -
  Validation: -
  Results: -

Reference: [9]
  Channel / frequency: Ch. 3 / 0.34-1 Hz
  Analysis / features: Root mean square, median frequency, peak
    frequency, sample entropy
  Analysis / classification: 9 classifiers / LDA, QDA, uncorrelated
    normal density, polynomial, logistic, kNN, DT, Parzen, SVM
  Data balancing: SMOTE (generated 262 pre-term records)
  Validation: MCCV (80% holdout, 100 iterations); 5-fold CV
  Results: Sensitivity: 96.7%, Specificity: 90.0%

Reference: [10]
  Channel / frequency: Ch. 3 / 0.34-1 Hz
  Analysis / features: Root mean square, median frequency, peak
    frequency, sample entropy
  Analysis / classification: 6 NN classifiers / backpropagation
    feed-forward NN, Levenberg-Marquardt feed-forward NN, automatic
    NN, radial basis function NN, random NN, perceptron linear
    classifier
  Data balancing: SMOTE (generated 262 pre-term records)
  Validation: MCCV (80% holdout, 30 iterations); 5-fold CV
  Results: Sensitivity: 96.1%, Specificity: 91.9%

Reference: [11]
  Channel / frequency: Ch. 3 / 0.34-1 Hz
  Analysis / features: Root mean square, median frequency, peak
    frequency, sample entropy
  Analysis / classification: 7 classifiers / SONIA network, MLP,
    Fuzzy-SONIA, kNN, Dynamic Self-organising Multilayer network,
    decision tree, SVM
  Data balancing: Min & Max (produce 262 pre-term records)
  Validation: Training: 60%, Validation: 25%, Testing: 15%
    (30 iterations)
  Results: Sensitivity: 89%, Specificity: 91%, PPV: 90%, NPV: 90%,
    Accuracy: 90%

Reference: [12]
  Channel / frequency: All / 0.34-1 Hz, 0.3-4 Hz
  Analysis / features: AAR / median frequency, sample entropy,
    clinical information (age, parity, abortions, weight,
    hypertension, diabetes, placental position, bleeding in the
    first and second trimester, funnelling, smoking)
  Analysis / classification: 5 classifiers / kNN, LDA, QDA, SVM
    and DT
  Data balancing: SMOTE
  Validation: -
  Results: Sensitivity: 96%, Specificity: 79%, Accuracy: 87%

Reference: [13]
  Channel / frequency: Ch. 3 / 0.34-1 Hz
  Analysis / features: Integrated EMG, mean absolute value, simple
    square integral value, wavelet length, log detector, root mean
    square value, variance, difference absolute standard deviation
    value, maximum fractal length, average amplitude change, peak
    frequency, median frequency
  Analysis / classification: 7 NN classifiers / back-propagation
    feed-forward NN, Levenberg–Marquardt feed-forward NN, perceptron
    linear classifier, radial basis function NN, random NN, voted
    perceptron, discriminative restricted Boltzmann machine
  Data balancing: SMOTE (generated 262 pre-term records)
  Validation: MCCV (80% holdout, 30 iterations); 5-fold CV
  Results: Sensitivity: 91%, Specificity: 84%

Reference: [14]
  Channel / frequency: Ch. 3 / 0.3-1 Hz
  Analysis / features: EMD (11 levels) & WPD (6 levels) / fractal
    dimension, fuzzy entropy, interquartile range, mean absolute
    deviation, mean energy, mean Teager-Kaiser energy, sample
    entropy, standard deviation
  Analysis / classification: 6 classifiers / linear discriminant
    analysis, quadratic discriminant analysis, decision tree, kNN,
    radial basis functions, SVM
  Data balancing: ADASYN (increase pre-term signals from 38 to 244)
  Validation: 5-fold and 10-fold CV
  Results: Sensitivity: 95.1%, Specificity: 97.3%, Accuracy: 96.3%

Reference: this study
  Channel / frequency: All / 0.33-5 Hz
  Analysis / features: Fractional energies, SpEn, ApEn
  Analysis / classification: Random Forests
  Data balancing: Resampling (increase pre-term signals from 38
    to 76)
  Validation: 10-fold CV
  Results: Sensitivity: 97.4%, Specificity: 98.1%, PPV: 93.7%,
    NPV: 99.2%, Accuracy: 97.9%

 
Most of the studies presented in the literature experimented
with several classification algorithms to identify the most
appropriate one. In this study the RF classifier has been
selected, since it has been reported to be able to deal with
unbalanced datasets [17]. To validate this selection, several
well-known classifiers were additionally tested, including kNN,
SVM, NNs and deep NNs. All alternative classification algorithms
produced significantly lower results, with the neural
approaches (both classical and deep) consistently biased towards
classifying samples into the larger (term) class.

VI. CONCLUSIONS 
A methodology for premature birth prediction, based on
EHG analysis, is presented in this work. The methodology has
been evaluated using a publicly available database, and the
obtained results indicate its ability for accurate pre-term labour
prediction. The results demonstrate that using
information from all 3 recorded channels (instead of only one,
which is common practice in the literature) and assessing
frequency sub-bands above 1 Hz (usually not included in the
analysis in the literature) leads to improved classification
accuracy. Furthermore, the combination of spectral and non-linear
characteristics carries sufficient information for this task.
During the validation procedure, only real data were used (no
synthetic recordings were included), while the term/pre-term
data ratio was kept close to the one reported by the WHO; thus
the obtained results are more robust. The methodology
achieved higher sensitivity and classification accuracy than the
best presented in the literature so far. Future work will mainly
focus on applying the methodology to different data, so as to
validate its findings on larger, multicentre datasets.

REFERENCES 
[1] March of Dimes, PMNCH, Save the Children, WHO, Born Too Soon:
The Global Action Report on Preterm Birth, Eds. C. P. Howson, M. V.
Kinney, J. E. Lawn, World Health Organization, Geneva, 2012

[2] L. Liu, S. Oza, D. Hogan, Y. Chu, J. Perin, J. Zhu, J. E. Lawn, S.
Cousens, C. Mathers, R. E. Black, “Global, regional, and national causes
of under-5 mortality in 2000–15: an updated systematic analysis with
implications for the Sustainable Development Goals”, Lancet, Vol. 388,
pp. 3027-35, 2016

[3] L. J. Mangham, S. Petrou, L. W. Doyle, E. S. Draper, N. Marlow, “The 
cost of preterm birth throughout childhood in England and Wales”, 
Pediatrics, Vol. 123, No. 2, pp. e312-e327, 2009 

[4] S. Petrou, Z. Mehta, C. Hockley, P. Cook-Mozaffari, J. Henderson, M. 
Goldacre, “The impact of preterm birth on hospital inpatient admissions 
and costs during the first 5 years of life”, Pediatrics, Vol. 112, No. 6, pp. 
1290-7, 2003 

[5] S. Petrou, “The economic consequences of preterm birth during the first
10 years of life”, BJOG: an International Journal of Obstetrics and
Gynaecology, Vol. 112, No. S1, pp. 10-15, 2005

[6] J. D. Iams, “Prediction and early detection of preterm labor”, The 
American College of Obstetricians and Gynecologists, Vol. 101, No. 2, 
pp. 402–412, 2003 

[7] C. Buhimschi, M. Boyle, R. Garfield, “Electrical activity of the human 
uterus during pregnancy as recorded from the abdominal surface”, 
Obstetrics & Gynecology, Vol. 90, pp. 102–111, 1997 

[8] G. Fele-Zorz, G. Kavsek, Z. Novak-Antolic, F. Jager, “A comparison of 
various linear and non-linear signal processing techniques to separate 
uterine EMG records of term and pre-term delivery groups”, Medical & 
Biological Engineering & Computing, Vol. 46, pp. 911-922, 2008  

[9] P. Fergus, P. Cheung, A. Hussain, D. Al-Jumeily, C. Dobbins, S. Iram,
“Prediction of preterm deliveries from EHG signals using machine
learning”, PLOS ONE, Vol. 8, No. 10, Art. No. e77154, 2013

[10] I. O. Idowu, P. Fergus, A. Hussain, C. Dobbins, H. Al-Askar, “Advance 
artificial neural network classification techniques using EHG for 
detecting preterm births”, 8th International Conference on Complex, 
Intelligent and Software Intensive Systems, Birmingham City 
University, Birmingham, UK, July 2-4, 2014 

[11] A. J. Hussain, P. Fergus, H. Al-Askar, D. Al-Jumeily, F. Jager, 
“Dynamic neural network architecture inspired by the immune algorithm 
to predict preterm deliveries in pregnant women”, Neurocomputing, Vol. 
151, pp. 963–974, 2015  

[12] A. Smrdel, F. Jager, “Separating sets of term and pre-term uterine EMG 
records”, Physiological Measurement, Vol. 36, pp. 341–355, 2015 

[13] P. Fergus, I. Idowu, A. Hussain, C. Dobbins, “Advanced artificial neural 
network classification for detecting preterm births using EHG records”, 
Neurocomputing, Vol. 188, pp. 42–49, 2016 

[14] U. R. Acharya, V. K. Sudarshan, S. Q. Rong, Z. Tan, C. M. Lim, J. E.
W. Koh, S. Nayak, S. V. Bhandary, “Automated detection of premature
delivery using empirical mode and wavelet packet decomposition
techniques with uterine electromyogram signals”, Computers in Biology
and Medicine, Vol. 85, pp. 33-42, 2017

[15] S. M. Pincus, A. L. Goldberger, “Physiological time-series analysis:
what does regularity quantify?”, American Journal of Physiology, Vol.
266, pp. H1643-H1656, 1994

[16] L. Breiman, “Random forests”, Machine Learning, Vol. 45, No. 1, pp.
5–32, 2001

[17] I. Brown, C. Mues, “An experimental comparison of classification 
algorithms for imbalanced credit scoring data sets”, Expert Systems with 
Applications, Vol. 39, pp. 3446–3453, 2012 

 
AUTHOR PROFILE 

 
Markos G. Tsipouras was born in Athens, Greece, in 1977. He received the 
diploma degree in Computer Science from the University of Ioannina, Greece, 
in 1999, and M.Sc. and Ph.D degrees in computer science, in 2002 and 2008 
respectively, from the same department. He has participated in several 
European and National R&D projects as a researcher/developer and he has 
published 100 papers in peer-reviewed scientific journals and conference 
proceedings. Also, he has published 7 book chapters, and he has co-authored 
one book. His research interests include digital signal and image processing, 
medical informatics, artificial intelligence, fuzzy logic, data mining, decision 
support systems and expert systems.