
Performance Evaluation of Learning Classifiers of Children Emotions using Feature Combinations in the Presence of Noise
 

Abdul Samad (corresponding author)
Department of Computer Science, Hamdard University, Karachi, Pakistan
asamad23@gmail.com

Aqeel-ur-Rehman
Department of Computer Science, Hamdard University, Karachi, Pakistan
aqeelrehman1972@gmail.com

Syed Abbas Ali
Department of Computer & Information Systems Engineering, NED University of Engineering & Technology, Pakistan
saaj@neduet.edu.pk
 

 

Abstract—Recognition of emotions from spoken utterances has been performed in a number of languages and utilized in various applications. This paper makes use of a corpus of spoken utterances recorded in Urdu with different emotions of normal and special children. The performance of learning classifiers is evaluated with prosodic and spectral features and their combinations, considering children with autism spectrum disorder (ASD) as noise, in terms of classification accuracy. The experimental results reveal that the prosodic features show significant classification accuracy in comparison with the spectral features for ASD children with different classifiers, whereas combinations of prosodic features show substantial accuracy for ASD children with the J48 and rotation forest classifiers. Pitch and formant express considerable classification accuracy in combination with MFCC and LPCC for special (ASD) children with different classifiers.

Keywords-spoken utterances; special children; learning classifiers; noise; features

I. INTRODUCTION AND RELATED WORK 

In the modern era of human-computer interaction, speech emotion recognition (SER) is a field of vast concern. SER has a great influence on human behavior and is a key point in building relations. Different emotions have their own characteristics that make them memorable in their own way [1]. Furthermore, "EmoChildRu" has been introduced [2] as the first child emotion database created to recognize speech and voice emotions from children's behavior. Two child emotional speech perception experiments are reported on the corpus, one with adult human listeners and one with automatic classifiers. The automatic classification results are essentially on par with human perception, although the accuracy is below 55% for both, demonstrating the difficulty of recognizing children's emotions from speech under natural conditions. To improve the condition of people with ASD, several computer-assisted learning (CAL) procedures have been implemented. The authors in [3] investigate varied CAL strategies applied to enhance the everyday living conditions of such individuals and describe the CAL strategies used in various applications to improve the communication, behavioral, and social abilities of such special children. In [4], it was noted that mental and emotional states are not easy to observe in people with autism spectrum conditions (ASC). A speaker-independent technique was proposed in [5] to recognize the speech of children with hearing impairment. In this technique, Mel and Bark scales and Equivalent Rectangular Bandwidth (ERB) filter spacing along with gammatone features were utilized at the front end, whereas at the back end Fuzzy C-Means (FCM), Multivariate Hidden Markov Models (MHMM), and Vector Quantization (VQ) approaches were applied. Individual words and short sentences in the Tamil language were used to evaluate the performance of the three variants, with the data of two speakers tested against the features of eight speakers. A prominent real-situation database was organized in [6] to detect fear through the "interjection" feature in speech recorded during extreme emotional and real-world emergencies; MFCCs along with a Support Vector Machine over variant interjections were utilized to categorize speech emotions. In [7], the Urdu language was used to recognize the emotions of primary-school-age children. The authors used three prosodic features with five classifiers and four emotions and reported that the J48 classifier achieved the highest accuracy. A deep architecture was utilized in [8], which uses a convolutional network for extracting domain-shared features and a long short-term memory network for classifying emotions using domain-specific features. A complete cross-corpora exploration over various speech emotion domains reveals that transferable features give gains ranging from 4.3% to 18.4% in speech emotion recognition. In [9], deep neural networks (DNNs) were applied to perceive human emotions from a speech signal: Mel-frequency cepstral coefficients (MFCCs), among the most frequently used speech features, were extracted from the raw audio and fed to a DNN to train the system, and a handcrafted database improving the utilization of the system was also presented. The work related to recognition, classification, and emotion detection for children with ASD is still an open research topic, while researchers are increasingly concerned with helping these children by making them realize the emotions in the real world. Under this scope, the autism-oriented game "Emotify" has been developed [10]. It comprises two levels of difficulty and attempts to teach children about the neutral, anger, sadness, and happiness emotions. At the second level, children are helped in expressing their feelings, which are then evaluated and examined, and machine learning approaches are exploited to develop a multilingual emotion recognition system. This paper evaluates the performance of learning classifiers when dealing with prosodic and spectral features and their combinations, considering special (ASD) children as noise, in terms of classification accuracy.

II. LEARNING CLASSIFIERS AND FEATURES 

In daily routine conversations, the prosodic features play a vital role [11]. The parameters used in expressing speech and perceiving the feelings of users are speech rate, length, pitch, formant, intensity, Mel-frequency cepstral coefficients (MFCC), and linear predictive cepstral coefficients (LPCC) [12]. Two spectral features (MFCC and LPCC) and three prosodic features (intensity, pitch, and formant) and their potential combinations are utilized in this research; a minimal extraction sketch for the prosodic features is given after the list below.

• Pitch: Pitch is correlated with frequency. Every speech frame is analyzed, and statistical values computed throughout the sample summarize the result; these values depict a clear picture of the properties of the audio parameters [13].

• Intensity: Intensity encodes prosodic information and the expression of emotion in spoken utterances [12].

• Formant: A formant is a characteristic frequency component of speech which gives quantifiable measurements of the consonants and vowels of the speech signal [12].
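As a concrete illustration, the sketch below extracts the three prosodic features with Praat via the parselmouth Python library and summarizes each contour with simple statistics. The library choice, the summary statistics, and the file name are our assumptions rather than the paper's stated toolchain; the spectral features (MFCC and LPCC) would be extracted analogously with a standard audio library.

import numpy as np
import parselmouth

def prosodic_features(wav_path):
    snd = parselmouth.Sound(wav_path)

    # Pitch contour (F0 in Hz); Praat reports unvoiced frames as 0, so drop them.
    f0 = snd.to_pitch().selected_array['frequency']
    f0 = f0[f0 > 0]

    # Intensity contour in dB.
    intensity = snd.to_intensity().values.flatten()

    # First formant (F1) track via Burg's method, sampled at the analysis times.
    formant = snd.to_formant_burg()
    f1 = np.array([formant.get_value_at_time(1, t) for t in formant.xs()])
    f1 = f1[~np.isnan(f1)]  # drop frames with no F1 estimate

    def stats(x):  # summary statistics per contour
        return [np.mean(x), np.std(x), np.min(x), np.max(x)]

    return stats(f0) + stats(intensity) + stats(f1)

features = prosodic_features("mujhe_khelna_hai_01.wav")  # hypothetical file name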

Four learning classifiers were used in the experimental framework, and their performance was evaluated over the spectral and prosodic features and their combinations. A comprehensive description of the learning classifiers can be found in [14]. The classifiers are (an illustrative training sketch follows the list):

• J48: A decision tree algorithm that builds the tree from the feature vectors of the training examples. The classes of newly presented instances are predicted on the basis of the training examples, and the tree structure makes the underlying distribution of the data readily interpretable [15].

• Multi-Layer Perceptron (MLP): A class of feedforward artificial neural networks that contains three layers at minimum. The first is the input layer, while the last one is the output layer; the layers in between are hidden layers, and different MLPs can have various numbers of hidden layers [16].

• Rotation forest: Rotation forest [15] randomly eliminates subsets of classes, performs bootstrap sampling on the remaining data, applies PCA, and finally builds independent decision trees.

• Logit boost classifier: Boosting [16] works on the principle that a set of weak learners can be combined to create a strong classifier. Logit boost assigns higher weights to misclassified instances.
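The four classifiers named above are standard WEKA algorithms. As a hedged sketch of an equivalent setup, the scikit-learn analogue below uses DecisionTreeClassifier in place of J48 and GradientBoostingClassifier in place of LogitBoost, and approximates rotation forest with a random forest, since scikit-learn does not ship a rotation forest. The synthetic data merely stands in for the real feature matrix.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 200 utterances, 12 summary features, 4 emotion classes.
X, y = make_classification(n_samples=200, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)

classifiers = {
    "J48 (decision tree)": DecisionTreeClassifier(random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    "Rotation forest (approximated)": RandomForestClassifier(n_estimators=100, random_state=0),
    "LogitBoost (approximated)": GradientBoostingClassifier(random_state=0),
}

# 10-fold cross-validated accuracy for each classifier.
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: {acc:.1%}")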

III. CORPUS COLLECTION AND RECORDING SPECIFICATION 

The corpus has been collected from both categories of children (normal and with ASD) in Urdu and comprises 200 samples, equally divided between the two cases. As per the research methodology, ASD children have been considered as noise in the experimental framework. The recordings were made under standard conditions with a signal-to-noise ratio ≥ 45 dB. The Microsoft Windows 7 sound recorder was utilized to record the emotion-based spoken utterances of normal and special (ASD) children. The configuration is 16-bit mono PCM with a sampling rate of 48 kHz, with a microphone sensitivity of 54 dB ± 2 dB and power of 2.2 W respectively, a 3.5 mm stereo jack, and a cable length of 1.8 m. The spoken utterance was chosen to be: a) semantically neutral, b) simple to analyze, c) consistent with any exhibited circumstance, and d) of comparable meaning in every dialect. The sentence was: "Mujhe Khelna Hai" ("I have to play"). A quick format check of the recordings is sketched below.
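The stated recording format can be verified programmatically. This small sketch uses only the Python standard library; the file name is hypothetical.

import wave

# Asserts fail loudly if a recording drifts from the stated specification.
with wave.open("mujhe_khelna_hai_01.wav", "rb") as w:
    assert w.getnchannels() == 1          # mono
    assert w.getsampwidth() == 2          # 16-bit PCM samples
    assert w.getframerate() == 48000      # 48 kHz sampling rate
    duration = w.getnframes() / w.getframerate()

print(f"OK: {duration:.2f} s of 16-bit mono PCM at 48 kHz")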

IV. EXPERIMENTAL RESULTS AND DISCUSSION 

The performance of the learning classifiers is evaluated in the experimental framework for normal and special children, making use of prosodic and spectral features and their combinations on spoken utterances recorded in Urdu. The corpus comprises 200 spoken utterance samples in different emotions, equally distributed between normal and special children. The experimental framework further considers inter and intra feature combinations with four different classifiers (logit boost, MLP, J48, and rotation forest) under the following feature configurations: 1) separate prosodic and spectral features, and 2) combinations of the three prosodic features (intensity, formant, and pitch) with the two spectral features (LPCC and MFCC). The objective of the proposed framework is to identify the behavior of the four classifiers on a single feature or on different combinations of spectral and prosodic features over the spoken utterances of special (ASD) and normal children in terms of classification accuracy (a sketch of this evaluation loop is given below). The experimental results for both corpora, taken from normal and special (ASD, treated as noise) children in Urdu, follow.
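A minimal sketch of such an evaluation loop follows, under the assumption that each feature's per-utterance statistics have been stacked into named column blocks; feature_sets, y, and the classifiers dictionary (from the earlier sketch) are illustrative names, not the paper's code.

from itertools import combinations
import numpy as np
from sklearn.model_selection import cross_val_score

def evaluate(feature_sets, classifiers, y, cv=10):
    # feature_sets maps a feature name (e.g. "pitch", "MFCC") to an
    # (n_utterances, n_columns) array; y holds the emotion labels.
    names = list(feature_sets)
    configs = [(n,) for n in names] + list(combinations(names, 2))
    results = {}
    for combo in configs:
        X = np.hstack([feature_sets[n] for n in combo])  # concatenate blocks
        for clf_name, clf in classifiers.items():
            results[(combo, clf_name)] = cross_val_score(
                clf, X, y, cv=cv, scoring="accuracy").mean()
    return results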

A. Prosodic Features 

Pitch demonstrates great precision in portraying the states of the children under all four classifiers, classifying special children more precisely than normal children (Table I). The classification accuracy of rotation forest with the prosodic feature pitch on the ASD (noisy) children's spoken utterances was significantly better than the accuracy of the other classifiers. Intensity also shows good classification accuracy for ASD children for all classifiers except rotation forest. All learning classifiers demonstrate higher classification accuracy with formant for ASD children in comparison with normal children.

B. Spectral Features 

MFCC achieves significant accuracy with the MLP and logit boost classifiers in the case of normal children; on the other hand, only rotation forest shows considerable classification accuracy for ASD children. LPCC has very good accuracy only with the logit boost classifier for both normal and special children. This outcome suggests that in any study of LPCC the logit boost classifier ought to be utilized.

 




 
Fig. 1.  Prosodic features 

TABLE I.  CLASSIFICATION ACCURACY FOR PROSODIC FEATURES

Feature    Classifier       ASD children accuracy (%)   Normal children accuracy (%)
Intensity  MLP              81.3                        65.6
Intensity  Logit boost      81.5                        90.5
Intensity  Rotation forest  76.5                        64.5
Intensity  J48              81.3                        65.6
Pitch      MLP              80                          71.4
Pitch      Logit boost      93.8                        72
Pitch      Rotation forest  94.4                        76.7
Pitch      J48              80                          71.4
Formant    MLP              50                          11
Formant    Logit boost      90                          78.6
Formant    Rotation forest  99                          63.2
Formant    J48              98                          60

 

TABLE II.  CLASSIFICATION ACCURACY FOR SPECTRAL FEATURES

Feature  Classifier       ASD children accuracy (%)   Normal children accuracy (%)
MFCC     MLP              64.7                        85.7
MFCC     Logit boost      64.7                        85.7
MFCC     Rotation forest  83.3                        61.1
MFCC     J48              10                          50
LPCC     MLP              50                          10
LPCC     Logit boost      84.6                        62.9
LPCC     Rotation forest  9                           50
LPCC     J48              11                          50

 
Fig. 2.  Spectral features 

C. Inter Combination of Prosodic and Spectral Features 

The prosodic feature pitch shows significant accuracy in combination with the two other prosodic features. In combination with intensity, pitch attains considerable classification accuracy with logit boost and J48 for ASD (special) children. Pitch with formant performs well in classifying special (ASD) children with all classifiers except logit boost. For the combination of intensity and formant, MLP and rotation forest show significant classification accuracy in comparison with the other two classifiers for ASD children, while J48 has comparable accuracy in classifying special and normal children.

 

 
Fig. 3.  Inter and intra combinations of features 

D. Intra Combination of Prosodic and Spectral Features 

For Intensity-MFCC, MLP performs better in classifying the normal children, while logit boost is considerably good at classifying the noisy spoken utterances of the special children; in this case, rotation forest and J48 perform moderately. Similarly, for the Intensity-LPCC combination, MLP is good and accurate, but logit boost classifies the normal children's spoken utterances more accurately, while rotation forest and J48 again perform moderately.

TABLE III.  CLASSIFICATION ACCURACY FOR PROSODIC AND SPECTRAL FEATURES

Feature combination  Classifier       ASD children accuracy (%)   Normal children accuracy (%)
Pitch, intensity     MLP              88                          91.3
Pitch, intensity     Logit boost      90                          88
Pitch, intensity     Rotation forest  87.5                        87.5
Pitch, intensity     J48              80                          71.4
Pitch, formant       MLP              90.5                        81.5
Pitch, formant       Logit boost      88                          91.3
Pitch, formant       Rotation forest  90.9                        84.6
Pitch, formant       J48              98                          60
Intensity, formant   MLP              85.7                        77.8
Intensity, formant   Logit boost      92                          95.7
Intensity, formant   Rotation forest  89.5                        76
Intensity, formant   J48              88.3                        88.3
MFCC, LPCC           MLP              56.5                        56
MFCC, LPCC           Logit boost      93.8                        71.9
MFCC, LPCC           Rotation forest  98                          57.1
MFCC, LPCC           J48              11                          50
Pitch, MFCC          MLP              95.2                        85.2
Pitch, MFCC          Logit boost      77.8                        85.7
Pitch, MFCC          Rotation forest  94.7                        79.3
Pitch, MFCC          J48              92.3                        65.7
Pitch, LPCC          MLP              84.2                        72.4
Pitch, LPCC          Logit boost      89.5                        75.9
Pitch, LPCC          Rotation forest  85.7                        77.8
Pitch, LPCC          J48              92.3                        65.7
Intensity, MFCC      MLP              98                          80
Intensity, MFCC      Logit boost      82.1                        95
Intensity, MFCC      Rotation forest  94.4                        76.7
Intensity, MFCC      J48              69.7                        93.3
Intensity, LPCC      MLP              80.8                        86.4
Intensity, LPCC      Logit boost      87                          84
Intensity, LPCC      Rotation forest  85.7                        77.8
Intensity, LPCC      J48              69.7                        93.3
Formant, MFCC        MLP              53.5                        80
Formant, MFCC        Logit boost      98.4                        75
Formant, MFCC        Rotation forest  98                          57.1
Formant, MFCC        J48              97                          60
Formant, LPCC        MLP              46.4                        45
Formant, LPCC        Logit boost      94.1                        74.2
Formant, LPCC        Rotation forest  97                          61.5
Formant, LPCC        J48              98                          60

 

Table III provides the results of the learning classifiers with combinations of prosodic and spectral features for classifying the noisy spoken utterances in terms of classification accuracy. The most significant results are observed for the combinations of LPCC with pitch and with formant in classifying the noisy (special children) utterances, whereas pitch and formant with MFCC also show substantial classification accuracy for the special (noisy) children. Other combinations, such as intensity with LPCC and intensity with MFCC, also provide considerable classification results.

V. CONCLUSION 

In this paper, the performance of learning classifiers has been evaluated considering prosodic and spectral features and their combinations for children with ASD in terms of classification accuracy. The experimental framework comprises four different classifiers with different inter and intra combinations of prosodic and spectral features. The experiments were conducted on 200 spoken utterance samples taken equally from normal children and children with ASD, the latter being considered as noise. The conclusions of the experimental results are:

• The spectral features show significant classification accuracy in combination with the prosodic features (pitch and formant) under the rotation forest and J48 classifiers.

• Separate analysis of the spectral and prosodic features reveals that the classification accuracy of the prosodic features is considerably better than that of the spectral features.

• The intra feature combinations of spectral features with 
pitch and formant demonstrate better classification accuracy 
for different classifiers. 

The authors are now focusing on developing an experimental framework that applies the same methodology to different emotions, in order to evaluate the performance of classifiers for special (ASD) children in terms of classification accuracy.

REFERENCES 

[1] S. Ramakrishnan, "Recognition of emotion from speech: A review", in: Speech Enhancement, Modeling and Recognition: Algorithms and Applications, pp. 121-137, InTech, 2012
[2] E. Lyakso, O. Frolova, E. Dmitrieva, A. Grigorev, H. Kaya, A. A. Salah, A. Karpov, "EmoChildRu: Emotional child Russian speech corpus", Lecture Notes in Computer Science, Vol. 9319, Springer, Cham, 2015
[3] S. Dewan, A. Singh, L. Singh, S. Gautam, "Role of emotion recognition in computive assistive learning for autistic person", Indian Journal of Science and Technology, Vol. 9, No. 48, 2016
[4] O. Golan, Y. Sinai-Gavrilov, S. Baron-Cohen, "The Cambridge mindreading face-voice battery for children (CAM-C): complex emotion recognition in children with and without autism spectrum conditions", Molecular Autism, Vol. 6, No. 1, Article ID 22, 2015
[5] R. Arunachalam, Revathi, "A strategic approach to recognize the speech of the children with hearing impairment: different sets of features and models", Multimedia Tools and Applications, Springer, 2019
[6] S. A. Yoon, G. Son, S. Kwon, "Fear emotion classification in speech by acoustic and behavioral cues", Multimedia Tools and Applications, Vol. 78, No. 2, pp. 2345-2366, 2019
[7] S. Khan, S. A. Ali, J. Sallar, "Analysis of children's prosodic features using emotion based utterances in Urdu language", Engineering, Technology & Applied Science Research, Vol. 8, No. 3, pp. 2954-2957, 2018
[8] A. Marczewski, A. Veloso, N. Ziviani, "Learning transferable features for speech emotion recognition", Thematic Workshops of ACM Multimedia, Mountain View, USA, October 23-27, 2017
[9] M. F. Alghifari, T. S. Gunawan, M. Kartiwi, "Speech emotion recognition using deep feedforward neural network", Indonesian Journal of Electrical Engineering and Computer Science, Vol. 10, No. 2, pp. 554-561, 2018
[10] A. Rouhi, M. Spitale, F. Catania, G. Cosentino, M. Gelsomini, F. Garzotto, "Emotify: emotional game for children with autism spectrum disorder based-on machine learning", 24th International Conference on Intelligent User Interfaces Companion, New York, USA, March 16-20, 2019
[11] K. S. Rao, S. G. Koolagudi, Emotion Recognition Using Speech Features, Springer, 2013
[12] P. Shen, C. Zhou, X. Chen, "Automatic speech emotion recognition using support vector machine", International Conference on Electronic, Mechanical Engineering and Information Technology, Harbin, China, August 12-14, 2011
[13] A. S. Utane, S. L. Nalbalwar, "Emotion recognition through speech", 2nd National Conference on Innovative Paradigms in Engineering & Technology, Nagpur, Maharashtra, India, February 17, 2013
[14] S. A. Ali, S. Zehra, M. Khan, F. Wahab, "Development and analysis of speech emotion corpus using prosodic features for cross linguistic", International Journal of Scientific & Engineering Research, Vol. 4, No. 1, pp. 1-8, 2013
[15] S. A. Ali, A. Khan, N. Bashir, "Analyzing the impact of prosodic feature (pitch) on learning classifiers for speech emotion corpus", International Journal of Information Technology and Computer Science, Vol. 2, pp. 54-59, 2015
[16] M. Swain, A. Routray, P. Kabisatpathy, "Databases, features and classifiers for speech emotion recognition: a review", International Journal of Speech Technology, Vol. 21, No. 1, pp. 93-120, 2018