Vol. 2, No. 1 | January – June 2018 
 

SJCMS | P-ISSN: 2520-0755 | E-ISSN: 2522-3003 © 2018 Sukkur IBA University – All Rights Reserved 

Holy Qur'an Speech Recognition System Distinguishing the Type of Prolongation

Bilal Yousfi1, Akram M. Zeki2, Aminah Haji1

Abstract: 
Learning and teaching the Holy Quran has become a scientific practice for Muslims around the world. Stakeholders face a considerable challenge in applying Tajweed (that is, the rules governing pronunciation during recitation of the Quran). Previous systems have made several efforts to develop feasible techniques for guiding the practice of Tajweed. Unfortunately, those approaches neglected to link the major control variables of Tajweed practice. To fill this gap, this research presents a speech recognition system that distinguishes the types of Madd (elongated tone), or prolongation, and the type of Qira’at (method of recitation) related to Madd. The proposed system can recognise, identify, and point out mismatches, and can discriminate between two types of Madd, namely the Greater Connective Prolongation and the Exchange Prolongation, under the rules of Hafss and Warsh, for verses that contain the two rules as recited by experts and stored in a database. The study uses Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and Hidden Markov Models (HMM) for feature classification.

Keywords: Holy Quran; Tajweed; Qira’at; Sound; Mel-Frequency; Hidden Markov Models.

1. Introduction 
The Holy Qur’an was revealed with Tajweed rules, and it is important for readers to apply those rules during recitation. According to Qira’at science, each Qira’at has its own rules of Tajweed. Table 1 shows the difference between Hafss and Warsh in terms of the Greater Connective Prolongation and the Exchange Prolongation. Tajweed is the body of knowledge that perfects the articulation of the letters of the Holy Quran and leads to the highest level of pronouncing them properly. The rules of recitation are applied by giving every letter of the Qur’an its rights and dues. Various lexical characteristics emerge when reciting the Qur’an and observing the rules that apply to those letters in different situations, for instance a break at which the reciter must halt for either a short or a longer period of time. This is a unique attribute that influences people. It tends to become habitual, and in some situations, even without due consideration of the rules, it becomes an important part of learning the Holy Quran. However, the required length of the pause, short or long, still varies. Many other features and rules need to be taken into consideration, which makes it necessary for researchers to formulate techniques that ease the learning and teaching of Tajweed.

1 Kulliyyah of Information and Communication Technology (KICT)
2 International Islamic University Malaysia (IIUM)
Corresponding Email: yousfi.bilal@hotmail.fr

Tajweed brings with it well-defined rules for reciting the Quran. Notably, those rules create a large difference between normal Arabic speech and the recitation of Quranic verses. Madd, or prolongation, is one of the significant Tajweed rules. It stands for extending or prolonging the sound of a Madd letter. It is divided into two groups: the Original Madd and the Secondary Madd. In the former, the Madd letter has no hamzah before it and no hamzah or sukoon after it, whereas the latter has a longer timing (or the possibility of a longer timing) than the natural Madd, because of the presence of a hamzah or a sukoon before or after it. Two rules of prolongation (mudud) fall under the Secondary Madd: the Greater Connective Prolongation and the Exchange Prolongation. This research focuses on their dynamics.

Bilal Yousfi (et al.), Holy Qur'an Speech Recognition System Distinguishing the Type of prolongation        (pp. 36 - 43)

Sukkur IBA Journal of Computing and Mathematical Sciences - SJCMS | Volume 2 No. 1 January – June 2018 © Sukkur IBA University

The Exchange Prolongation: the rule of substitute prolongation applies when a hamza (ء) precedes a harf Madd (ا, ي or و). This Madd is found only within a single word and occurs when the hamza carries the corresponding diacritic; for example, if the harf Madd ‘waaw’ follows a hamza, the hamza has a dammah on it [1].

The Greater Connective Prolongation: this rule applies if the pronoun/possessive pronoun ها (ه) is at the end of a word, carries a dhammah or a kasrah, lies between two voweled letters, and the first letter of the next word is a hamzah. In that case it is permissible to lengthen it according to the type of recitation [1].

The rules of prolongation are directly attached to the Tajweed rules. They are mostly studied independently; however, their correct application is most reliably judged through subjective assessment in the presence of Tajweed experts, that is, where experts are involved in applying the rules that govern Tajweed. Unfortunately, it is difficult to find an expert at any given time during Quranic learning and teaching. In most cases, learners would like a Tajweed expert to listen to them and point out any mistakes during a learning session. Thus, it has become an important task to develop learning software that supports the practice of Tajweed when learning the Quran. Such software would guide people to practise reading the Quran with the correct pronunciation of the Mudud. Crucial to this is a speech recognition system that distinguishes the type of Madd as well as the type of Qira’at.

The remainder of this paper is organized as follows. Section two reviews related work. Section three describes the speech recognition approach, section four presents the research methodology, and section five presents the outcomes of this research. Finally, section six presents the conclusion.

TABLE 1. The difference between the Greater Connective Prolongation and the Exchange Prolongation according to Hafss and Warsh Qira’at.

Type of Madd                                              Hafss                       Warsh
The Exchange Prolongation (مد البدل)                      lengthened 2 counts         lengthened 2, 4, or 6 counts
The Greater Connective Prolongation (مد الصلة الكبرى)     lengthened 4 or 5 counts    lengthened 6 counts
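Table 1 can be read as a lookup from Madd type and Qira’at to the permitted number of counts. The following minimal Python sketch encodes it that way. The key strings and function names are hypothetical and chosen only for illustration; they are not part of the proposed system.

```python
# Permitted lengths (in counts) transcribed from Table 1.
# The key strings are illustrative labels, not identifiers from the paper.
ALLOWED_COUNTS = {
    ("exchange", "hafss"): {2},
    ("exchange", "warsh"): {2, 4, 6},
    ("greater_connective", "hafss"): {4, 5},
    ("greater_connective", "warsh"): {6},
}


def is_valid_length(madd_type, qiraat, counts):
    """Return True if `counts` is a permitted length for this Madd and Qira'at."""
    return counts in ALLOWED_COUNTS[(madd_type, qiraat)]


def compatible_qiraat(madd_type, counts):
    """Return the Qira'at (sorted by name) whose rule permits this length."""
    return sorted(
        qiraat
        for (madd, qiraat), allowed in ALLOWED_COUNTS.items()
        if madd == madd_type and counts in allowed
    )


only_warsh = compatible_qiraat("greater_connective", 6)
both = compatible_qiraat("exchange", 2)
```

Note that a length of 2 counts on the Exchange Prolongation is compatible with both Qira’at, so length alone cannot always separate Hafss from Warsh.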

2. Previous Work 

Considerable effort has previously been devoted to Arabic speech recognition for the Holy Quran, and there are many reviews of the evolution of Arabic Holy Qur’an ASR. Several theories have been proposed for achieving and evaluating high accuracy in Arabic Qur’an speech recognition [2].

One of the most important studies in this area is presented in [3], which uses a Multilayer Perceptron to classify the pronunciation of Qalqalah Kubra (a Tajweed rule). MFCC feature extraction was used to extract the characteristics of recited Quranic verses. The technique achieved a recognition rate in the range of 95% to 100%. The study thus contributed to identifying correct and incorrect Qalqalah Kubra pronunciation.

The need for people to check Tajweed rules in Quranic verses through an interactive way of learning, without the guidance and presence of an expert, is emphasized in [4]. That work presents the design, implementation and evaluation of an automated Tajweed rule-checking engine for Qur’anic learning. It was validated with MFCC features and an HMM model, obtaining an accuracy of 91.95% (ayates) and 86.41% (phonemes).

Generally, people practise independently by listening to the recitation of famous reciters with the aim of learning from it. This is appropriate, but correct Makhraj (the point of articulation) might be missing; a novel technique based on correct Makhraj is therefore proposed in [5]. The technique was evaluated with MFCC for feature extraction and Mean Square Error (MSE) as the pattern matching technique. Its accuracy was measured using the False Reject Rate (FRR) and Wrong Recognition (WR); the FRR for all recitations is 0% and the accuracy of the system is 100%.

A novel model that distinguishes the type of recitation of the Holy Quran is proposed in [6]-[7]. It uses Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a Hidden Markov Model (HMM) for classification. Like the other techniques, it also aims to improve learners’ recitation of the Holy Quran [8]-[9].

3. Speech Recognition for Distinguishing the Type of Madd According to Hafss or Warsh

Speech recognition is the process of recording a speech or acoustic signal and converting it accurately and efficiently into a set of words [10]. The steps involved in producing a speech recognition system are presented in Figure 1.

The aim of ASR is to extract, characterise, and recognise the information in speech. The system consists of three basic stages, as shown in Figure 1: pre-processing, feature extraction, and feature classification. In pre-processing, the recorded speech (verse) signals are passed through the pre-processing block to remove noise, separate the desirable voice from undesirable ones, and detect the start and end points of the verses.
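The start and end point detection mentioned above is often done with a short-time energy threshold. The paper does not state which detector it uses, so the following is only a minimal sketch under that assumption; the function names, the frame length and the threshold are hypothetical illustration values.

```python
def frame_energies(samples, frame_len):
    """Return the mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_endpoints(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample indices of the active region, or None.

    `end` is exclusive. Frames whose energy exceeds `threshold` count as
    speech; both defaults are illustrative, not values from the paper.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Toy signal: silence, a short burst standing in for speech, then silence.
signal = [0.0] * 8 + [0.5, -0.5, 0.4, -0.4, 0.5, -0.5, 0.4, -0.4] + [0.0] * 8
endpoints = detect_endpoints(signal)
```

On real recordings the threshold would normally be derived from the measured noise floor rather than fixed in advance.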

 
Fig. 1. Block Diagram of speech Recognition 
System. 

Feature extraction is the process of extracting, from the input speech sample, parameters that are unique to each word. These can be used to differentiate between a wide set of distinct words. The Mel-Frequency Cepstral Coefficient (MFCC) is considered the most prominent example of a feature set [11]. It is widely used in speech-related studies and is closely related to the logarithmic perception of the human ear. To extract the coefficients, the speech sample is taken as the input. Pre-emphasis passes the signal through a filter that emphasises higher frequencies, which increases the energy of the signal at higher frequencies. After pre-emphasis, the signal undergoes frame blocking and windowing. Frame blocking segments the speech samples into successive frames of N samples, each 20 to 40 msec long. Windowing minimises the discontinuities of the signal at the beginning and end of each frame. Each frame is then converted from the time domain into the frequency domain using the Discrete Fourier Transform (DFT). Next, the Mel filter bank is generated: a set of triangular filters, whose centre frequencies are spaced on the Mel scale, is applied to each frame. The triangular filters implement the Mel scaling of the signal. The next step computes the logarithm of the signal energy, which makes the system respond in a way similar to the human ear. Finally, to obtain the MFCC, the log energies are processed with the Discrete Cosine Transform (DCT). Equation 1 gives the approximate empirical relationship used to compute the Mel frequency for a given frequency f expressed in Hz:

Mel (f) = 2595*log10 (1+f/700).             (1) 
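Equation 1 translates directly into code. The sketch below implements it together with its algebraic inverse and a helper that spaces filter centre frequencies evenly on the Mel scale. The inverse and the helper are additions for illustration; the frequency range and number of filters in the example are assumptions, not values reported in this paper.

```python
import math


def hz_to_mel(f_hz):
    """Equation 1: Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Algebraic inverse of Equation 1."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(f_low, f_high, n_filters):
    """Centre frequencies (Hz) of n_filters spaced evenly on the Mel scale."""
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (n_filters + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(n_filters)]


# Illustrative values only: 10 filters between 0 Hz and 4000 Hz.
centers = mel_center_frequencies(0.0, 4000.0, 10)
```

Because the Mel scale is logarithmic above roughly 1 kHz, the resulting centre frequencies lie closer together at low frequencies than at high ones.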

Figure 2 shows the steps involved in 

MFCC feature extraction.  

 
Fig. 2. Block diagram of the computation steps of 
MFCC. 
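The first three steps of the MFCC computation (pre-emphasis, frame blocking and windowing) can be sketched as follows. The paper specifies only the 20 to 40 msec frame length; the pre-emphasis coefficient of 0.97, the Hamming window, the 10 msec hop and the 16 kHz sampling rate are common choices assumed here for illustration.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def frame_signal(samples, frame_len, hop):
    """Split the signal into frames of frame_len samples, hop samples apart."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def hamming(n_points):
    """Hamming window coefficients (an assumed window type)."""
    if n_points == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def window_frames(frames):
    """Multiply every frame by a Hamming window of matching length."""
    if not frames:
        return []
    win = hamming(len(frames[0]))
    return [[x * w for x, w in zip(frame, win)] for frame in frames]


sample_rate = 16000                    # assumed sampling rate in Hz
frame_len = sample_rate * 25 // 1000   # 25 msec frame, inside the 20-40 msec range
hop = sample_rate * 10 // 1000         # 10 msec hop (assumption)
tone = [math.sin(2.0 * math.pi * 220.0 * n / sample_rate) for n in range(1600)]
frames = window_frames(frame_signal(pre_emphasis(tone), frame_len, hop))
```

Each windowed frame would then go through the DFT, Mel filter bank, logarithm and DCT stages described above.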

Feature classification, or pattern recognition, is the process of identifying similarities between the features extracted from the input signal and a set of acoustic models stored in the database. HMM techniques [12] have been used by many researchers for speech recognition.
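One common way to classify with HMMs is to score the observation sequence against one model per class using the forward algorithm [12] and to pick the model with the highest likelihood. The sketch below assumes discrete HMMs over codebook indices; the two toy models and all of their probabilities are hypothetical and serve only to illustrate the scoring, not to reproduce the models trained in this work.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, via the scaled forward algorithm.

    obs   : non-empty list of symbol (codebook) indices
    start : start[s]     initial probability of state s
    trans : trans[p][s]  probability of moving from state p to state s
    emit  : emit[s][k]   probability that state s emits symbol k
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_prob += math.log(scale)          # accumulate log of the scaling factors
        alpha = [a / scale for a in alpha]   # rescale to avoid numerical underflow
    return log_prob


def classify(obs, models):
    """Return the name of the model that gives obs the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Toy models with made-up numbers. Symbol 0 = vowel-like frame, 1 = any other frame.
EMIT = [[0.9, 0.1], [0.1, 0.9]]
MODELS = {
    "short": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], EMIT),
    "long": ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]], EMIT),
}

label_long = classify([0, 0, 0, 0, 0, 0, 1], MODELS)
label_short = classify([0, 0, 1, 1, 1, 1, 1], MODELS)
```

The only difference between the two toy models is the self-transition probability of the first state, which is how an HMM encodes the expected duration of a sound; this is the property that makes HMMs a natural fit for distinguishing prolongation lengths.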

4. Methodology 
The methodological approach of this research focuses on prolongation type recognition and is presented in Figure 3.

The first step is the collection of Quranic recitation samples from different expert Qaris (reciters). These experts are known to hold an Ejazah in Hafss and Warsh. Each of them recited specific verses containing the two kinds of prolongation (Madd), the Greater Connective Prolongation and the Exchange Prolongation, according to the rules of Hafss and Warsh. Each sample was collected several times, recited correctly. All samples are stored as the raw dataset, ready for pre-processing.

 
Fig. 3. Prolongation type identification for Qur'an 
flow chart. 

The collected raw data are pre-processed to remove the noise contained in the speech signal and to separate the desirable voice from undesirable ones. This reduces the set of attributes so that only the information to be conveyed is retained. The MFCC algorithm is applied to extract and generate the feature vectors, which are used as the input for recognition. Classification is done by the HMM model, whose parameters are calculated during training. The training phase extracts features from a large number of samples (the "training data"), and the testing phase extracts features from the testing data (the "data speech"). The testing data (recorded by the user) are matched against the voice features stored in the database to provide a response on whether the user recited correctly or incorrectly. The comparison then reports the user’s level of accuracy. A codebook model (stored template) in the database, constructed from the training data, is used for the experimental records.

5. Experiment and Results 
An experiment was carried out to present some of the recognition scenarios. The acoustic models of the speech signals of some Holy Qur’an verses containing the two kinds of prolongation were used to show the differences between them. These differences are based on the different types of recitation, Hafss and Warsh, as shown in Figures 4 to 8. The recognition was carried out following the procedure presented in section 4.

This research focuses on five words from verses of the Holy Qur’an for each of the Greater Connective Prolongation and the Exchange Prolongation, as shown in Table 2 and Table 3. The verses were recited according to two famous Qira’at, namely the Qira’at of Hafss from Asim and the Qira’at of Warsh from Nafi.

 
TABLE 2. Verses selected for the Greater Connective Prolongation.

Verse                                              Surah
[6:71] "حَيْرَانَ لَهُ أَصْحَابٌ"                        Al-An’am (الأنعام)
[6:71] "يَدْعُونَهُ إِلَى الْهُدَى"                       Al-An’am (الأنعام)
[18:110] "يُشْرِكْ بِعِبَادَةِ رَبِّهِ أَحَدًا"               Al-Kahf (الكهف)
[90:7] "أَيَحْسَبُ أَن لَّمْ يَرَهُ أَحَدٌ"                  Al-Balad (البلد)
[7:142] "فَتَمَّ مِيقَاتُ رَبِّهِ أَرْبَعِينَ"               Al-Araf (الأعراف)
 
The experimental tests and results of this research are presented in this section in two phases:

• Data collection or training phase.
• Recognition phase.

The first phase involves collecting the recitation sample data used for extracting and training the features. The data come from the reciters’ database, which was collected from the Internet. Verses of the Holy Qur'an for each of the Exchange Prolongation and the Greater Connective Prolongation are the key objects used. They are taken from the most popular reciters, such as Sheikh Al-Hosry. Five (5) verses are recited by three (3) reciters in two (2) Qira’at, Warsh and Hafss, for each of the two prolongation types, the Exchange Prolongation and the Greater Connective Prolongation. This gives a total of sixty (60) data samples (5 verses × 3 reciters × 2 Qira’at × 2 prolongation types). All samples are passed through the extraction stage in order to extract the features and represent them as frequencies on the Mel scale. Delta coefficients of the Mel coefficients are then calculated, and the features are trained and recognized using HMM. The resulting models are used as reference patterns and stored in the reciters’ database.
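Delta coefficients describe how each cepstral coefficient changes over time. The paper does not give the formula it uses, so the sketch below assumes the widely used regression form, d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²) for n = 1..N, with N = 2 and edge frames replicated. The function name and the default width are hypothetical.

```python
def delta(frames, width=2):
    """Regression-based delta coefficients.

    frames : list of per-frame coefficient lists (T frames x D coefficients)
    width  : number of frames on each side used in the regression (assumed 2)
    Returns a list with the same shape as `frames`.
    """
    n_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Replicate the first and last frame beyond the edges.
                nxt = frames[min(t + n, n_frames - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


# A linear ramp: away from the edges, the delta equals the slope per frame.
ramp = [[float(t), 2.0 * t] for t in range(7)]
ramp_delta = delta(ramp)
```

The delta vectors are typically appended to the static coefficients so that each frame carries both spectral shape and its local trend, which is relevant when the quantity of interest is how long a sound is held.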

The recognition phase involves verifying the recitation of a new reciter against the pre-stored values of all reciters in the reciters’ database. The results are tested against the specified objectives of the proposed system. The developed system is tested by applying the MFCC algorithm to extract features from the Qur’anic recitation samples and then matching them against the trained HMM models of the data templates, using the same HMM classification method. The HMM algorithm is expected to give the best identification results.

The recognition accuracy rate is calculated 

using equation 2: 

Accuracy = (number of correct samples / total samples) × 100             (2)
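As a worked check of Equation 2, the short sketch below applies it to the correct and total utterance counts reported in Table 4. The function name and the dictionary layout are illustrative choices only.

```python
def accuracy_percent(correct, total):
    """Equation 2: (number of correct samples / total samples) x 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# (correct, total) counts taken from Table 4.
TABLE_4 = {
    ("Exchange", "Warsh"): (6, 10),
    ("Exchange", "Hafss"): (5, 10),
    ("Greater Connective", "Warsh"): (4, 10),
    ("Greater Connective", "Hafss"): (7, 10),
}

per_class = {key: accuracy_percent(c, n) for key, (c, n) in TABLE_4.items()}

# Pooled accuracy over all utterances, computed from the same counts.
total_correct = sum(c for c, _ in TABLE_4.values())
total_utterances = sum(n for _, n in TABLE_4.values())
overall = accuracy_percent(total_correct, total_utterances)
```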

The experimental results of the testing process are presented here. The experiment covers the extracted features of 10 verses of Qur’anic recitation, which were compared directly with the data stored in the model. The test results obtained on the training data are 60% and 50% for the Exchange Prolongation according to Warsh and Hafss, and 40% and 70% for the Greater Connective Prolongation according to Warsh and Hafss, respectively (see Table 4).



The gathered results show some enhancement compared to previous findings. The contribution of this research lies in the improved performance and efficiency of the proposed technique.

The system does face some drawbacks, namely extra noise due to audio file compression and poor quality during the recording process. Nevertheless, a high performance measure was achieved.

 
TABLE 3. The verses selected for the Exchange Prolongation.

Verse                                                                       Surah
[33:41] "يَا أَيُّهَا الَّذِينَ آمَنُوا اذْكُرُوا اللَّهَ ذِكْرًا كَثِيرًا"                   Al-Ahzab (الأحزاب)
[81:29] "وَمَا تَشَاؤُونَ إِلَّا أَن يَشَاء اللَّهُ رَبُّ الْعَالَمِينَ"                  Al-Takwir (التكوير)
[48:4] "الْمُؤْمِنِينَ لِيَزْدَادُوا إِيمَانًا مَّعَ إِيمَانِهِمْ"                       Al-Fath (الفتح)
[2:144] "وَإِنَّ الَّذِينَ أُوْتُواْ الْكِتَابَ لَيَعْلَمُونَ أَنَّهُ الْحَقُّ مِن رَّبِّهِمْ"     Al-Baqarah (البقرة)
[12:16] "وَجَاؤُواْ أَبَاهُمْ عِشَاء يَبْكُونَ"                                    Yusuf (يوسف)

Fig. 4. Power spectrum plot of the spoken word له in “حَيْرَانَ لَهُ أَصْحَابٌ” (the Greater Connective Prolongation) according to Warsh.

Fig. 5. Power spectrum plot of the spoken word له in “حَيْرَانَ لَهُ أَصْحَابٌ” (the Greater Connective Prolongation) according to Hafss.

Fig. 6. 2D plot of the acoustic vector of the spoken word له in “حَيْرَانَ لَهُ أَصْحَابٌ” (the Greater Connective Prolongation) according to Warsh and Hafss.

Fig. 7. Speech signals of the spoken word له in “حَيْرَانَ لَهُ أَصْحَابٌ” (the Greater Connective Prolongation) according to Warsh.



 
Fig. 8. Speech signals of the spoken word له in “حَيْرَانَ لَهُ أَصْحَابٌ” (the Greater Connective Prolongation) according to Hafss.

TABLE 4. Model tuning results. 

Prolongation type    The Exchange Prolongation    The Greater Connective Prolongation
Qira’at type         Warsh        Hafss           Warsh        Hafss
# of utterances      10           10              10           10
Correct              06           05              04           07
Wrong                04           05              06           03
% Accuracy           60%          50%             40%          70%

 
Figure 9 shows the recognition accuracy rate for each prolongation type, where the y-axis shows the accuracy (%) and the x-axis shows the types of Madd.

 
Fig. 9. The accuracy rate of proposed system. 

6. Conclusion  

This paper has presented the theory and practice behind developing a high-performance Tajweed system that assists in proper Qur’anic recitation based on automatic speech recognition. The research used Mel-Frequency Cepstral Coefficients (MFCC) and Hidden Markov Model (HMM) algorithms to validate the proposed system. Several experiments were carried out. The experimental results on a database indicate that the feature extraction and recognition methods used in this research are feasible for an Arabic recognition system.

Several other techniques, such as Linear Predictive Coding (LPC) and Artificial Neural Networks (ANN), could also be used for a similar research approach. Their findings might differ; therefore, this research recommends that future work focus on discriminative training techniques, which might improve the discrimination between some confusable pronunciation alternatives.

ACKNOWLEDGMENT 

The authors would like to thank the Research Management Centre and the Faculty of Information and Communication Technology, the International Islamic University Malaysia, for their support.

 
REFERENCES 

[1] K. C. Czerepinski and A. D. A. R. Swayd, Tajweed Rules of the Qur’an. Dar Al-Khair Islamic Books Publisher, 2006.

[2] B. Yousfi and A. M. Zeki, “Automatic Speech Recognition for the Holy Qur’an, A Review,” in The International Conference on Data Mining, Multimedia, Image Processing and their Applications (ICDMMIPA2016), 2016, p. 23.

[3] H. A. Hassan, N. H. Nasrudin, M. N. M. Khalid, A. Zabidi, and A. I. Yassin, “Pattern classification in recognizing Qalqalah Kubra pronuncation using multilayer perceptrons,” IEEE symposium on Computer Applications and Industrial Electronics (ISCAIE), 2012, pp. 209–212.

[4] D. Raja-Jamilah Raja-Yusof, Fadila Grine, N. Jamaliah Ibrahim, M. Yamani Idna Idris, Z. Razak, and N. Naemah Abdul Rahman, “Automated tajweed checking rules engine for Quranic learning,” Multicult. Educ. Technol. J., vol. 7, no. 4, pp. 275–287, 2013.

[5] A. N. Wahidah et al., “Makhraj recognition using speech processing,” 7th International Conference on Computing and Convergence Technology (ICCCT), 2012, pp. 689–693.

[6] B. Yousfi and A. M. Zeki, “Holy Qur’an speech recognition system distinguishing the type of recitation,” 7th International Conference on Computer Science and Information Technology (CSIT), 2016, pp. 1–6.

[7] B. Yousfi, A. M. Zeki, and A. Haji, “Holy Qur’an Speech Recognition System Mudud Tajweed Rule Checking,” Int. J. Islam. Appl. Comput. Sci. Technol, pp. 10–18, 2016.

[8] B. Yousfi and A. M. Zeki, “Holy Qur’an speech recognition system Imaalah checking rule for warsh recitation,” IEEE 13th International Colloquium on Signal Processing & its Applications (CSPA), 2017, pp. 258–263.

[9] B. Yousfi, A. M. Zeki, and A. Haji, “Isolated Iqlab checking rules based on speech recognition system,” 8th International Conference on Information Technology (ICIT), 2017, pp. 619–624.

[10] N. Zerari, B. Yousfi, and S. Abdelhamid, “Automatic Speech Recognition: A Review,” Int. Acad. Res. J. Bus. Technol., vol. 2, no. 2, pp. 63–68, 2016.

[11] S. B. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” in Readings in speech recognition, Elsevier, 1990, pp. 65–74.

[12] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257–286, 1989.