

Engineering, Technology & Applied Science Research Vol. 11, No. 6, 2021, 7968-7973
 

www.etasr.com Kazmi et al.: Photometric Ligature Extraction Technique for Urdu Optical Character Recognition 

 
Photometric Ligature Extraction Technique for Urdu 
Optical Character Recognition 

 
Majida Kazmi 

Faculty of Electrical and Computer Engineering 
NED University of Engineering and Technology 

Karachi, Pakistan 
majidakazmi@neduet.edu.pk 

Fauzia Yasir 

Faculty of Electrical and Computer Engineering 
NED University of Engineering and Technology 

Karachi, Pakistan 
fyasir@neduet.edu.pk 

Samreen Habib 

Neurocomputation Lab, NCAI 
NED University of Engineering and Technology 

Karachi, Pakistan 
habib@cloud.neduet.edu.pk 

Muhammad Saad Hayat 

Department of Electrical Engineering 
NED University of Engineering and Technology 

Karachi, Pakistan 
hayat@cloud.neduet.edu.pk 

Saad Ahmed Qazi 

Faculty of Electrical and Computer Engineering 
Neurocomputation Lab, NCAI 

NED University of Engineering and Technology 
Karachi, Pakistan 

saadqazi@neduet.edu.pk 
 

Abstract-Urdu Optical Character Recognition (OCR) based on 
character level recognition (analytical approach) is less popular 
than ligature level recognition (holistic approach) due to its 
added complexity and the overlapping of characters and strokes. 
This paper presents a holistic Urdu ligature extraction technique. 
The proposed Photometric Ligature Extraction (PLE) technique is 
independent of font size and column layout and is capable of 
handling non-overlapping ligatures as well as all inter and intra 
overlapping ligatures. It uses a customized photometric filter 
along with X-shearing and padding, combined with connected 
component analysis, to extract complete ligatures instead of 
extracting primary and secondary ligatures separately. A total of 
approximately 267,800 ligatures were extracted from scanned Urdu 
Nastaliq printed text images with an accuracy of 99.4%. Thus, the 
proposed framework outperforms the existing Urdu Nastaliq text 
extraction and segmentation algorithms. The proposed PLE 
framework can also be applied to other languages using the 
Nastaliq script style, such as Arabic, Persian, Pashto, and Sindhi. 

Keywords-ligature; holistic; Urdu OCR; Nastaliq; photometric 
filter; Urdu printed text images  

I. INTRODUCTION 

OCR technology is used to obtain machine editable text 
from text images. It allows the digitization of valuable printed 
and handwritten data covering cultural and historical 
milestones [1]. The commercial OCR systems that are now 
available report recognition rates close to 100% for languages 
using the Latin alphabet, such as English, German, and French. 
Arabic and Chinese OCR systems are also well-developed. 
Despite the significant research interest in this area, OCR 
systems for many languages, including Urdu, are still in the 
development stage [2-3]. Urdu is Pakistan’s official language 
having a large collection of valuable printed and handwritten 
data in the form of books, novels, magazines, and newspapers. 
Most of these valuable data are not accessible digitally. The 
Urdu language has 39 basic characters, 28 of which are Arabic. 
It is mostly written in the Nastaliq script style, which is a 
complex calligraphic style, written diagonally from right-to-left 
with varying inter and intra word spaces, overlapping of 
characters and strokes, incorrect or filled loops and lack of 
fixed baseline [4-5] as shown in Figure 1. 

 
Fig. 1.  Major challenges in Nastaliq text: intra overlapping ligatures (red), 
inter overlapping ligatures (green), false and filled loops (blue), and missing 
baseline (yellow). 

Urdu OCR is primarily composed of five stages: image 
acquisition, pre-processing, segmentation, classification and 
recognition, and post-processing [6]. Image acquisition collects 
digital images through camera shots, scanned text images, or 
generated synthetic images [6]. Pre-processing aims to enhance 
the quality of an acquired image [6]. Noise and skew removal, 
binarization, contrast enhancement, etc. are mainly performed 
in this step with the use of classic image processing techniques. 
Segmentation decomposes a source image into characters, 
ligatures, or words [7-8]. This step usually employs the projection 
profile and Connected Component Analysis (CCA). 
Classification aims to correctly classify the 
extracted/segmented features (ligatures, characters, words, 
etc.). The most common classifiers are Decision Trees 
(DTs), Statistical Classifiers (SCs), Neural Networks (NNs) [9, 
10], Hidden Markov Models (HMMs), and Support Vector 
Machines (SVMs). Finally, post-processing corrects the 
recognition errors in the obtained text [10]. The techniques 
used for OCR post-processing include manual error correction, 
dictionary-based error correction, and context-based error 
correction [12-13].  

Corresponding author: Majida Kazmi

Among the above stages, segmentation at character, 
ligature, or word level is the most challenging stage in Urdu 
OCR. Based on these levels, Urdu OCR can be divided into 
two categories: analytical approach at character level [14-15] 
and holistic approach at ligature level [7, 16-17]. The analytical 
approach segments text at character level either explicitly or 
implicitly. The explicit segmentation requires an extensive 
knowledge of characters as it explicitly divides handwritten or 
printed text into characters. Many researchers have adopted the 
explicit character segmentation [17-21]. On the other hand, 
implicit segmentation is an integration of the segmentation and 
recognition processes. Successful work has been reported by 
researchers for implicit segmentation [22-26] due to the smaller 
number of segments. However, both approaches require a 
massive amount of training data for better results. The holistic 
approach is also referred to as the segmentation-free method. It 
extracts text at the ligature or word level. Groups of isolated 
(non-joiner) characters and non-isolated (joiner) characters 
(Figure 2(a)) are termed ligatures. These ligatures are grouped to 
form words. Ligatures are further classified as primary and 
secondary ligatures. Primary ligatures represent the main body 
of a word, while dots or diacritic marks are the secondary 
ligatures (Figure 2(b)). 

 
Fig. 2.  Word breakdown. (a) Ligatures in an example word, (b) ligature 
components. 

Avoiding character level segmentation has made the 
holistic method extremely popular [3, 27-32]. Authors in [27] 
followed the projection technique for text line extraction. The 
main body and diacritics were identified based on the distance 
between the horizontal base and the average line. The 
technique was tested on a small data set that was not specific to 
the Nastaliq script, consisting of 1050 single characters and 
ligatures, with 98.86% accuracy. Authors in [28] used the 
horizontal projection technique. CCA was applied before text 
segmentation. The horizontal span of each secondary 
component on the baseline was calculated for the re-association 
of diacritics to their respective primary ligature. However, this 
approach was tested on text files rather than text images 
to extract complete ligatures. Similarly, authors in [29] applied 
the vertical projection profile method for the association of 
secondary ligatures by calculating the start and end points of 
diacritics. That method reported 100% and 99% 
accuracy in baseline identification and ligature extraction, 
respectively, on scanned images with a font size of 48, but it 
ignores intra-overlapping ligatures and is font size dependent.  

Authors in [3] employed the horizontal projection method 
along with dilation to merge secondary and primary ligatures 
before line separation from the image. Authors in [30] used 
only 300 ligature samples to evaluate their proposed method, 
reporting 91.3% accuracy in segmentation and 78% in 
diacritics association. Authors in [31] proposed a ligature 
extraction technique based on 6 heuristic conditions, reporting an 
accuracy of 99.02% on 45 images. Authors in [32] proposed 
the line segmentation technique with the connected component 
analysis method on images to collect width, height and 
centroids of ligatures reporting 99.80% accuracy. However, 
this technique does not segment multi-column scripts and 
overlapped inter and intra ligatures. Many recognition 
techniques carry out separate classifications of primary and 
secondary components [3, 27-32] to reduce the number of 
distinct recognizable classes. Such techniques face significant 
challenges in re-associating the secondary components with 
their primary components to recognize the entire ligature. The 
complexity of character segmentation has shifted the focus 
towards the holistic approach, i.e. the recognition of words or 
ligatures in the text. Segmenting text at the character level is 
more complex than the recognition of words and ligatures due 
to character overlapping, varying inter and intra word spaces, 
context sensitivity, different forms of characters according to 
their position in a word or a ligature, and the cursive script 
style. The literature review reveals that Urdu OCR is an open 
field in which researchers can design a system capable of 
handling factors such as intra and inter ligature 
overlapping, multi-column text images with borders, font 
variation, and large volumes of ligature data for classification.  

An efficient ligature extraction technique for Urdu OCR is 
proposed in this paper. The proposed method extracts complete 
ligatures rather than separating primary and secondary 
components, so secondary ligatures do not need to be 
re-associated with their primary ligature in the classification and 
recognition steps. The technique is independent of font size and 
column layout, and is capable of handling all overlapping and 
non-overlapping ligatures by addressing the issue of intra 
overlapped ligatures as well as the complex association of the 
secondary components. The proposed framework is designed for 
Urdu but is applicable to other languages that follow the 
Nastaliq style, such as Arabic, Persian, Pashto, and Sindhi. 

II. THE PROPOSED METHODOLOGY 

The proposed framework for ligature extraction is depicted 
in Figure 3. It consists of 3 steps: image acquisition, image 
binarization, and PLE. Urdu printed text images from novels and 
religious books written in the Nastaliq style, in single and double 
columns and with varying font sizes, were downloaded from 
different sources [33] and are referred to as Iimg. First, each Iimg is 
converted into a binarized image Ith by using hard thresholding. 
The resultant Ith is a monochrome image with white 
background and black text (Figure 5). Then, the PLE process 
is applied to each Ith.  
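
The binarization step can be illustrated with a minimal sketch. The paper does not state the threshold value or the implementation, so the function name `binarize`, the threshold of 128, and the use of plain Python lists as a stand-in for a grayscale image are assumptions made purely for illustration. Following the convention above, background maps to 1 (white) and text maps to 0 (black).

```python
def binarize(gray, threshold=128):
    """Hard-threshold a grayscale image given as rows of 0-255 integers.

    Returns rows of bits: 1 = white background, 0 = black text.
    The threshold of 128 is an assumed value, not taken from the paper.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


# Tiny 3x4 example: dark pixels (text) become 0, light pixels become 1.
gray = [
    [250, 245, 240, 255],
    [250,  30,  20, 255],
    [250, 245, 240, 255],
]
binary = binarize(gray)
```

A fixed threshold is the simplest reading of "hard thresholding"; scans with uneven illumination might need a different value per document.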

 
Fig. 3.  Framework for Urdu ligature extraction. 

A. Photometric Ligature Extraction (PLE)  

The proposed PLE uses a customized photometric filter 
which is specifically designed to decompose an image based on 
photometric similarity. A stepwise description of the PLE 
process follows: 

• In the first step, PLE deploys a photometric filter to extract 
text lines (Llines) from the image (Ith). The algorithm in 
Figure 4 demonstrates the working of the photometric filter. 
This filter scans the binarized image from top to bottom to 
detect text using the logical AND operator. The size of the 
photometric filter is matched to the width (W) of the 
image, i.e. 1×W. The output of the photometric filter for 
each row is saved in an array. The resultant array is a stream 
of zeroes and ones, which is reduced with the AND operation 
to a single bit value, i.e. 0 or 1. A value of 0 indicates the 
presence of one or more black pixels in the row; otherwise 
the value is 1. 

• In the second step, each line image in Llines is first rotated 
counterclockwise by 90°. The photometric filter is then 
applied to each line of Llines to extract both overlapped and 
non-overlapped ligatures (Llig). 

• In the third step, the overlapped ligatures are corrected by 
applying X-shear transformation and padding 
simultaneously to each Llig to overcome the most 
challenging issue of inter and intra ligature overlapping. 
The output of this step consists of the sheared and padded 
ligatures Lsheared-lig. 

• In the fourth step, the Lsheared-lig images are classified into two 
classes based on the extent value of the first encountered 
ligature in the image using CCA. Height, width, centroid, etc. 
are major properties obtained through the CCA method. 
The developed methodology utilizes another component 
property termed extent, which is defined as the ratio of 
contour area to the bounding rectangle area. The extent 
value is a key feature in distinguishing secondary and 
primary ligatures with 99% accuracy. If the extent value of 
the ligature is less than the hard threshold value, then the 
dilation operation is carried out on the encountered ligature, 
producing Lligs,dilated. This process reduces the distance 
between the primary and the secondary component of a 
ligature.  

• In the last step, the photometric filter is again applied to all 
dilated and non-dilated ligatures Lligs,dilated and Lligs,non-dilated 
to extract complete ligatures as final output Lextracted-ligs. 
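
The extent test used to decide whether a ligature image is dilated can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the contour area is approximated by the pixel count of the component, 8-connectivity is assumed, the "first encountered" component is taken in raster-scan order, and the threshold of 0.5 is hypothetical because the paper does not report its value. All function names are likewise illustrative.

```python
from collections import deque


def first_component(binary):
    """Return the pixel set of the first black component in a raster scan.

    binary: rows of bits, 0 = black text, 1 = white background.
    Uses 8-connectivity (an assumption; the paper does not specify it).
    """
    h, w = len(binary), len(binary[0])
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 0:
                seen, queue = {(r, c)}, deque([(r, c)])
                while queue:                      # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] == 0
                                    and (ny, nx) not in seen):
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                return seen
    return set()


def extent(pixels):
    """Component area divided by bounding-rectangle area (pixels non-empty)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    box = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return len(pixels) / box


def needs_dilation(binary, threshold=0.5):
    """True if the first component's extent is below the assumed threshold."""
    return extent(first_component(binary)) < threshold


# An L-shaped stroke: 5 black pixels inside a 3x3 bounding box.
stroke = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
value = extent(first_component(stroke))   # 5 / 9
```

Compact shapes fill most of their bounding box and score close to 1, while thin or curved strokes score lower, which is what makes the ratio usable as a single-number discriminator.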

 
Fig. 4.  The photometric filter algorithm. 
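
The row-scanning behaviour of the photometric filter, as described in the first step of the PLE process, can be sketched as follows. This is one reading of that description rather than the authors' code: the names `row_flags`, `extract_runs`, and `extract_lines`, and the list-of-lists image format (1 = white, 0 = black), are assumptions. Each 1×W row is reduced with a logical AND to a single bit, and consecutive rows flagged 0 are grouped into one text line.

```python
def row_flags(binary):
    """AND-reduce each row: 1 = blank row, 0 = row containing black pixels."""
    return [int(all(row)) for row in binary]


def extract_runs(flags):
    """Group consecutive 0 flags into (start, end) pairs, end exclusive."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag == 0 and start is None:
            start = i                      # a text run begins
        elif flag == 1 and start is not None:
            runs.append((start, i))        # a blank row closes the run
            start = None
    if start is not None:                  # run touches the last row
        runs.append((start, len(flags)))
    return runs


def extract_lines(binary):
    """Cut the image into text-line sub-images using the row flags."""
    return [binary[a:b] for a, b in extract_runs(row_flags(binary))]


# Two "text lines" (rows 1-2 and row 4) separated by a blank row.
page = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
flags = row_flags(page)
lines = extract_lines(page)
```

The same functions could serve the second step if each line image is rotated first, for example with `[list(col) for col in zip(*line)][::-1]` for a 90° counterclockwise rotation, so that blank columns of the original line become blank rows.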

 
Fig. 5.  The result of image binarization (Ith) on the input image Iimg. 



 
B. Demonstration of the Proposed PLE 

The stepwise demonstration of the proposed PLE technique 
is shown in Figure 6. The input of the PLE technique is a 
mono-chrome image with white background and black text 
(Figure 5).  

 
Fig. 6.  PLE framework illustration. (a) An extracted line (Lline). (b) 
Segmented ligatures from the sentence in (a). The red encircled ligatures are 
overlapped. (c) The overlapping issue of ligatures obtained in (b) is resolved 
by X-shearing; the affected ligatures are encircled in green. (d) List of ligatures 
(Lligs,dilated, Lligs,non-dilated) after the morphological operation (dilation). 
(e) Correctly extracted ligatures (Lextracted-ligs) after the photometric filter 
(PF) is applied to (Lligs,dilated, Lligs,non-dilated). 

In the first step, text lines are extracted one by one from the 
text image by applying the photometric filter (Figure 6(a)). In 
the next step, each extracted line is first rotated 
counterclockwise and then passed through the 
photometric filter again to extract both overlapped (marked with 
red circles) and non-overlapped ligatures (Figure 6(b)). The issue of 
inter and intra overlapping is resolved (see Figure 6(c), marked 
with green circles) by applying X-shearing and padding 
simultaneously to each ligature. Figure 6(d) depicts the list of 
dilated and non-dilated ligatures. The dilation process reduces 
the distance between the primary and secondary components of 
a ligature. Finally, the photometric filter is again applied to 
these ligatures to obtain the final output shown in Figure 6(e). 
This last pass further improves the correct separation of 
ligatures. 
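
To illustrate why X-shearing with padding can separate ligatures that overlap in the column direction, the following sketch shifts each row horizontally in proportion to its row index and widens the image so that no pixel is lost. The shear factor, the shear direction, and the names `x_shear` and `col_flags` are assumptions; the paper does not report the parameters it uses.

```python
def x_shear(binary, k=1.0, background=1):
    """Shift row y by round(k * y) pixels, padding with background pixels.

    The output is widened so that no pixel is cut off. The shear factor k
    and its sign are assumed values, not taken from the paper.
    """
    w = len(binary[0])
    shifts = [round(k * y) for y in range(len(binary))]
    lo, hi = min(shifts), max(shifts)
    out_w = w + hi - lo                      # extra columns act as padding
    sheared = []
    for row, s in zip(binary, shifts):
        left = s - lo                        # padding before the row
        right = out_w - w - left             # padding after the row
        sheared.append([background] * left + list(row) + [background] * right)
    return sheared


def col_flags(binary):
    """AND-reduce each column: 1 = blank column, 0 = column with black pixels."""
    return [int(all(col)) for col in zip(*binary)]


# Two strokes that share column 2, so no blank column separates them.
overlapped = [
    [0, 0, 0, 1, 1],   # upper stroke, columns 0-2
    [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0],   # lower stroke, columns 2-4
    [1, 1, 0, 0, 0],
]
before = col_flags(overlapped)
after = col_flags(x_shear(overlapped))
```

In this toy case every column contains black pixels before the shear, whereas afterwards one blank column appears between the two strokes, which is the kind of gap a column-wise filter pass can use to split them.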

III. RESULTS AND ANALYSIS 

The proposed Urdu ligature extraction framework was 
evaluated on downloaded Urdu printed text images. The 
technique was tested on a total of 600 novel and book images. 
The working dataset mainly comprised non-overlapping 
lines with no boundary across images. First, the photometric 
filter was applied to the images and extracted lines with an 
accuracy of 99.6%. A total of 13,200 lines were extracted from 
the 600 images. These lines were then segmented into ligatures. A 
total of 267,800 ligatures were extracted after the complete 
execution of all the steps of the proposed PLE, with an overall 
accuracy of 99.4%. Table I compares the proposed ligature 
extraction framework with previously reported methods. 
Authors in [27] evaluated their approach on 1050 ligatures with 
98.86% accuracy in primary and secondary stroke extraction. 
Authors in [29] achieved 99% accuracy in ligature and 
diacritics extraction. Authors in [30] tested their system on 300 
samples, out of which 274 were segmented correctly, i.e. with 
91.3% accuracy. Authors in [31] analyzed 45 Urdu images to 
classify and associate the connected components with 99.02% 
accuracy. Authors in [32] used 10,063 text lines to test their 
algorithm and reported an accuracy of 99.8%. 
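
As a quick consistency check on the counts quoted above, the averages per image and per line, and the accuracy reported for [30], can be recomputed from the stated numbers. The averages are derived here for illustration only and are not reported in the paper.

```python
images = 600
lines = 13_200
ligatures = 267_800

lines_per_image = lines / images          # average text lines per page image
ligatures_per_line = ligatures / lines    # average ligatures per text line

# Accuracy reported for [30]: 274 of 300 samples segmented correctly.
accuracy_30 = 100 * 274 / 300

print(f"{lines_per_image:.0f} lines per image")
print(f"{ligatures_per_line:.1f} ligatures per line")
print(f"{accuracy_30:.1f}% accuracy for [30]")
```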

However, as discussed above, due to the limited availability of 
ligature data sets, researchers have mostly deployed algorithms on 
their own datasets to check the accuracy of ligature 
segmentation/extraction. Therefore, the accuracy depends upon 
the complexity of the text images used for segmentation and 
re-association of primary and secondary components. 
Segmentation algorithms achieving segmentation accuracy near 
99% apply CCA for primary and secondary component 
segmentation in [28, 31-32] and the projection profile method in 
[3, 27, 29-30], and then re-associate the secondary components. 
These studies also ignore the extraction of inter overlapped 
ligatures. The last row of Table I presents the findings of the 
proposed technique. The proposed solution resolved the 
problem of inter ligature overlapping with an accuracy of 
99.4%. However, the efficiency of the proposed method is 
reduced by the repeated use of one word whose complete 
ligature remains unaffected even after X-shearing, because the 
primary component Alif "ا" lies in the region of the second 
main component "يک" and the diacritics also overlap with the 
neighboring primary components. It was also observed that spacing 
between diacritics that lie below the main body sometimes 
leads to incorrect line segmentation. 

IV. CONCLUSION 

This paper presented an efficient technique for the extraction 
of Urdu ligatures in Nastaliq fonts. The technique uses a 
customized photometric filter along with X-shearing and 
padding, combined with CCA, which results in the efficient 
extraction of overlapped and non-overlapped ligatures. The 
proposed framework achieves an accuracy of 99.4%. The 
efficiency of the PLE technique can be enhanced by resolving 
the association of secondary ligatures with their respective 
main components before the extraction of text lines. This work 
can also be deployed for other Nastaliq script-based languages 
like Persian, Pashto, Saraiki, Panjabi, etc. 

TABLE I.  COMPARISON WITH RELEVANT HOLISTIC APPROACHES 

| Work | Ligature extraction technique | Ligature extraction result | Limitations | Accuracy (%) | Extracted overlapped ligatures |
|---|---|---|---|---|---|
| [32] | CCA | Separate primary and secondary components | Declares the secondary ligature as primary ligature when the size exceeds the threshold value | 99.8 | No |
| [30] | PP | Separate primary and secondary components | Extracted only 300 ligatures; low diacritics association accuracy | 91.3 | No |
| [31] | CCA | Separate primary and secondary components | Relies on zonal information; tested on only 45 images | 99.02 | No |
| [29] | PP | Separate primary and secondary components | Cannot extract overlapped ligatures; dependent on font size of 48 | 99.0 | No |
| [27] | PP | Separate primary and secondary components | Did not specify the script style; the system was tested on 1050 ligatures | 98.86 | No |
| [28] | CCA | Separate primary and secondary components | The proposed method was directly tested on text files | 97.4 | No |
| Proposed PLE | Photometric filter, X-shearing and stretching, CCA, and dilation | Complete ligature extraction | Uneven baseline recognition; narrow spacing between the ligatures of a recurring word, which reduces efficiency | 99.4 | Yes |

CCA:  Connected Component Analysis, PP:  Projection Profile 
 

REFERENCES 

[1] A. Wali and S. Hussain, "Context Sensitive Shape-Substitution in 
Nastaliq Writing System: Analysis and Formulation," 2007, pp. 53–58, 
https://doi.org/10.1007/978-1-4020-6268-1_10. 

[2] S. T. Javed and S. Hussain, "Segmentation Based Urdu Nastalique 
OCR," in Progress in Pattern Recognition, Image Analysis, Computer 
Vision, and Applications, 2013, pp. 41–49, 
https://doi.org/10.1007/978-3-642-41827-3_6. 

[3] I. U. Din, Z. Malik, I. Siddiqi, and S. Khalid, "Line and Ligature 
Segmentation in Printed Urdu Document Images," presented at the 3rd 
International Conference on Computational and Social Sciences, Oct. 
2015. 

[4] S. Naz, A. I. Umar, S. B. Ahmed, S. H. Shirazi, M. Imran Razzak, and I. 
Siddiqi, "An Ocr system for printed Nasta’liq script: A segmentation 
based approach," in 17th IEEE International Multi Topic Conference 
2014, Dec. 2014, pp. 255–259, 
https://doi.org/10.1109/INMIC.2014.7097347. 

[5] H. R. Khan, M. A. Hasan, M. Kazmi, N. Fayyaz, H. Khalid, and S. A. 
Qazi, "A Holistic Approach to Urdu Language Word Recognition using 
Deep Neural Networks," Engineering, Technology & Applied Science 
Research, vol. 11, no. 3, pp. 7140–7145, Jun. 2021, 
https://doi.org/10.48084/etasr.4143. 

[6] N. H. Khan and A. Adnan, "Urdu Optical Character Recognition 
Systems: Present Contributions and Future Directions," IEEE Access, 
vol. 6, pp. 46019–46046, 2018, 
https://doi.org/10.1109/ACCESS.2018.2865532. 

[7] S. Chanda and U. Pal, "English, Devnagari and Urdu Text 
Identification," in Proc. international conference on document analysis 
and recognition, 2005, pp. 538–545. 

[8] A. Rana and G. S. Lehal, "Offline Urdu OCR using Ligature based 
Segmentation for Nastaliq Script," Indian Journal of Science and 
Technology, vol. 8, no. 35, pp. 1–9, Dec. 2015, 
https://doi.org/10.17485/ijst/2015/v8i35/86807. 

[9] M. Alghobiri, "A Comparative Analysis of Classification Algorithms on 
Diverse Datasets," Engineering, Technology & Applied Science 
Research, vol. 8, no. 2, pp. 2790–2795, Apr. 2018, 
https://doi.org/10.48084/etasr.1952. 

[10] S. R. Basha, J. K. Rani, and J. J. C. P. Yadav, "A Novel Summarization-
based Approach for Feature Reduction Enhancing Text Classification 
Accuracy," Engineering, Technology & Applied Science Research, vol. 
9, no. 6, pp. 5001–5005, Dec. 2019, https://doi.org/10.48084/etasr.3173. 

[11] I. A. Doush, F. Alkhateeb, and A. H. Gharaibeh, "A novel Arabic OCR 
post-processing using rule-based and word context techniques," 
International Journal on Document Analysis and Recognition (IJDAR), 
vol. 21, no. 1, pp. 77–89, Jun. 2018, 
https://doi.org/10.1007/s10032-018-0297-y. 

[12] Y. Bassil and M. Alwani, "OCR Post-Processing Error Correction 
Algorithm using Google Online Spelling Suggestion," arXiv:1204.0191 
[cs], Apr. 2012, Accessed: Dec. 01, 2021. [Online]. Available: 
http://arxiv.org/abs/1204.0191. 

[13] K. Kukich, "Techniques for automatically correcting words in text," 
ACM Computing Surveys, vol. 24, no. 4, pp. 377–439, Dec. 1992, 
https://doi.org/10.1145/146370.146380. 

[14] S. Naz, K. Hayat, M. Imran Razzak, M. Waqas Anwar, S. A. Madani, 
and S. U. Khan, "The optical character recognition of Urdu-like cursive 
scripts," Pattern Recognition, vol. 47, no. 3, pp. 1229–1248, Mar. 2014, 
https://doi.org/10.1016/j.patcog.2013.09.037. 

[15] S. A. Husain, "A multi-tier holistic approach for Urdu Nastaliq 
recognition," in International Multi Topic Conference, 2002. Abstracts. 
INMIC 2002., Karachi, Pakistan, Dec. 2002, pp. 84–84, 
https://doi.org/10.1109/INMIC.2002.1310191. 

[16] S. T. Javed, S. Hussain, A. Maqbool, S. Asloob, S. Jamil, and H. Moin, 
"Segmentation Free Nastalique Urdu OCR," International Journal of 
Computer and Information Engineering, vol. 4, no. 10, pp. 1514–1519, 
Oct. 2010. 

[17] U. Pal and A. Sarkar, "Recognition of printed Urdu script," in Seventh 
International Conference on Document Analysis and Recognition, 2003. 
Proceedings., Edinburgh, UK, Aug. 2003, pp. 1183–1187, 
https://doi.org/10.1109/ICDAR.2003.1227844. 

[18] Z. Ahmad, J. K. Orakzai, and I. Shamsher, "Urdu compound Character 
Recognition using feed forward neural networks," in 2009 2nd IEEE 
International Conference on Computer Science and Information 
Technology, Beijing, China, Aug. 2009, pp. 457–462, 
https://doi.org/10.1109/ICCSIT.2009.5234683. 

[19] S. A. Sattar, S. Haque, and M. K. Pathan, "A Finite State Model for 
Urdu Nastalique Optical Character Recognition," International Journal 
of Computer Science and Network Security, vol. 9, no. 9, pp. 116–122, 
2009. 

[20] S. T. Javed, "Investigation into a segmentation-based OCR for the 
Nastaleeq writing system," M.S. thesis, National University of Computer 
and Emerging Sciences, Lahore, Pakistan, 2007. 

[21] S. Mir, S. Zaman, and M. W. Anwar, "Printed Urdu Nastalique Script 
Recognition Using Analytical Approach," in 2015 13th International 
Conference on Frontiers of Information Technology (FIT), Islamabad, 
Pakistan, Dec. 2015, pp. 334–340, https://doi.org/10.1109/FIT.2015.65. 

[22] S. B. Ahmed, S. Naz, M. I. Razzak, S. F. Rashid, M. Z. Afzal, and T. M. 
Breuel, "Evaluation of cursive and non-cursive scripts using recurrent 
neural networks," Neural Computing and Applications, vol. 27, no. 3, 
pp. 603–613, Apr. 2016, https://doi.org/10.1007/s00521-015-1881-4. 

[23] R. P. Thakkar Mitesh, "Handwritten Nastaleeq Script Recognition with 
BLSTM-CTC and ANFIS method," International Journal of Computer 
Trends and Technology, vol. 11, no. 3, 2014, 
https://doi.org/10.14445/22312803/IJCTT-V11P128. 

[24] S. Naz et al., "Offline cursive Urdu-Nastaliq script recognition using 
multidimensional recurrent neural networks," Neurocomputing, vol. 177, 
pp. 228–241, Feb. 2016, https://doi.org/10.1016/j.neucom.2015.11.030. 

[25] S. Naz et al., "Urdu Nastaliq recognition using convolutional–recursive 
deep learning," Neurocomputing, vol. 243, pp. 80–87, Jun. 2017, 
https://doi.org/10.1016/j.neucom.2017.02.081. 

[26] S. Naz, A. I. Umar, R. Ahmad, S. B. Ahmed, S. H. Shirazi, and M. I. 
Razzak, "Urdu Nasta’liq text recognition system based on multi-
dimensional recurrent neural network and statistical features," Neural 
Computing and Applications, vol. 28, no. 2, pp. 219–231, Feb. 2017, 
https://doi.org/10.1007/s00521-015-2051-4. 

[27] S. Sardar and A. Wahab, "Optical character recognition system for 
Urdu," in 2010 International Conference on Information and Emerging 
Technologies, Karachi, Pakistan, Jun. 2010, 
https://doi.org/10.1109/ICIET.2010.5625694. 

[28] N. Sabbour and F. Shafait, "A segmentation-free approach to Arabic 
and Urdu OCR," Proc. SPIE, vol. 8658, p. 86580N, Feb. 2013. 

[29] S. Nazir and A. Javed, "Diacritics Recognition Based Urdu Nastalique 
OCR System," The Nucleus, vol. 51, no. 3, pp. 361–367, Sep. 2014. 

[30] A. F. Ganai and A. Koul, "Projection profile based ligature segmentation 
of Nastaleeq Urdu OCR," in 2016 4th International Symposium on 
Computational and Business Intelligence (ISCBI), Olten, Switzerland, 
Sep. 2016, pp. 170–175, https://doi.org/10.1109/ISCBI.2016.7743278. 

[31] G. S. Lehal, "Ligature Segmentation for Urdu OCR," in 2013 12th 
International Conference on Document Analysis and Recognition, 
Washington, DC, USA, Aug. 2013, pp. 1130–1134, 
https://doi.org/10.1109/ICDAR.2013.229. 

[32] I. Ahmad, X. Wang, R. Li, M. Ahmed, and R. Ullah, "Line and Ligature 
Segmentation of Urdu Nastaleeq Text," IEEE Access, vol. 5, 
pp. 10924–10940, 2017, https://doi.org/10.1109/ACCESS.2017.2703155. 

[33] "Kutubistan," Kutubistan. https://kutubistan.blogspot.com/ (accessed 
Dec. 01, 2021).