Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Mobile Application Based Translation of Sign Language to Text Description in Kannada Language https://doi.org/10.3991/ijim.v12i2.8071 Ramesh M. Kagalkar!!" Visvesvaraya Technological University (VTU), Belgaum, Karnataka, India rameshvtu10@gmail.com Shyamrao V Gumaste MET League of College, Nashik, Maharashtra, India Abstract—Sign language is a main mode of communication for vocally dis- abled. This language use set of representation which is finger sign, expression or mixture of both to express their information among others. This system pre- sents a novel approach for mobile application based translation of sign action analysis, recognition and generating a text description in Kannada language. Where it uses two important steps training and testing. In training set of 50 dif- ferent domains of video samples are collected, each domain contains 5 samples and assign a class of words to each video sample and it will be store in data- base. Where in testing test sample under goes preprocessing using median filter, canny operator for edge detection, HOG for feature extraction. SVM takes input as a HOG features and predict the class label based on trained SVM model. Fi- nally the text description will be generated in Kannada language. The average computation time is minimum and with acceptable recognition rate and validate the performance efficiency over the conventional model. Keywords—Gesture recognition, Image processing, Sign language, Video pro- cessing. 1 Introduction The activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions. Since the 1980s, this research field has captured the attention of several computer science communities. Due to its strength in providing personalized support for many different applications and its connection to many different fields of study such as medicine, human-computer interaction, or sociology. In image processing, input is taken as an image later perform processing on that image based on the re- quirement. Many types of input are to be taken in image processing such as a video, image or collect frames from video and after output is produced in the form of an image or set of parameter related to image. The use of image processing for improve the image quality and gather useful information in the image this process is called as 92 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language feature extraction. This image processing techniques can be used for detecting the hand gesture or analyzing actions or many other purposes in various fields. In the system developed these image processing techniques are used for communication purpose in such community where people are vocally impaired and hearing impaired. The communication by the body language is understood as the fundamental trade- mark include that characterizes a hard of hearing group. The critical piece of commu- nication via gestures acknowledgment plots in general society human progress is to guarantee that hard of hearing people have correspondence of chance and full com- mitment in the public arena. Communication via gestures is spoken to fundamentally by ceaselessly changing diverse hand shapes and development by an underwriter. Sign based correspondence is a physical action by using arms, hands, fingers and eye with which we can talk with idiotic and in need of a hearing aide people. Seeing hu- man action from picture game plans is a champion among the most troublesome is- sues in computer vision with various imperative applications, for instance, tunning video perception, content-based video recuperation, human-robot collaboration, and splendid home. The errand is troublesome not in light of between class assortments, camera advancements, establishment confusing and fragmentary obstacle, also to some between class spreads and similarities, for instance, running as opposed to run- ning or walking. Earlier tackles human movement affirmation in video frequently used overall representations. The dynamic signals affirmation applications require the picking up of a high data rate of hand positions by and large gave using development taking after gloves that are set up to do unequivocally recording finger joint developments through flex sen- sors in an immovably fitting glove. Hand flag gives a trademark and characteristic correspondence philosophy for human–computer participation. Successful human computer interactions (HCIs) must be created to allow computer to ostensibly see persistently hand movements. In any case, vision-based hand taking after and move- ment affirmation is a trying issue due to the capriciousness of hand signs, which are rich in diversities on account of high Degrees of adaptability (DOF) required by the human hand. In order to viably fulfill their part, the hand movement HCIs need to meet the requirements with respect to progressing execution, affirmation precision, and vigor against changes and jumbled foundation. The gesture based communication correspondence understanding includes seman- tic examination of hands taking after, hands shapes, hands presentations, sign verbali- zation moreover with basic etymological information talked with head advancements and outward appearances. Motion based correspondence is from different points of view assorted structure talked tongue, for instance, facial and hand stating, references in virtual checking space, and syntactic complexities as illuminated. The huge incon- venience in signal based correspondence affirmation stood out from talk affirmation is to see at the same time assorted correspondence properties of a guarantor, for in- stance, hands and body advancement, external appearances and body act. These property must be viewed as at the same time for a better than average affirmation structure. The second huge issue went up against by motion based correspondence affirmation structure engineers is taking after the underwriter in the confuse of other information by researchers to describe a model for spatial information containing the iJIM ‒ Vol. 12, No. 2, 2018 93 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language components made in tail able in the video. This is studied by many experts as check- ing space. An imperative test stood midst of the correspondence through marking talk. This paper addresses the problem of action recognition that is how to determine the type of action that is happening in a video. Here the problem of video representation is considered that means how to encode videos in a robust way? Which type of repre- sentation is suitable for a wide variety of action classes, tasks and video types? This paper shows the system which is used for recognition of hand gesture of sign lan- guage and translate it into corresponding in Kannada language. Hence the proposed system has wide scope for vocally disable individual to express the feelings and talent on paper, who is residing in rural areas of Karnataka state of India. The rest of the paper is organized as in section 2 gives the detail survey discussion of related papers work carried out so far in this area. In section 3 illustrate the over- view of system architecture and description of its phase. In the section 4 implementa- tion of system is outlined where training and testing algorithms steps are discussed. The results and discussion of proposed system is discussed in section 5. Finally the section 6 conclusion of the work is discussed. 2 Literature Outline In the literature survey section, the history of the earlier work done in this area and there issues are discussed. It contains a record of all the research going in this area. In this section detailed study of the earlier work done on the sign language recognition is discussed. M. R. Abid et al [1] describes in this paper, the state-of-the art Dynamic sign language recognition (DSLR) system for smart home interactive applications. The novel DSLR system comprises two main subsystems: an image processing (IP) module and a Stochastic linear formal grammar (SLFG) module. IP Module used the Bag-of-features (BOFs) and a local part model approach for bare hand dynamic ges- ture recognition from a video. The SLFG module analyzes the sentences of the sign language (i.e., Sequences of gestures) and determines whether or not they are syntac- tically valid. The DSLR system is not only able to rule out ungrammatical sentences, but it can also make predictions about missing gestures, which, in turn, increases the accuracy of our recognition task. And this module makes the aggregate performance of the DSLR system as accurate as 98.65%. Houssem Lahiani et al [2] proposed a system based on SVM for recognizing various hand gesture. The system consist of four steps: hand segmentation, smoothing, feature extraction & classification. With this system all steps can be done by the smartphone. In this paper, for image acquisi- tion, frontal camera of the smartphone is used. After that frames are getting from the video, the color sampling is done which is followed by making binary representation of the hand, and then contours representing the hand were described with convex polygons to get information about fingertips and finally the input gesture was recog- nized using proper classifier. Rishabh Agrawal et al [3] represent the system to recognize hand gesture for hu- man computer interaction, using computer vision and image processing techniques. The proposed system can successfully replace such devices needed for interacting 94 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language with a personal computer and it uses the commercial depth +rgb camera called Senz3D, which is cheap and easy to buy as compared to other depth cameras. The proposed method works by analyzing 3D data in real time and uses a set of classifica- tion rules to classify the number of convexity defects into gesture classes. This results in real time performance and negates the requirement of any training data. Jian Wu et al [4] proposed, fusing information from an inertial sensor and SEMG sensors. An information gain-based feature selection scheme is used to select the best subset of features from a broad range of well-established features. Four popular classification algorithms are evaluated for 80 commonly used ASL signs on four subjects. The experimental results show 96.16% and 85.24% average accuracies for intra-subject and intra-subject cross session evaluation, respectively, with the selected feature sub- set and a support vector machine classifier. The significance of adding sEMG for ASl recognition is explored and the best channel of sEMG is highlighted. Md. Mohiminul Islam et al.[5] come with this system to present a real time HGR System based on ASL, recognition with greater accuracy. This system acquires gesture images of ASl with black background from mobile video camera for feature extraction. For feature extraction “ K Convex Hull” algorithm is used which can detect fingertip with high accuracy. In this system, Artificial neural network (ANN) is used with feed forward, back propagation algorithm for training a network using 30 feature vectors to recog- nize 37 signs of American alphabets and numbers properly which is helpful for HCI system. The total gesture recognition rate of this system is 94.32% in real time envi- ronment. Deniz Ekiz et al [6] present a smartwatch application that recognizes im- portant sign sentences. This method represents a smart watch app that collects 3d accelerometer and 3d gyroscope data from the watch and recorded 8 questions and 13 sentences from 5 people who are fluent in sign language. This system use dynamic time warping to compute the distances between the gestures and templates in all data dimensions. The resulting distances serve as input for discriminating the gestures. We evaluated the discriminative ability of logistic regression. Ohene-Djan et al. [7] locate another arrangement, he utilize Mak-Messenger that has manual decision strategy for signs from a catch board. This approach of physically picking pictures wasn't triple- crown on account of the deficiency of ease of use. 3 System Overview For the developed approach for translation of sign language to text description where a single significant transformation is carried out for a Kannada text description performed. To represent the processing efficiency, a set of sign action consider in training for formulating a text description. This sign action frames are then processed to evaluate the performance for sign language detection. Word processing is carried out as a recursive process of a sign action symbol representation, where each frame data are processed for a HOG features. The frame data are extracted based on the frame reading rate and multiple frames are processed in successive format to extract the region of interest. A system outline to process the real time sign action and to give an optimal frame processing for sign recognition a word level process is performed. iJIM ‒ Vol. 12, No. 2, 2018 95 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language To perform the word processing, the basic approach of the developed system is shown in figure 1. The scope of the system is to provide a platform for dumb individuals to share their views among every one. Analyze human motion from images and video and develop- ing application like Facebook where dumb people communicate with each other. Video is collection of video frames where frames are collected from video later and after that this frames are used for processing. There is a need to analyze the frame and based on that frame action is identified. Preprocessing purpose the blur from images is removing to improve result. Feature extraction can be done using HOG algorithm and we train the support vector machine by using feature collected from algorithm. In real time web camera to get image and that image to extract HOG feature later that feature used for test the support vector machine. Based on the training support vector machine predict the class label as output. In above chapter the architecture of the system is studied in detail. The overall systems perform preprocessing, Feature ex- traction, Classification, Detecting the hand gesture and finally generating text descrip- tion. The proposed system consists of two major phases training and testing. Where the training process is carried out for a developed database as outlined in the next section [8]. 3.1 Training Phase In training module the images extracted from the captured video and are trained by using SVM after that stored in the database by assigning class label. Figure 1 shows the overview of the system in the training section. All the trained images are used to extract features and which are further used for testing. Firstly, through live video the different frames are captured since a video is nothing but a set of images. Then the training is performed on that captured frames. After that, every Image is processed by filtering technique (noise removal, edge detection or shape detection) and applying Histogram oriented gradient (HOG) algorithm is used for feature Extraction. HOG algorithm defines the objects (hand) and motion shapes in the images by describing the intensity gradient and edge detection. After that a gray scale image is generated. This gray image used as input. The output is a list of points on the image each associ- ated to a vector of low-level descriptors. These points are said key points and their descriptors are invariant by rescaling, in-plane rotating, and noise addition and in some cases by changes of illuminant. The gesture captured from the images are used to generate exact meaning and are ranked in English language. Thus whatever done in training is based on hand gesture and motion used to create exact meaning. Thus in training section, meaning of each gesture are insert into database [9]. 3.2 Testing Phase This module test live video and gets the result in terms of segmentation of frames. In this phase, a video is processed and divided into frames and these frames are fur- ther processed by applying the purifying algorithm to remove noise from images. 96 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Median blur technique is used to filter image. The lower part of figure shows the testing phase. After elimination of noise, the features of images are extracted and these features are linking with training videos to recognize text. The system undergo following step to yield the desired result. Concept that focuses on the components or elements of a structure or system and unifies them into a coherent and functional whole according to a particular approach in achieving the objective(s) under the given constraints or limitations. A block diagram is a specialized, structure provides a high- level overview of major system components, key process participants, and important working relationships. In figure1 block diagram functions used for implementation are represented. Video processing, image processing, feature extraction and action detection using SVM classifier are functions used for application implementation. Fig. 1. Overview of the system. 4 System Implementation For the developed approach of sign language detection, where a single significant transformation is carried out, a Kannada text detection is then performed. To repre- sent the processing efficiency, a set of cue symbols is used for formulating a word. This word symbols are then processed to evaluate the performance for sign language detection. Text processing is carried out as a recursive process of a single cue symbol representation, where each frame data are processed for a shape feature. The frame data are extracted based on the frame reading rate and multiple frames are processed in successive format to extract the region of interest. A system outline to process the sign video data and to give an optimal frame processing for sign recognition a text level process is performed. To perform the text processing, the basic approach of the developed system is shown in figure 1. iJIM ‒ Vol. 12, No. 2, 2018 97 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language The text symbols extract from a given video sample, where the video data is pro- cessed frame wise manner and the recurrent frame information are eliminated as re- dundant bits. To perform the frame coding the video frame under process, is pro- cessed using a joint adjacent matching and a singleton region matching algorithm is used for frame processing. In the join adjacent region processing, in this frame pro- cessing, each characters are extracted as a set of image data and processed in a recur- rent manner to extract the feature described. On the process of the text recognition process the video frames are extracted based on the frame rate of the video sample. The video data are processed as an energy correlation, where the video sample is processed in time frame slices. Energy interpolation for video sign recognition gives the advantage of information retrieval based on energy mapping, where energy details are used as an informative parameter for sign language detection. The extracted HOG features are passed to classifier to make a final decision. The classifier logic performs a classification based on searching the best match feature using SVM approach. The recognized character is processed for mapping into the class level [11]. These pro- cessing systems have a variant output format, such as text output. In this section all the methods and techniques that are used for the system implementation are discussed below. For training and testing follows below steps, 1. Video Acquisition: It acquires the video from the user and performs translation of videos into its frames (Multiple images). Then each frame will undergoes prepro- cessing before extracting features and the preprocessing step will be discussed in the next section. Meanwhile video is continue to capture from web camera and is divided into multiple frames. Every individual image called as a frame. Video framing is the process of extracting frames in giving video using video attributes like frame rate. For example the video duration is the 2 min and 29 seconds and frame rate will be 1000 FPS (Frame per second) then extracted frames will be 14900 frames. 2. Preprocessing: This section consists preprocessing on video like noise and blur elimination. Video contain huge amount of frames in that frame contain visual dis- tortion like a video shoot in low light area, voice distortion, light conditions etc. The preprocessing is a common stage in every image processing area. The princi- pal motivation behind preprocessing is to diminish commotion on the edge and im- prove the picture highlight for further handling. The median filter uses the nonline- ar filtering techniques to remove blur from input frame. By using filtering tech- niques to improve image quality and output result. The main idea behind median filtering is replacing every entry with neighboring entry[12]. 3. Feature Extraction: HOG is widely used to extract the feature from the input im- age. HOG is an image processing algorithm. This will detect the object in the im- age by using feature vector. Whenever extract the feature from the image it pro- duced output in the form of feature vectors. Feature contains the local shape infor- mation this can use for many tasks such as classification, detection of objects, track the object. 4. Classification: SVM characterization is basically a double i.e. binary (two-class) order system, which must be altered to deal with the multiclass undertakings in cer- 98 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language tifiable circumstances. SVM order utilizes elements of picture to the group[13]. The characterization utilizes prepared video and group testing video with specific depiction and gives yield. 5. Text Generation: At last when the classification process is completed the equiva- lent grammatical text description will be generated with the help of the class labels which are assigned during the training phase. 4.1 Algorithm for Implementation The proposed system uses two phases training and testing to perform sign to text description. In the both phase uses logical operation to perform the activity such as training and testing algorithm. Training algorithm. Training algorithm is used to train the system with the help of the available data set. The training data set involve different algorithms to train the system [14]. The steps of training algorithm is as follows, Algorithm: Training Input: Consider the different set of sign action video from the signers of different domain. Output: Set of features extracted from each real time sign action signers and is stored in database. Steps: Begin Step 1. Consider the real time sign action from the signer with static background with high camera resolutions. Step 2. Perform frame extraction where the real time sign action data is processed frame wise manner and the recurrent frame information are eliminated as redundant bits. Step 3. Perform preprocessing to remove noise and blurriness effect using median filter algorithm from the real time sign action. Step 4. Convert the gray scale image form extracted frames because it require less processing time than that of the colored image. Step 5. Detect the edge from this sample with the canny edge detection technique. Mathematically Canny edge technique is expressed as, Where, iJIM ‒ Vol. 12, No. 2, 2018 99 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Step 6. Compute the features with the HOG technique. These features are stored into the database in .csv file along with the associated class labels and also mapped features are buffered into an array to formulate the database. Apply HOG feature extraction algorithm to extract the feature from a given image. This algorithm imple- mentation follows 4 steps as, 1. Gradient Computation: The initial step of estimation in many component loca- tors in picture pre-handling is to guarantee standardized shading and gamma val- ues. 2. Orientation Binning: The second step of estimating is making the cell histograms. Every pixel in the cell makes a choice for an introduction construct histogram channel depends with respect to the qualities found in the inclination calculation. 3. Descriptor Block: To represent changes in light and complexity, the angle quali- ties must be privately standardized, which requires gathering the phones together into bigger, spatially associated squares. 4. Block Normalization: Dalal and Triggs investigated four unique techniques for piece standardization. They provide equation for a frame at that point the standard- ization element can be one of the accompanying: L2-standard which is, f = !!! ! !!!!!!!!! ! ! !!!!!!!!! (4) Where, f: A frame v: video sample e: detected object/element from the video Step 7. Similarly repeat the step 1 to 6 for all sign video’s from the data base and extract all features and will be stored in data base one by one. End Testing algorithm. eIn this testing of sign video sample from the data base is con- sidered and before testing it is compulsory to train the data base samples [15-16]. The testing algorithm follows the same procedure as that of the training algorithm with the exception that there is no need to capture the video explicitly the system can capture the video with the web camera installed. The steps of training algorithm is discussed below, Algorithm: Testing Input: Sign action video from the data base. Output: The text description in Kannada language. Steps: Begin Step 1. Consider the sign action from the signer with static background with high camera resolutions. 100 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Step 2. Perform frame extraction where the real time sign action data is processed frame wise manner and the recurrent frame information are eliminated as redundant bits. Step 3. Perform preprocessing to remove noise and blurriness effect using median filter algorithm from the real time sign action. Step 4. Convert the gray scale image form extracted frames because it requires less processing time than that of the colored image. Step 5. Detect the edge from this sample with the canny edge detection technique. Mathematically Canny edge technique is expressed as, Where, Step 6. Compute the features with the HOG technique. These features are stored into the database in .csv file along with the associated class labels has to assign and also mapped features are buffered into an array to formulate the database. Apply HOG feature extraction algorithm to extract the feature from a given image. The algorithm implementation follows 4 steps as, 1. Gradient Computation: The initial step of estimation in many component loca- tors in picture pre-handling is to guarantee standardized shading and gamma val- ues. 2. Orientation Binning: The second step of estimating is making the cell histograms. Every pixel in the cell makes a choice for an introduction construct histogram channel depends with respect to the qualities found in the inclination calculation. 3. Descriptor Block: To represent changes in light and complexity, the angle quali- ties must be privately standardized, which requires gathering the phones together into bigger, spatially associated squares. 4. Block Normalization: Dalal and Triggs investigated four unique techniques for piece standardization. They provide equation for a frame at that point the standard- ization element can be one of the accompanying: L2-standard which is f = !!! ! !!!!!!!!! ! ! !!!!!!!!! Where, f: A frame v: video sample e: detected object/element from the video iJIM ‒ Vol. 12, No. 2, 2018 101 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Step 7. Apply SVM classifier to match the features to make a final decision. The classifiers also predict the meaning of action. The recognized character is processed for mapping into an equivalent text description. Step 8. The equivalent grammatically correct text description is generated. 5 Results and Discussion For the realization of the proposed system, the need of Kannada sign dataset (KSD) is developed using all the constraints to sign and character representation. The Kan- nada (!"#$) is the official local language of the southern Indian state of Karnataka. It is a Dravidian language spoken by about 44 million people in the Indian states of Karnataka, Andhra Pradesh, Tamil Nadu and Maharashtra. The Kannada script is widely used for writing Sanskrit texts in Karnataka. Several minor languages, such as Tulu, Konkani, Kodava, Sanketi and Beary, also use alphabets based on the Kannada script. For the sign action recognition there would be a huge amount of database is generated. As the actions has no limit for development different type of actions with different views and with different perspective are recorded. In this system by studying the signs of the deaf community we group these actions into 50 different domains and each domain has 5 examples [17-18]. Out of these action videos 5 videos samples are trained and tested by the system because of the available machine configuration. That means the system is able to be run only these 5 samples trained video s and the detail of these video is available in the table1. In the Meeting_Video1 is actually a sign action performed by the deaf and cap- tured by the system through the web camera. The first example in the table meeting _Video1 has a size 7.25 MB the meaning of that sign action is Nice to meet you. The system will take total of 10 seconds time to process this sign action to translate and it generate text description in kannada meaning like “!!""##$$%% !!""##$$%%&&"" !!""##$$” as the expected result. Similarly the Time_Video1 sign action video having meaning is What is todays date for this system will take near about 12 second to process and produce the result in kannada as “!!""##$$%% !!""##$$ !!""##?” is as per expectation hence the remark for that is successful. The processing time is 2 seconds more than the previous example because in this example the action part of the user is more [19]. In the table 7.2 the count of expected words actual generated words and expected words are same hence accuracy rate for all the 5 examples are is 100% because we have selected testing samples from Database which are already trained. In this sec- tion, time require to process video is shown. The system in turn uses time in millisec- onds which is required for the processing of each frame is shown in figure. Three different graphs for each video are drawn. First graph is for preprocessing time, se- cond graph is for edge detection time and third graph is for the feature extraction. As the number of frames increases the time required for processing is also increases [20]. 102 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Table 1. Over all summary of the implemented system. Sr. No Video Sample Time Duration (Sec) Processing Time (Sec) Expected Output Actual Output Recogni- tion Rate in % 1. Meeting _Video1 20 10 !"#$% !"#$%&" !"#$. Nice to meet you. !"#$% !"#$%&" !"#$. Nice to meet you. EWC= 4 AWC= 4 Accura- cy=100% 2. Time _Video1 25 12 !"#$% !"#$ !"#? What is todays date? !"#$% !"#$ !"#? What is todays date? EWC= 4 AWC= 4 Accuracy= 100% 3. Place_Vi deo1 30 13 !"# !"#$% !"#. This is dangerous place. !"# !"#$% !"#. This is danger- ous Place. EWC= 4 AWC=4 Accuracy= 100% 4. Gossips _Video1 50 20 !"#$ !"# !"# !"#$? Hello What is your Name? !"#$ !"# !"# !"#$? Hello what is your Name? EWC= 5 AWC= 5 Accuracy= 100% 5. Theater _Video1 26 10 !"# !"#$%& !"# !"#$ !"#$%&'#(. I go to movie thea- ter. !"# !"#$%& !"# !"#$ !"#$%&'#(. I go to move theater. EWC= 4 AWC= 4 Accuracy= 100% 5.1 Result Analysis for Video Samples The time taken by each Median filter, Canny and HOG algorithm are discussed and processing time required for video sample 1(Meeting_Video1). The following table and graph shows the time required for preprocessing, Canny Edge detection and HOG algorithm. Preprocessing (Median filter). The table 2 gives the details about the frames ex- tracted and processing time in milliseconds of a video sample and its corresponding graph is shown in figure 2 where the preprocessing time Vs frame extracted is plotted. iJIM ‒ Vol. 12, No. 2, 2018 103 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Table 2. Preprocessing time Sr. No. Video sample Frames Extracted Processing Time (millisecond) 1. Meeting_Video1 Frame 1 400 Frame 2 450 Frame 3 400 Frame 4 450 Frame 5 400 Frame 6 450 Frame 7 450 Fig. 2. Graph for preprocessing of each frame. Canny Edge Detection (Canny Edge Detector). The table 3 gives the details of edge detection processing time and its feature extraction of sample video and its cor- responding graph is shown in figure 3. 104 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Table 3. Edge detection Sr. No. Video Frames Extracted Processing Time (millisecond) 1. Meeting_Video1 Frame 1 450 Frame 2 400 Frame 3 400 Frame 4 400 Frame 5 500 Frame 6 450 Frame 7 500 Time in milliseconds require for canny edge detection technique for each frame of videos is shown. As size of frames increase, the time requires processing that frames also get increases. Fig. 3. Graph for edge detection time. Feature Extraction (HOG). Time required for feature extraction from each frame is shown in following table 4 and its graph is also shown in figure 4. The HOG algo- rithm require more time as compared to edge detection and preprocessing. iJIM ‒ Vol. 12, No. 2, 2018 105 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Table 4. Feature extraction time. Sr. No. Video Frames Extracted Processing Time (millisecond) 1 Meeting_Video1 Frame 1 3000 Frame 2 4500 Frame 3 4500 Frame 4 3500 Frame 5 5000 Frame 6 5000 Frame 7 4500 Fig. 4. Graph for feature extraction. 5.2 Analysis of Overall Videos In this section, overall analyses of video samples are discussed here. In the table 5 video samples of different domain are taken as an input and it shows the number of frames extracted for each video. Dissimilar frames are considered and the similar frames are discarded. In figure 5 shown a graph for comparison of frames. 106 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Table 5. Number of frames comparison. Sr. No. Domain Number of Frames Extracted 1. Meeting 7 2. Theater 5 3. Gossips 6 4. Place 8 5. Gratitude 7 Fig. 5. Graph for comparison of frames. From this analysis the average number of frames and the overall accuracy of ex- traction of the frame can be calculated in percentage as, Average number of frames = !"!!""!!!!!!"#$%&#!'!!"#$%&!!"#$!!"#!!!"#$%& !"#$%&!!"!!"#$%&!!"#$%!!"#!!"#!$%& Therefore, Average number of frames = !!!!!!!!! ! = 6.5 ~ 7 frames So finally 7 frames are to be extracted from the video to get the expected output in text description from that video. 5.3 Analysis of Processing Time The following table 6 shows total processing time taken by each video. It is the av- erage time taken by each algorithm i.e. Median Filter, Canny and HOG. Processing time is directly related to the video size and the number of objects present in each frame. More the video size, processing time will be more. In figure 6 shows a graph for time comparison. iJIM ‒ Vol. 12, No. 2, 2018 107 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Table 6. Time comparison. Sr. No. Domain Time in Milliseconds 1. Meeting 10000 2. Theater 21000 3. Gossips 47000 4. Place 10000 5. Gratitude 12000 Fig. 6. Graph for time comparison. From this analysis the average number of time and the overall accuracy come up with the processing time can be calculated in percentage as, Average processing time = !"#$%$&'()!!"#$%&&'()!!"#$!!"#$%!!"!!"#!!!"#$%!!"#$!!"##$%$&'!!"#$%& !"#$%&!!"!!"#$%&!!"#$%!!"#!!"#!$%& Therefore, Average number of frames = !""""!!"###!!"#!!!!""""!!"### ! = 20000 millisec- onds That means an average there is 20000 milliseconds (20 seconds) are required to process each video. If any video are taken and trained into the system then definitely it will take minimum 20 seconds to gives an output. This processing time is an aver- age so it may be varies. 108 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language 5.4 Analysis of Recognition Rate for Trained Video For the understanding of the working capability of the system the analysis system result is necessary. In this section the recognition rate analysis of trained video sample is done. The recognition rate of the system can be calculated based on the number of words in a generated text description to the expected words which must have been generate by the system as per its design. Fig. 7. Analysis of recognition rate for trained samples. The trained sample are such videos having features are extracted in training stage are stored into the database. Hence when these examples are used for the training it generates the exact text description as expected by the signer. Therefore the accuracy of all trained samples is 100. The figure 7 shows all the trained example explained in the table 1 with its respective accuracy rate. The accuracy of the recognition rate is calculated as, Accuracy = !"#$%&!!"!!"#$!%!!"#$%!!"!!!!!!"#!!!"#$%&'(&)*!! !"#$%&!!"!!"#$%&$'!!!"#$%!!"!!!!!!"#!!!"#$%&'(&)* * 100 Here the Meeting_video1 sample has expected meaning is “Nice to meet you.” And the actual generated text is “Nice to meet you” The expected and actual output is same here hence the word count is also same therefore by putting it in above formula it gives its recognition accuracy rate is, Accuracy rate for meeting_video1 = !! ! !! !"" = 100%. Likewise accuracy rate for all the trained example in table 7.2 is calculated and the graph is generated. The average accuracy for trained video sample is calculated as, iJIM ‒ Vol. 12, No. 2, 2018 109 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language Average accuracy = !"#!!"!!""#$!"%!!"!!"#!$!"#!"!!"#$%!!"#$%& !"#$%&!!"!!!!!!"#$%!!"#$%& Average accuracy of trained video = !""!!""!!""!!""!!""!!""!!""!!""!!""!!"" !" = 100 Hence the average accuracy of trained video is 100 % because the video samples are already trained during training. 6 Conclusion The developed system is processed for real time sign action to yield grammatically correct words. Where different samples are considered for the feature extraction, the class label assigned to extracted features and finally generate text. The purposed ap- proach results in higher retrieval accuracy as compare to conventional processing system. This system result in lower descriptive feature with a minimum processing frames which hence achieved objective of the higher accuracy and lower processing overhead. The system can be further continue to minimize the processing time and high recognition rate a different technique can be applied for future work. In future we are looking at development of system which is signer independent and will gener- ate summery. 7 References [1] Muhammad Rizwan Abid, Emil M. Petriu, Fellow, IEEE, and Ehsan Amjadian, “Dynamic Sign Language Recognition for Smart Home Interactive Application using Stochastic Lin- ear Formal Grammer ”, IEEE Transactions On Instrumentation And Measurement, Vol. 64, No. 3, March 2015. [2] Houssem Lahiani, Mohamed Elleuch and Monji Kherallah, “Real Time Hand Gesture Recognition System for Android Devices”, 2015, 15th International Conference on Intel- ligent Systems DeSign and Applications (ISDA). https://doi.org/10.1109/ISDA.2015. 7489184 [3] Rishabh Agrawal and Nikita Gupta, “Real Time Hand Gesture Recognition for Human Computer Interaction”, 2016 IEEE 6th International Conference on Advanced Computing. [4] Jian Wu, Student Member, IEEE, Lu Sun, and Roozbeh Jafari, Senior Member, IEEE, “A Wearable System for Recognizing American Sign Language in Real Time Using IMU and Surface EMG Sensors”, IEEE Journal of Biomedical and Health Informatics, Vol. 20, No. 5, September 2016. [5] Md. Mohiminul Islam, Sarah Siddiqua, and Jawata Afnan, “Real Time Hand Gesture Recognition using Different Algorithm Based on American Sign Language”, ISBN.978-1- 5090-6004-7/17/ ©2017 IEEE. [6] Deniz Ekiz, Gamze Ege Kaya, Serkan Bu!ur, Sıla Güler, Buse Buz, Bilgin Kosucu and Bert Arnrich, “Sign Sentence Recognition with Smart Watches”, ISBN.978-1-5090-6004- 6/17/ ©2017 IEEE. [7] Ohene-Djan, J., Zimmer, R., Bassett-Cross, J., Mould, A. and Cosh, B., Mak- Messenger and Finger-Chat, Communications Technologies to Assist in Teaching of Signed Lan- 110 http://www.i-jim.org Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language guages to the Deaf and Hhearing. In: IEEE International Conference on Advanced Learn- ing Technologies, 2004. pp. 744 – 746. [8] M. R. Abid, L. B. S. Melo, and E. M. Petriu, “Dynamic Sign Language and Voice Recog- nition for Smart Home Interactive Application”, in Proc. IEEE Int. Symp. Med. Meas. Appl. (MeMeA), May 2013, pp. 139–144. [9] X. Sun, M. Chen, and A. Hauptmann, “Action Recognition via Local Descriptors and Ho- listic Features,” in IEEE Conference on Computer Vision and Pattern Recognition, Mi- ami, FL,USA, 2009, pp. 58–65. [10] T. Wenjun, W. Chengdong, Z. Shuying, and J. Li, "Dynamic Hand Gesture Recognition using Motion Trajectories and Key Frames," in Proc. second International Conference. Advance Computing Control (ICACC), March 2010, pp. 163–167. [11] Ramesh M. Kagalkar and Dr. S.V. Gumaste, “Curvilinear Tracing Approach for Extracting Kannada Word Sign Symbol from Sign Video”, International Journal of Image, Graphics and Signal Processing, Volume 9, pp.18-27, Published Online September 2017 in MECS (http://www.mecs-press.org/) https://doi.org/10.5815/ijigsp.2017.09.03 [12] Ramesh M. Kagalkar and Dr. S.V. Gumaste, “ANFIS Based Methodology for Sign Lan- guage Recognition and Translating to Number in Kannada Language”, International Jour- nal of Recent Contributions from Engineering, Science & IT (DBLP indexed Journal), Volume 5, Issue No. 1, pp. 54-66, 2017. [13] Ramesh M. Kagalkar and Dr. S.V.Gumaste, “Gradient Based Key Frame Extraction for Continuous Indian Sign Language Gesture Recognition and Sentence Formation in Kan- nada Language: A Comparative Study of Classifiers”, International Journal of Computer Sciences and Engineering, Volume 4, Issue 9, 2016. [14] Ramesh M. Kagalkar and Dr. S.V.Gumaste, “Review Paper: Detail Study for Sign Lan- guage Recognition Techniques”, CiiT International Journal of Digital Image Processing, Volume 8, No 3, 2016. [15] Rashmi Hiremath and Ramesh M. Kagalkar, ”Methodology for Sign Language Video In- terpretation in Hindi Text Language”, International Journal of Innovative Research in Computer and Communication Engineering, Volume. 4, Issue 5, May 2016. [16] Rashmi Hiremath and Ramesh M. Kagalkar, “Methodology for Sign Language Video Analysis into Text in Hindi Language”, CiiT International Journal of Fuzzy Systems, Volume 8, No 5, 2016. [17] Ramesh M Kagalkar and Nagaraj H.N., “New Methodology for Translation of Static Sign Symbol to Words in Kannada Language”, International Journal of Computer Applica- tions 121(20):25-30, July 2015. [18] Ramesh M. Kagalkar, and Nagaraj H.N., and Dr. S.V.Gumaste, “International Journal of Advanced Research in Computer and Communication Engineering”, Vol. 4, Issue 7, July 2015. [19] ”A Novel Technical Approach for Implementing Static Hand Gesture Recognition”, Inter- national Journal of Advanced Research in Computer and Communication Engineering, Volume. 4, Issue 7, July 2015. [20] Amitkumar and Ramesh M. Kagalkar, “Sign Language Recognition for Deaf Sign User”, International Journal for Research in Applied Science and Engineering Technology (IJRASET), Volume 2, Issue 12, December 2014. iJIM ‒ Vol. 12, No. 2, 2018 111 Paper—Mobile Application Based Translation of Sign Language to Text Description in Kannada Language 8 Authors Ramesh M. Kagalkar is Research Scholar, VTU-RRC, Visvesvaraya Technologi- cal University (VTU), Belgaum, Karnataka, India. Shyamrao V Gumaste is Professor, Dept. Computer Engineering, MET League of College, Nashik, Maharashtra, India. Article submitted 04 December 2017. Resubmitted 04 January 2018. Final acceptance 05 March 201^8. Final version published as submitted by the authors. 112 http://www.i-jim.org iJIM – Vol. 12, No. 2, 2018 Mobile Application Based Translation of Sign Language to Text Description in Kannada Language