LONTAR KOMPUTER VOL. 10, NO. 3 DECEMBER 2019 p-ISSN 2088-1541 DOI : 10.24843/LKJITI.2019.v10.i03.p01 e-ISSN 2541-5832 Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 128

Chunking Phrase to Predict Pause Break in Pontianak Malay Language

Arif Bijaksana Putra Negara a1, Yulia Magdalena a2, Rudy Dwi Nyoto a3, Herry Sujaini a4
a Informatics Department, Tanjungpura University
Prof.Dr.H.Hadari Nawawi Street, Pontianak, Indonesia
1 arif.bpn@informatika.untan.ac.id
2 ymyuliamagdalena@gmail.com
3 rudydn@informatika.untan.ac.id
4 herry_sujaini@yahoo.com

Abstract
Pause breaks are one of the factors that make synthesized speech easy to understand in a Text-to-Speech system. This research aims to improve the accuracy of pause prediction in Pontianak Malay sentences over earlier research by using phrase chunking. It is also part of the effort to preserve Pontianak Malay so that it does not become extinct as a local language. The chunking method uses the RegexpParser of the Natural Language Toolkit (NLTK) to split sentences into phrases based on their Part-of-Speech (PoS) tags. In this research, the authors developed a new grammar rule and pause break rule, different from those of the earlier research, to increase the accuracy of pause prediction. The data consist of 500 Pontianak Malay sentences recorded by a native speaker of Pontianak Malay; the recordings are used for the pause break analysis. Pauses are either short (symbolized as “/1”) or long (symbolized as “/2”). Two tests were carried out: a test of pause break compatibility within a sentence, and a test using precision, recall, and F-measure. The results show that the new grammar rule and pause break rule predict pauses more accurately than the earlier rule, with the proportion of correctly predicted sentences increasing by 23 percentage points.
Keywords: Pause Break, Chunking, Grammar Rule, Pause Break Rule, Accuracy, Text-to-Speech, Pontianak Malay

1. Introduction
Language is a communication tool used in human life. In Indonesia, besides Indonesian as the national language, many languages have emerged and developed in particular regions; these are called local languages. Pontianak Malay is a Malay dialect spoken by the people of Pontianak City, Kubu Raya Regency, and Mempawah Regency, and it is similar to Peninsular Malay (Johor-Riau) [1]. It serves as a communication tool in Pontianak. According to the population census conducted by Statistics Indonesia, Malay is used by 20.45% (1,615,978 people) of the total population of West Kalimantan [2]. Efforts to preserve Pontianak Malay so that it is not abandoned or driven to extinction by globalization must continue, and text-to-speech technology is one way to support them.

Text-to-speech is a process in which input text is first analyzed, then processed and understood, and finally converted to digital audio and spoken [3]. To develop speech synthesis for Pontianak Malay, predicting pauses from text is an essential part of the text-to-speech system. Pauses help listeners parse the speech stream and digest the incoming information better [4]. Speech pauses are obtained by segmenting sentences into phrases. Phrases are grammatical units consisting of one or more words [5]. Phrases can be extracted from a sentence with the chunking method, which structures speech based on grammar rules.
Speakers and listeners produce and process language in chunks [21]. Besides being a component of parsing, chunkers are used to develop various natural language processing applications such as information retrieval, information extraction, and named entity recognition [22]. Chunking helps readers grasp the provisional structure of a text and then restructure and organize the content of each sentence. Chunking can be performed with the RegexpParser of the Natural Language Toolkit (NLTK), which cuts sentences into phrases based on their Part-of-Speech (PoS) tags [6]. A regex parser applies regular expressions, written as a grammar, on top of a PoS-tagged string. Grammar rules are needed to define the structure of a chunk. Chunks represent the sentence fragments that emerge when a sentence is read [7]. On this basis, pause breaks can be determined from the phrases produced by chunking.

Research on chunking, also called shallow parsing, has already been conducted for Pontianak Malay; there, the grammar rules were developed by structuring sentences according to the S-P-O-K (subject, predicate, object, adverb) pattern [8]. That study obtained a total F-measure of 0.64. Recall and precision were 0.78 and 0.74 for single sentences, and 0.67 and 0.57 for compound sentences. It used only grammar rules and did not distinguish pause types. Of 168 sentences, 68 (40.4%) matched the speaker's pauses. The author attributed this to the rules being based on sentence structure, so that the resulting phrases did not correspond to the speaker's pause phrases. Pauses are an essential element in the analysis of a text, and they also give good control over interactions while reading a text and explaining what has been understood [24]. Inserting the right number of pauses at the right places adds to the naturalness of synthesized speech [9].
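To make the idea of a regex parser running on top of a PoS-tagged string concrete, the sketch below uses NLTK's `RegexpParser` together with `str2tuple`, which splits each `word/TAG` item on its last slash. It assumes NLTK is installed. The tags come from the Pontianak Malay tag set, and the labels TP1, BP, and AP2 and the two example sentences are taken from Sections 2.5 and 2.6, but the three rules themselves are hypothetical stand-ins written for illustration. They are not the paper's 19 grammar rules.

```python
from typing import List, Tuple

from nltk.chunk import RegexpParser
from nltk.tag import str2tuple
from nltk.tree import Tree

# Hypothetical grammar for illustration only; these are NOT the 19 rules of
# Figure 2. Each line is "LABEL: {tag pattern}" and the stages run in order.
GRAMMAR = r"""
TP1: {<VBI><NEG>}
BP: {<PRN>}
AP2: {<DRB><NNU>}
"""
parser = RegexpParser(GRAMMAR)


def chunk(tagged_sentence: str) -> Tree:
    """Split 'word/TAG' items on the last '/' and chunk the token list."""
    tokens = [str2tuple(item) for item in tagged_sentence.split()]
    return parser.parse(tokens)


def phrases(tree: Tree) -> List[Tuple[str, List[str]]]:
    """Flatten the top level into (label, words); unchunked tokens keep their tag."""
    result = []
    for node in tree:
        if isinstance(node, Tree):
            result.append((node.label(), [word for word, _ in node.leaves()]))
        else:
            word, tag = node
            result.append((tag, [word]))
    return result


print(phrases(chunk("Semue-mue-e/PRN tepat/DRB waktu/NNU ./.")))
# [('BP', ['Semue-mue-e']), ('AP2', ['tepat', 'waktu']), ('.', ['.'])]
print(phrases(chunk("Ikot/VBI ndak/NEG")))
# [('TP1', ['Ikot', 'ndak'])]
```

Each labelled line becomes its own chunking stage. A later stage sees earlier chunks as single items tagged with their label, which is why the rule order in a grammar matters.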
Appropriate pausing can enhance the intelligibility of speech and make it more persuasive [18]. Pauses are also used to indicate that upcoming words are important, signalling to listeners that they should pay attention to those words [19]. Two factors influence a speaker's pausing style: hesitation while speaking and the way the speaker breathes [10]. Abney (1991) explained that when reading a sentence, we tend to group words into phrases [7]. Thus, pauses are not determined solely by the S-P-O-K structure; they are also influenced by the speakers themselves.

Several studies related to pause break prediction have been conducted. Pause breaks in an English corpus have been predicted with nltk_lite's regular expression chunk parser [11]. Two tests were run: one on input without full stops and commas, which scored 40.5%, and one on input with full stops and commas, which scored 43.5%. That research showed that nltk_lite's regular expression chunk parser can be used to predict pauses in an English corpus. For Chinese, phrase breaks have been predicted with a maximum entropy model using a PoS model and a combined PoS and lexical model, which achieved 62.91% and 65.24% accuracy, respectively [12]. In other research, pauses in Indonesian were predicted with a Hidden Markov Model (HMM) [13], using the PoS tagger from Wicaksono's 2010 research as one of the HMM features [14]. That study obtained a recall of 13.2%, a precision of 36.4%, and an F-score of 19.4%.
Based on the description above, this research develops new grammar rules and pause rules, derived from an analysis of the speaker's pauses, to categorize chunk phrases in Pontianak Malay. The goal is to increase the accuracy of pause prediction in Pontianak Malay sentences so that it can support a good Pontianak Malay speech synthesis system. A new PoS tag set for Pontianak Malay was also created in this research.

2. Research Methods
Figure 1. Research Methods

2.1. Data Preparation
The data form a corpus of 500 Pontianak Malay sentences, both single and compound, taken from “Sepok”, a Pontianak Malay language book [15]. Each sentence was spoken and recorded by a male speaker who is fluent in the Pontianak Malay dialect, using a daily speaking style. The recordings are stored in WAV format with 16-bit resolution and a 44,100 Hz sampling rate.

2.2. Pause Tagging in WaveSurfer
The sound files are then processed with the WaveSurfer application to mark phonemes and pause events. A pause event is identified where the waveform in WaveSurfer is flat, which indicates that the speaker paused while speaking. Each pause event is marked as “sil” and stored in a *.breaks file.

2.3. Categorizing the Pause Index and Marking the Pause in the Pontianak Malay Sentence Text
After all sound files are marked, the “sil” data are analyzed and categorized into pause indices. Each sentence is then marked with pause indices by matching the durations of the “sil” segments in its sound file. Table 1 gives the duration ranges for pause index 1 and pause index 2.

Table 1.
Pause Index
Pause index | Explanation | Duration of pause (seconds)
0 | No pause | 0 to < 0.025
1 | Short pause | 0.023 to <= 0.33
2 | Long pause | > 0.33
, | Comma | ,
. | End of sentence | .

In Table 1, pause index “0” covers durations from 0 up to 0.025 seconds. Pause index 1 (symbolized as “/1”) is assigned when the “sil” duration in the sound file is from 0.023 up to 0.33 seconds. Pause index 2 (symbolized as “/2”), the long pause, is assigned when the “sil” duration is greater than 0.33 seconds. Commas and full stops keep their own symbols.

2.4. PoS (Part-of-Speech) Tagging in Pontianak Malay Sentences
The 500 Pontianak Malay sentences are tagged with the Pontianak Malay part-of-speech tagger made in this research. Part-of-speech tagging, or word class labeling, is a process that gives a word class label to each word in a sentence or text [20]. PoS tagging is one of the stages of Natural Language Processing and determines the class of each word [23]. Word classes include adjectives, nouns, verbs, adverbs, prepositions, pronouns, conjunctions, and others. The tag set was designed for Pontianak Malay based on other PoS tag set references [8][16][17]. Table 2 presents the Pontianak Malay part-of-speech tags.

Table 2. Part-of-Speech Tag for Pontianak Malay
No | PoS | Description | Example
1 | VBR | Reduplication Verb | Jalan-jalan, poto-poto
2 | VBK | Conjugation Verb | Bersalam-salam, berputar-putar
3 | VBT | Transitive Verb | Makai, nenggek, njajah
4 | VBI | Intransitive Verb | Betanyak, balek, nuron
5 | IN | Preposition | di, ke, dari, pade
6 | UH | Interjection | Oi, woi, alamak
7 | AR | Articulus | Sang, si
8 | RP | Particle | pon, lah, jak
9 | JJ | Adjective | kaye, lawar, pandai, budoh
10 | CON | Conjunction | dan, kalok
11 | OP | Open Parenthesis | ( { [
12 | CP | Close Parenthesis | ) } ]
13 | . | Sentence Terminator | . ! ? …
14 | . | Comma | ,
15 | : | Colon | :
16 | SYM | Symbol | * % # & @
17 | CR | Currency | Rp, $
18 | MD | Modal | nak, haros
19 | NEG | Negation | bukan, jangan, tadak
20 | SL | Slash | /
21 | DS | Dash | -
22 | QT | Quotation | " '
23 | WP | WH-Pronoun | Ape, siape, berape
24 | WDT | WH-Determiner | Ape, siape, barangsiape
25 | DT | Determiner | ini, ni, tu, tu, tuh
26 | FW | Foreign Word | wonderful, story
27 | US | Unit Symbol | Gr, Kg, Cm
28 | CDP | Primary Numeral | Satu, duak, tige
29 | CDO | Ordinal Numeral | Kesatu, Keduak, ketige
30 | CDI | Irregular Numeral | Beberape, segale, semue
31 | CDF | Fraction Numeral | Setengah, seperempat
32 | CDA | Auxiliary Number | biji, ekor, buah, orang
33 | CDC | Collective Numeral | ratusan, ribuan, pulohan
34 | RB | Adverb | paleng, sementara
35 | WPRB | WH-Adverb | Cemane, ngape
36 | FRB | Adverb of Frequency | jarang, sering, kadang-kadang
37 | DRB | Adverb of Degree | agak, hamper, cukop
38 | TRB | Adverb of Time | udah, belom, dulok, sekarang
39 | PRP | Personal Pronoun | aku, saye, kau, die
40 | PRL | Locative Pronoun | sanak, sine, situk
41 | PRN | Number Pronoun | satu-satunye, dua-duanye
42 | NNP | Proper Noun | Eropa, Indonesia, Belanda
43 | NNG | Genitive Common Noun | bukunye, rumahnye
44 | NNC | Countable Common Noun | buku, rumah, karyawan
45 | NNU | Uncountable Common Noun | aek, gula, nasi, ujan
46 | NN | Common Noun | Martabat, janji

There are 46 part-of-speech tags in this research. For example, words such as “Oi”, “Woi”, and “Alamak” (row 6 of Table 2) are tagged “UH” (Interjection). A sentence such as “Alamak!” is therefore tagged as “Alamak/UH !/.”.

2.5. Grammar Rule Development
The pause event data from Section 2.3 and the PoS-tagged corpus from Section 2.4 are then analyzed to build the grammar rule and the pause rule. The grammar rule is used for the chunking process.
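The pause event data that feed this analysis can be sketched as follows: each “sil” segment's duration is mapped to a pause index using the ranges of Table 1, and the index is attached to the word it follows. Several details are assumptions rather than the paper's procedure. The label file is taken to hold one `start end label` triple per line (times in seconds). Non-silence labels are treated as whole words, although the paper also marks phonemes. Because the printed ranges for index 0 (< 0.025 s) and index 1 (from 0.023 s) overlap, 0.025 s is assumed as the lower bound of a short pause.

```python
from typing import List, Tuple

# Table 1 thresholds in seconds. The printed ranges overlap (index 0 is
# "< 0.025", index 1 starts at "0.023"); this sketch assumes 0.025 s.
SHORT_MIN = 0.025
LONG_MIN = 0.33


def pause_index(duration: float) -> int:
    """Map a 'sil' duration (seconds) to pause index 0, 1, or 2."""
    if duration < SHORT_MIN:
        return 0
    if duration <= LONG_MIN:
        return 1
    return 2


def read_segments(text: str) -> List[Tuple[float, float, str]]:
    """Parse 'start end label' lines; this file layout is an assumption."""
    segments = []
    for line in text.strip().splitlines():
        start, end, label = line.split()
        segments.append((float(start), float(end), label))
    return segments


def mark_pauses(segments: List[Tuple[float, float, str]]) -> str:
    """Rebuild the sentence, appending /1 or /2 to the word before each 'sil'.

    Non-'sil' labels are assumed to be whole words (a simplification).
    """
    words: List[str] = []
    for start, end, label in segments:
        if label == "sil":
            index = pause_index(end - start)
            if index > 0 and words:
                words[-1] += f"/{index}"
        else:
            words.append(label)
    return " ".join(words)


# Hypothetical timings for the sentence used in Section 2.6.
EXAMPLE = """
0.00 0.55 Semue-mue-e
0.55 0.65 sil
0.65 0.95 tepat
0.95 0.96 sil
0.96 1.30 waktu
"""

print(mark_pauses(read_segments(EXAMPLE)))  # Semue-mue-e/1 tepat waktu
```

The second “sil” lasts only 0.01 s, so it falls under pause index 0 and leaves no mark in the text.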
This grammar rule classifies phrases into six types: TP (Questioning Phrases), BP (Numeric Phrases), NP (Noun Phrases), KP (Connection Phrases), VP (Verb Phrases), and AP (Adverb Phrases). The new grammar rule was made by analyzing the speaker's pause events in the sentences and turning each pause segment into a chunking phrase rule written as a regular expression.

Table 3. Regular Expression Characters
<> : delimits a part-of-speech tag
? : zero or one occurrence of the previous item
* : zero or more occurrences of the previous item
+ : one or more occurrences of the previous item
| : matches either the item before it or the item after it

The analysis produced 19 new grammar rules for Pontianak Malay based on the native speaker's pause events.

Figure 2. Grammar Rules for the Chunking Process

The grammar rules in Figure 2 are used in the subsequent chunking step. They group the words of an input sentence into the phrases defined by the rules. Consider rule 3 in Figure 2:

TP1 : { * + + | + * }

If a sentence contains words whose PoS tags match this rule, that part of the sentence is chunked under the rule name, here TP1. For example, the sentence “Ikot ndak”, tagged with the PoS tags in Table 2, becomes “Ikot/VBI ndak/NEG”. When the grammar rules are applied, this sentence is categorized as TP1 (rule 3) because it matches the rule's pattern. The result is shown in Figure 3.

Figure 3. Example of grammar rules

2.6. Chunking the Phrase Using Chunking Method
The chunking process splits sentences into pause phrases using RegexpParser in NLTK, as shown in Figure 4.

Figure 4. Chunking Process

In NLTK, an input Pontianak Malay sentence with PoS tags is identified by its PoS labels and then processed by the grammar rules to be chunked into phrases. In Figure 4, the input sentence is “Semue-mue-e/PRN tepat/DRB waktu/NNU ./.”. The first step splits each word from its PoS tag so that it can be processed further. After that, the grammar rules in Figure 2 categorize the words into the phrases defined by the rules. In the example, the sentence is categorized into rule BP, (BP Semue-mue-e/PRN), and rule AP2, (AP2 tepat/DRB waktu/NNU). All sentences in this research are processed in this step so that they can be analyzed to obtain the pause rule and then used for pause prediction.

2.7. Pause Rule Development and the Implementation
The phrase fragments from the chunking process are analyzed to determine the type of pause the speaker produced at each boundary. There are two pause types: short pauses (symbolized as /1) and long pauses (symbolized as /2). The results of this analysis are used as the pause rule that marks short and long pauses at the pause prediction stage. Figure 5 presents the pause rule checking process.

Figure 5. Pause Rule Process

The pause rule is applied in this process to the phrase fragments from the chunking step, such as those of the example in Figure 4. For (BP Semue-mue-e/PRN) and (AP2 tepat/DRB waktu/NNU), the pause rule states that when BP is followed by AP2, a short pause (“/1”) is marked. The fragments become (BP Semue-mue-e/PRN)/1 (AP2 tepat/DRB waktu/NNU), and the final sentence becomes “Semue-mue-e/1 tepat waktu.”
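One way to picture the pause rule is as a lookup table keyed on pairs of adjacent phrase labels. The sketch below is an illustrative assumption, not the paper's full rule set. It encodes only the transitions described in this section's worked examples (BP followed by AP2 gives /1, NP followed by a comma gives /2, KP followed by NP gives /1), and it treats every other pair as no pause. Phrases are given as hypothetical `(label, words)` pairs, such as a chunker might produce, with nested chunks like the BP inside an NP flattened into their parent's words.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, List[str]]  # (phrase label, words); punctuation uses "," or "."

# Hypothetical excerpt of a pause rule: only the transitions named in the
# worked examples of Section 2.7. Pairs not listed mean "no pause".
PAUSE_RULE: Dict[Tuple[str, str], str] = {
    ("BP", "AP2"): "/1",
    ("NP", ","): "/2",
    ("KP", "NP"): "/1",
}


def apply_pause_rule(phrases: List[Phrase]) -> str:
    """Append a pause marker to each phrase whose transition to the next
    phrase is listed in PAUSE_RULE, then join the phrases into a sentence."""
    parts = []
    for i, (label, words) in enumerate(phrases):
        text = " ".join(words)
        if i + 1 < len(phrases):
            text += PAUSE_RULE.get((label, phrases[i + 1][0]), "")
        parts.append(text)
    sentence = ""
    for part in parts:
        # Punctuation attaches to the previous word, e.g. "limak/2,".
        if part in (",", ".") and sentence:
            sentence += part
        else:
            sentence += (" " if sentence else "") + part
    return sentence


print(apply_pause_rule([("BP", ["Semue-mue-e"]), ("AP2", ["tepat", "waktu"]), (".", ["."])]))
# Semue-mue-e/1 tepat waktu.
print(apply_pause_rule([("VP", ["Kau", "bikin"]), ("NP", ["janji", "jam", "limak"]),
                        (",", [","]), ("KP", ["make"]), ("NP", ["jam", "limak"])]))
# Kau bikin janji jam limak/2, make/1 jam limak
```

Keying on label pairs keeps the rule purely local. This matches the examples, but it cannot express the word-specific behaviour discussed later, such as a pause before “tang” but not before “ke”, since both carry the IN tag.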
As another example, take the phrase fragments (VP Kau/PRP bikin/VBT) (NP janji/NN jam/NN (BP limak/CDP)) , (KP make/CON) (NP jam (BP limak/CDP)) … . In the pause rule, there is no pause when VP is followed by NP, but when NP is followed by “,” a long pause (“/2”) is marked. When KP is followed by NP, a short pause (“/1”) is marked. The phrase fragments therefore become (VP Kau/PRP bikin/VBT) (NP janji/NN jam/NN (BP limak/CDP))/2 , (KP make/CON)/1 (NP jam (BP limak/CDP)) …, and the final sentence becomes “Kau bikin janji jam limak/2, make/1 jam limak …”.

After this process, the output of the prediction is evaluated with a test of pause break compatibility within a sentence and a test using precision, recall, and F-measure. The chunking method requires no training, because it relies entirely on the hand-made rules. The prediction system is built as a web form and can be accessed at http://203.24.50.138:8027/prediksi_jeda/.

3. Result and Discussion
The results are evaluated with two tests: pause break compatibility within a sentence, and precision, recall, and F-measure.

3.1. Pause Break Compatibility in One Sentence Testing
This test checks whether the pauses in the original corpus sentence, marked according to the speaker's speech, match the pauses in the sentence predicted by the chunking process. In total, 500 speaker sentences were compared with the 500 sentences produced by the chunking process. Two rule sets were tested: the new rule and the previous rule from earlier research [8]. An example of the test is shown in Table 4.

Table 4.
Example of Pause Break Compatibility in One Sentence Using New Rule
No | Original pause from speaker | Chunking phrase prediction | Short pause + long pause | Long pause
1 | Kau bikin janji jam limak/2, make/1 jam limak/1 kau haros datang | Kau bikin janji jam limak/2, make/1 jam limak/1 kau haros datang | Same | Same
2 | Manelah negare kau tuh nak maju/2 kalok tebiat pemerintah-e tak tentu rudu macam itu | Manelah negare kau tuh nak maju/2 kalok tebiat pemerintah-e/1 tak tentu rudu macam itu | Not same | Same

An example of the test with the previous rule is shown in Table 5. Here only whether the pause fragments are the same is checked, because the previous research did not categorize pause indices.

Table 5. Example of Pause Break Compatibility in One Sentence Using Previous Rule
No | Original pause from speaker | Previous rule prediction | Pause
1 | Kau bikin janji jam limak/2, make/1 jam limak/1 kau haros datang | Kau bikin/ janji jam limak/ , make jam limak/ kau haros datang | Not same
2 | Manelah negare kau tuh nak maju/2 kalok tebiat pemerintah-e tak tentu rudu macam itu | Manelah negare kau tuh nak/ maju/ kalok tebiat pemerintah-e/ tak tentu rudu macam itu | Not same

Tables 6 and 7 show the test results in their accuracy columns.

Table 6. Pause Break Compatibility in One Sentence Using New Rule
Testing type | Number of sentences | Accuracy
Appearance of short and long pauses | 500 | 33.6% (168 sentences correct)
Appearance of long pauses | 500 | 72.8% (364 sentences correct)

Table 7.
Pause Break Compatibility in One Sentence Using Previous Rule
Testing type | Number of sentences | Accuracy
Appearance of pauses | 500 | 10.6% (53 sentences correct)

The accuracy in these tables is the number of correctly predicted sentences divided by the total number of sentences. It reflects how accurately the chunking phrases predict pauses in Pontianak Malay sentences. Chunking achieves higher accuracy when only long pauses are considered; when short pauses are included, the accuracy is only 33.6%. The test also shows that the new rule developed in this research is more accurate than the previous one. The previous rule only produces phrases without distinguishing short and long pauses, so it could not be tested for the appearance of long pauses.

3.2. Precision, Recall, and F-Measure Testing
The prediction is also evaluated in terms of precision, recall, and F-measure. Precision is the percentage of predicted chunks that are correct: the number of correct chunking phrases divided by the sum of the correct phrases and the wrong fragments in the predicted sentences. Recall is the percentage of the speaker's chunks that were predicted: the number of correct chunking phrases divided by the sum of the correct phrases and the pause fragments of the original sentences that the prediction missed. F-measure is the harmonic mean of precision and recall.

3.2.1. Precision, Recall, and F-Measure Testing for Long and Short Pauses
The test for long and short pauses is divided into five runs on 100, 200, 300, 400, and 500 sentences. The results are shown in Table 8 and Figure 6.

Table 8. Summary of Testing Value for Long and Short Pause Testing
No | Number of sentences | Precision | Recall | F-measure
1 | 100 | 0.449 | 0.475 | 0.461
2 | 200 | 0.448 | 0.475 | 0.462
3 | 300 | 0.448 | 0.475 | 0.461
4 | 400 | 0.448 | 0.475 | 0.462
5 | 500 | 0.448 | 0.475 | 0.462
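Both evaluations can be sketched on pause-marked strings like those in Table 4. This sketch swaps in a simpler counting unit than the paper's phrase fragments. It compares pause boundaries, i.e. (word position, pause index) pairs, and pools the counts over all sentences (micro-averaging). It also assumes that the speaker and predicted strings contain the same words in the same order. A sentence counts as compatible only when its whole pause pattern matches the speaker's, and `long_only=True` keeps only /2 marks to mirror the long-pause tests.

```python
import re
from typing import List, Set, Tuple

MARK = re.compile(r"/([12])")  # a pause mark such as "maju/2" or "limak/2,"


def pauses(sentence: str, long_only: bool = False) -> Set[Tuple[int, int]]:
    """Return {(word position, pause index)} for every /1 or /2 mark."""
    found = set()
    for pos, token in enumerate(sentence.split()):
        match = MARK.search(token)
        if match:
            index = int(match.group(1))
            if index == 2 or not long_only:
                found.add((pos, index))
    return found


def sentence_accuracy(refs: List[str], preds: List[str], long_only: bool = False) -> float:
    """Share of sentences whose predicted pause pattern equals the speaker's."""
    same = sum(pauses(r, long_only) == pauses(p, long_only) for r, p in zip(refs, preds))
    return same / len(refs)


def precision_recall_f(refs: List[str], preds: List[str],
                       long_only: bool = False) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-measure over pause boundaries."""
    tp = fp = fn = 0
    for r, p in zip(refs, preds):
        gold, guess = pauses(r, long_only), pauses(p, long_only)
        tp += len(gold & guess)  # pauses predicted where the speaker paused
        fp += len(guess - gold)  # predicted pauses the speaker did not make
        fn += len(gold - guess)  # speaker pauses the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# The two rows of Table 4 (speaker vs. new-rule prediction).
REFS = [
    "Kau bikin janji jam limak/2, make/1 jam limak/1 kau haros datang",
    "Manelah negare kau tuh nak maju/2 kalok tebiat pemerintah-e tak tentu rudu macam itu",
]
PREDS = [
    "Kau bikin janji jam limak/2, make/1 jam limak/1 kau haros datang",
    "Manelah negare kau tuh nak maju/2 kalok tebiat pemerintah-e/1 tak tentu rudu macam itu",
]
print(sentence_accuracy(REFS, PREDS), sentence_accuracy(REFS, PREDS, long_only=True))  # 0.5 1.0
print(precision_recall_f(REFS, PREDS))  # approx. (0.8, 1.0, 0.889)
```

The extra “/1” after “pemerintah-e” makes the second sentence incompatible when short pauses are counted. It also lowers precision but not recall, which matches the pattern in Table 4.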
Figure 6. Precision, Recall, and F-measure Testing Value Chart for the Long and Short Pause

As Table 8 and Figure 6 show, precision (the percentage of correctly guessed chunks) is almost the same for every subset, and recall is identical. The F-measure, the harmonic mean, stays at about 0.46. These nearly constant values indicate that the chunking phrases produced by the rules in this research predict the pause index consistently. Values around 0.46 also show that the chunking prediction is not yet accurate, mainly because of the low precision: many phrases are irrelevant, or pause phrases are not formed correctly. These wrong pause phrases arise because the grammar rule forms phrases according to the PoS types that appear in the sentence, whereas the speaker's short pauses follow varied patterns, so the predicted pauses do not always coincide with them.

3.2.2. Precision, Recall, and F-Measure Testing for Long Pauses
The test for long pauses is likewise divided into five runs on 100, 200, 300, 400, and 500 sentences. The results are shown in Table 9 and Figure 7.

Table 9. Summary of Testing Value for Long Pause Testing
No | Number of sentences | Precision | Recall | F-measure
1 | 100 | 0.746 | 0.703 | 0.724
2 | 200 | 0.746 | 0.702 | 0.723
3 | 300 | 0.746 | 0.702 | 0.723
4 | 400 | 0.746 | 0.701 | 0.723
5 | 500 | 0.746 | 0.701 | 0.723

Figure 7. Precision, Recall, and F-measure Testing Value Chart for the Long Pause
As Table 9 and Figure 7 show, precision is identical for every subset and recall is nearly so. The F-measure stays at about 0.72. These nearly constant values indicate that the chunking phrases produced by the rules in this research predict the pause index consistently. Precision is 0.746 for every subset, which shows that the proportion of correctly predicted fragments produced by the grammar rules and pause rule stays the same from 100 to 500 sentences. Recall varies slightly, because some phrase fragments still do not match the speaker's pause phrases where the rules do not fit. The F-measure is almost constant at 0.72, which can be considered good. Long pauses are predicted better because, in the speaker's speech, long pauses tend to occur at the same kinds of phrase boundaries, so the grammar rule and pause rule can predict them well.

3.3. Analysis of the Test Results
In the pause break compatibility test, accuracy increased by 23 percentage points (from 10.6% with the previous rule to 33.6% with the new rule), so the new rule predicts the speaker's pauses better. In the precision, recall, and F-measure tests (Tables 8 and 9), long pause prediction scores higher. The analysis carried out while building the grammar and pause rules showed that long pauses are easier to capture with rules than short pauses. The speaker's short pauses follow varied patterns that differ from sentence to sentence, so the rules cannot predict every test sentence perfectly. Imperfect word class labeling also prevents the rules from cutting phrases exactly where the speaker does.

Table 10.
Pause Comparison
No | Pause from speaker | Pause from system
1 | kame/PRP ni/DT jaim/VBI/1 tang/IN atas/NN kapal/NNC | kame/PRP ni/DT jaim/VBI/1 tang/IN atas/NN kapal/NNC
2 | Naekan/VBT ke/IN atas/NN kapal/NNC klotok/NNC. | Naekan/VBT/1 ke/IN atas/NN kapal/NNC klotok/NNC.

Table 10 shows the difference between the speaker's pauses and the system's. In the first sentence, a short pause occurs after the VBI verb, before the preposition “tang” (tagged “IN”). The rule places a short pause before an “IN” word such as “tang”, so the grammar and pause rules predict the same result as the speaker. In the second sentence, however, the speaker does not pause between the VBT verb and the IN word “ke”. Because “ke” carries the same “IN” tag, the system still predicts a short pause, so the prediction is inaccurate.

4. Conclusion
Based on the test results, the new grammar rule and pause rule, which form chunking phrases, predict pauses in Pontianak Malay with a sentence-level accuracy of 33.6% for short and long pauses together and 72.8% for long pauses alone. This is better than the previous rule. The best result is for long pauses, with 72.8% compatibility with the speaker's pauses, a precision of 0.746, a recall of 0.701, and an F-measure of 0.723. The chunking phrases can be used to develop a text-to-speech system for Pontianak Malay.

References
[1] M. Dwi Etsa Putra, “Pengaruh Metode Dictionary Lookup Pada Proses Cleaning Korpus Terhadap Akurasi Mesin Penerjemah Statistik Bahasa Indonesia-Bahasa Melayu Pontianak,” Universitas Tanjungpura, 2018.
[2] N. dan S. H. Akhsan, Hasil Sensus Penduduk 2010: Kewarganegaraan, Suku Bangsa, Agama dan Bahasa Sehari-hari Penduduk Indonesia. Jakarta: Badan Pusat Statistik, 2010.
[3] A. Trivedi, N. Pant, P. Shah, S. Sonik, and S. Agrawal, “Speech to text and text to speech recognition systems-A review,” IOSR Journal of Computer Engineering, vol. 20, no. 2, p. 39, 2018.
[4] N. Braunschweiler and R. Maia, “Pause prediction from text for speech synthesis with user-definable pause insertion likelihood threshold,” in INTERSPEECH 2016, 2016, p. 3191.
[5] A. Wahab Syahroni, J. Santoso, and E. Setyati, “Pendekatan Rule Handmade untuk Menentukan Klausa Bahasa Indonesia,” in E-Proceedings KNS&I STIKOM Bali 2017, 2017, pp. 598–603.
[6] R. J. Prathibba and M. C. Padma, “Shallow Parser for Kannada Sentences Using Machine Learning Approach,” International Journal of Computational Linguistics Research, vol. 8, no. 4, pp. 158–170, 2017.
[7] S. Abney, “Parsing by Chunks,” in Berwick, Abney, and Tenny (eds.), 1991.
[8] M. I. Kamiludin, “Prediksi Jeda Pada Ucapan Bahasa Melayu Pontianak dengan Menggunakan Metode Shallow Parsing,” Universitas Tanjungpura, 2017.
[9] P. Arulmozhi and A. G. Ramakrishnan, “Prediction of Pauses in TTS - Tamil,” in Tamil Internet 2010 Conference, 2010.
[10] S. Dardjowidjojo, Psikolinguistik: Pengantar Pemahaman Bahasa Manusia. Jakarta: Yayasan Obor Indonesia, 2005.
[11] C. Brierley and E. Atwell, “Corpus-Based Evaluation of Prosodic Phrase Break Prediction Using nltk_lite's Chunk Parser to Detect Prosodic Phrase Boundaries in the Aix-MARSEC Corpus of Spoken English,” United Kingdom, 2007.
[12] L. Jian-feng, H. Guo-ping, Z. Wan-ping, and W. Ren-hua, “Chinese Prosody Phrase Break Prediction Based on Maximum Entropy Model,” in INTERSPEECH 2004, 2004.
[13] A. Teguh Nugraha, “Prediksi Jeda Dalam Ucapan Kalimat Bahasa Indonesia Dengan Hidden Markov Model,” Universitas Tanjungpura, 2014.
[14] A. F. Wicaksono and A. Purwarianti, “HMM Based Part-of-Speech Tagger for Bahasa Indonesia,” in 4th International MALINDO (Malaysian-Indonesian Language) Workshop, 2010.
[15] P. J. Sujarwo, Sepok: Cerite Orang Kampong, yang Kampongan, di Kampong Orang. Pontianak: Pijar Publishing, 2010.
[16] E. Rahayu Setyaningsih, “Part of Speech Tagger Untuk Bahasa Indonesia Dengan Menggunakan Modifikasi Brill,” Dinamika Teknologi, vol. 9, pp. 37–42, 2017.
[17] M. Adriani and H. Riza, “Research Report on Local Language Computing: Development of Indonesia Language Resources and Translation System,” 2009.
[18] P. Sarkar and K. Sreenivasa Rao, “Data-Driven Pause Prediction for Synthesis of Storytelling Style Speech Based on Discourse Modes,” in 2015 IEEE International Conference on Electronics, Computing and Communication Technologies, 2015.
[19] Q. Truong Do, S. Sakti, G. Neubig, T. Toda, and S. Nakamura, “Improving Translation of Emphasis with Pause Prediction in Speech-to-Speech Translation Systems,” Nara Institute of Science and Technology, Japan, 2015.
[20] R. Manurung, “Tutorial: Pengenalan Terhadap POS Tagging dan Probabilistic Parsing,” Workshop Nasional INACL, 2016.
[21] R. Niu and T. Osborne, “Chunks are Components: A Dependency Grammar Approach to the Syntactic Structure of Mandarin,” Lingua, Elsevier, 2019.
[22] A. Ibrahim and Y. Assabie, “Amharic Sentence Parsing Using Base Phrase Chunking,” in A. Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, CICLing 2014, 2014.
[23] A. Subhan Yazid and A. Fatwanto, “Penentuan Kelas Kata Pada Part of Speech Tagging Kata Ambigu Bahasa Indonesia,” Jurnal Informatika Sunan Kalijaga, vol. 2, no. 3, pp. 157–166, 2018.
[24] S. Denisleam-Molomer, S. Trausan-Matu, P. Dessus, and M. Bianco, “Analyzing Students' Pauses During Reading and Explaining a Story,” in RoEduNet International Conference: Networking in Education and Research 2015, Craiova, Romania, 2015, pp. 90–93.