Mathematical Problems of Computer Science 56, 56–64, 2021. UDC 519.688 Determining the Degree of Fuzzy Regularity of a String Armen H. Kostanyan IT Educational and Research Center Yerevan State University e-mail: armko@gmail.com Abstract The paper deals with the issue of determining the degree of fuzzy regularity of a crisp string. It is assumed that the concept of fuzzy regularity is formalized by a pattern given as a finite automaton with fuzzy properties of alphabet characters on transitions. Proceeding from this, we replace the problem of determining the degree of fuzzy regularity of a crisp string with the problem of determining the degree of belonging of such a string to the language of the corresponding automaton and propose an effective method for solving it using the dynamic programming approach. The solution to the considered problem makes it possible to fuzzify the set of strings in a given alphabet based on a pattern defining fuzzy regularity. This work is a continuation of the author’s previous works related to finding occurrences of a fuzzy pattern in the text. It may have applications in the field of pattern recognition, data clustering, bio-informatics, etc. Keywords: Fuzzy string matching, Pattern recognition, Fuzzy automaton. 1. Introduction This paper refers to the definition of a degree of fuzzy regularity of a crisp string in a given alphabet. To formalize the concept of fuzzy regularity, we use the highest degree of matching of such a string to the elements of a given set of periodical sequences of fuzzy properties of the alphabetic characters. We treat this set of sequences as a matching pattern. As we noted, the sequences of fuzzy properties included in the pattern must have a certain periodicity. To achieve this goal, we propose to define the matching pattern as a finite automaton with fuzzy properties of alphabetic characters on transitions. As the pumping lemma states, any sufficiently large sequence of fuzzy properties accepted by such an automaton will be periodic with a period size not exceeding the number of states of the automaton. Thus, in the concept we propose, the problem of determining the fuzzy regularity of a crisp string is defined as a problem of determining the degree of belonging such a string to the language of a fuzzy automaton. 56 A. Kostanyan 57 It should be noted that the concept of a fuzzy automaton, which we use as a pattern for determining the regularity, is somewhat different from the one generally accepted in the literature. For example, in the concept of a fuzzy automaton used in [1] and [2], the automaton states and the alphabetic characters are assumed to be crisp, while the start and final states as well as the transitions of the automaton are assumed to be fuzzy. A review of the results on fuzzy automata in their various interpretations, as well as the fuzzy languages they accept (including automaton analysis and synthesis, minimization, closure properties, etc.) is given in [3]. The problem of determining the regularity of a string is widely used in pattern recogni- tion, where it is often necessary to detect regularities in strings encoding images [4]. Such regularity is not always specified precisely, so we have to deal with a fuzzy regularity. The concept of fuzzy regularity can also be used in the area of fuzzy data clustering [5], which deals with grouping elements into fuzzy clusters, a particular case of which is fuzzy sequence labeling [6]. The investigation on fuzzy regularity of a string that we propose in this paper continues our previous research in the field of fuzzy string matching [7], [8], [9]. The problem of splitting a string into adjacent segments to best match the pattern, which is a sequence of fuzzy properties of substrings, was considered in [10], [11]. In this paper, we concretize the concept of a fuzzy property of a substring, defining it as the degree of belonging the substring to the language of a fuzzy automaton. The paper is organized as follows. Section 2 presents the concepts of a fuzzy symbol and a finite automaton over a set of fuzzy symbols, called a fuzzy automaton. Section 3 considers the problem of matching a crisp string with a pattern given by a fuzzy automaton and provides an efficient algorithm for determining the degree of matching based on the dynamic programming approach. Finally, the conclusion summarizes the obtained results. 2. Preliminaries 2.1 Fuzzy Symbols Suppose that (L, ≤, ⊗, 0, 1) is a finite linearly ordered set of measures with the smallest element 0, the largest element 1, and the monotonic accumulation operation ⊗ such that L is a commutative monoid with unit element 1 and zero element 0. That is, for all a, b, c ∈ L a ⊗ 0 = 0, a ⊗ 1 = a, a ≤ b ⇒ a ⊗ c ≤ b ⊗ c. In our further considerations, we will assume that the maximum over the empty set of L - values is equal to 0. According to [12], the fuzzy subset X of the universal set U is defined by the membership function µX : U → L that associates with each element u ∈ U the value µX(u) ∈ L, representing the degree of belonging u to X. A fuzzy subset X of U can be represented by the additive form X = ∑ u∈U u/µX(u). We say that an element u ∈ U certainly belongs to X if µX(u) = 1, and it certainly does not belong to X if µX(u) = 0. Conversely, if 0 < µX(u) < 1, then we say that u belongs to X with degree µX(u). 58 Determining the Degree of Fuzzy Regularity of a String Given an alphabet Σ of characters, we define a fuzzy symbol α over Σ as a fuzzy subset of Σ. Given a character x ∈ Σ and a fuzzy symbol α over Σ, we say that x matches α with degree µα(x). This definition can be extended in the usual way to equal length sequences of characters and fuzzy symbols, respectively. That is, for a set of fuzzy symbols Ξ, x = x1...xn ∈ Σ∗ and ω = ω1...ωn ∈ Ξ∗, we define the matching degree of x to ω as the L-value µω(x) = { µω1(x1) ⊗ ... ⊗ µωn(xn), if x ̸= ϵ, 1, if x = ϵ. • S = 1/1 + 2/0.9 + 3/0.6 + 4/0.3 + 5/0.1 + 6/0 + 7/0, • M = 1/0 + 2/0.25 + 3/0.75 + 4/1 + 5/0.75 + 6/0.25 + 7/0, • L = 1/0 + 2/0 + 3/0.1 + 4/0.3 + 5/0.6 + 6/0.9 + 7/1. Let x = 35634, ω1 = SMLSM, ω2 = MLMSL. Then • µω1(x) = µS(3) ⊗ µM(5) ⊗ µL(6) ⊗ µS(3) ⊗ µM(4) = 3 5 · 3 4 · 9 10 · 3 5 · 1 = 243 1000 , • µω2(x) = µM(3) ⊗ µL(5) ⊗ µM(6) ⊗ µS(3) ⊗ µL(4) = 3 4 · 3 5 · 1 4 · 3 5 · 3 10 = 81 4000 . 2.2 Fuzzy Automaton Given a finite set Ξ of fuzzy symbols over the alphabet Σ, we define a fuzzy automaton as a deterministic finite automaton using symbols from Ξ on transitions. That is, a fuzzy automaton is a 5-tuple A = (Q, Ξ, δ, qin, F), where • Q is the finite non-empty set of states, • δ ⊆ Q × Ξ → Q is the transition function, • qin ∈ Q is the initial state, • F ⊆ Q is the set of final states. Let h = p0−→α1 p1−→α2 ... −→αn pn, n ≥ 0, be a path leading from the state p0 ∈ Q to the state pn ∈ Q in the diagram of the fuzzy automaton A. We say that the path h generates a sequence α = α1...αn ∈ Ξ∗ of fuzzy symbols. We define the language L(A) ⊆ Ξ∗ of the automaton A as the set of all sequences of fuzzy symbols generated by all paths leading from the initial state to a final state. For k ≥ 0 we denote by L(A)/k the set of all k-length strings in L(A). Given a string x ∈ Σ∗ and a fuzzy automaton A, we say that x matches A with degree µA(x), if µA(x) = max{µω(x)| for all ω ∈ L(A)/|x|} Example 1. Suppose Σ = {1, 2, 3, 4, 5, 6, 7} and the measures are rational numbers that belong to the segment [0, 1] with the accumulation operation defined as multiplication. Define the fuzzy symbols S (small), M (middle) and L(large) to be the following fuzzy subsets of Σ: A. Kostanyan 59 Fig. 1. A fuzzy automaton Listed below are all the strings in L(A)/5: ω1 = SMMMM, ω2 = SMLMS, ω3 = SLMSM, ω4 = LMSMM, ω5 = LMLMS. Note that µA(x) = max{µω1(x), µω2(x)µω3(x), µω4(x), µω5(x)} = max { µSMMMM(56364), µSMLMS(56364), µSLMSM(56364), µLMSMM(56364), µLMLMS(56364) } = max {( 1 10 · 1 4 · 3 4 · 1 4 · 1 ) ,( 1 10 · 1 4 · 1 10 · 1 4 · 3 10 ) , ( 1 10 · 9 10 · 3 4 · 0 · 1 ) , ( 3 5 · 1 4 · 3 5 · 1 4 · 1 ) , ( 3 5 · 1 4 · 1 10 · 1 4 · 3 10 )} = = max { 3 640 , 3 16000 , 0, 9 400 , 9 8000 } = 9 400 . 3. Calculation of Fuzzy Regularity 3.1 The Fuzzy Regularity Determination Problem Let Σ be a finite alphabet of characters, Ξ be a finite set of fuzzy symbols over Σ, P be a fuzzy automaton over Ξ called a fuzzy regularity pattern (or, in short, a regularity pattern). For a given x ∈ Σ∗, we define the (x, P) - matching problem as the problem of determining the value µP (x). We assume that this value represents the degree of fuzy regularity of the crisp string x according to the regularity pattern P . 3.2 Recursive Solution Let P = (Q, Ξ, δ, qin, F) be a regularity pattern, q ∈ Q. Let us denote Pq = (Q, Ξ, δ, q, F) the regularity pattern obtained from P by replacing the initial state qin with the state q. The value µPq(x), representing the solution to the (x, Pq) - matching problem for the string x and the regularity pattern Pq, let us denote µP (q, x). In particular, we have that µP (qin, x) = µP (x). Example 2. Let x = 56364, |x| = 5, Ξ = {S, M, L}, A is the fuzzy automaton in Fig. 1. 60 Determining the Degree of Fuzzy Regularity of a String Theorem 1: (Optimal substructure of the (x, Pq) - matching problem) 1. x = ϵ ⇒ [ If q ∈ F then µP (q, x) = 1 else µP (q, x) = 0]. 2. x = ax′ ⇒ [µP (q, x) = max{µα(a) ⊗ µP (q′, x′)| for all δ(q, α) = q′}]. Proof: The first statement is obvious. The second statement follows from [x = ax′] ⇒ [µP (q, x) = max{µα(a) ⊗ µω(x′)| δ(q, α) = q′, α ∈ Ξ, ω ∈ L(Pq′)] ⇒ [µP (q, x) = max{µα(a) ⊗ µP (q′, x′)| for all δ(q, α) = q′}]. Theorem 1 implies the following recurrent equation for calculation µP (q, x): µP (q, x) = { (q ∈ F)?1 : 0, if x = ϵ, max{µα(a) ⊗ µP (q′, x′)| for all δ(q, α) = q′}, if x = ax′. Direct calculation of the value of µP (q, x) using this formula will be inefficient due to overlapping subproblems. In order to get a more efficient solution, let us apply the dynamic programming approach. 3.3 Dynamic Programming Solution Suppose Q is represented as an m-tuple < q1, ..., qm >, so that qin = q1. For 1 ≤ i ≤ m, 1 ≤ j ≤ n + 1, let us denote s[i, j] = µP (qi, x[j..n]) (we assume that x[n + 1..n] = ϵ). The optimal substructure of the (x, Pq) - matching problem dictates the following recur- rent equation for calculating s[i, j]: s[i, j] = { qi ∈ F)?1 : 0, if j = n + 1, max{µα(x[j]) ⊗ s[k, j + 1]| for all δ(qi, α) = qk}, if 1 ≤ j ≤ n. (1) Note that the (x, P) - matching problem can be represented as the problem of determining the value s[1, 1] = µP (x). For q ∈ Q, let us denote out(q) = {(α, p)|δ(q, α) = p }, which is the set of pairs consisting of states and fuzzy symbols corresponding to transitions outgoing from q. The algorithm in Fig. 2 presents the process of calculating the matrix {s[i, j]|, 1 ≤ i ≤ m, 1 ≤ j ≤ n + 1} of matching degrees according to formula (1): A. Kostanyan 61 Fig. 2. Building the matrix of matching degrees The solution to the (x, P) - matching problem, presented in Fig. 3, is simply reduced to extracting the value s[1, 1] from the matrix s. Fig. 3. Determination of fuzzy regularity Example 3 . The matrix s of matching degrees, constructed by the Calculate−Matching− Degrees algorithm on the input x = 56364 and the regularity pattern in Fig. 1, is presented in Fig. 4. Fig. 4. Memoization results 62 Determining the Degree of Fuzzy Regularity of a String In line with Determine − Fuzzy − Regularity algorithm, the value s[1, 1] = 9 400 is the solution to the (x, P) - matching problem for the string x = 56364 and regularity pattern in Fig. 1, which is consistent with the result obtained in Example 2. According to our approach, this value determines the degree of fuzzy regularity of the crisp string x based on the pattern P . 3.4 Analysis To estimate the complexity of the proposed solution to the (x, P)-matching problem, let us assume that there are k transitions in the graph of the m-state fuzzy automaton P , and that this graph is represented as an array A[1..m] such that A[i] = out(qi), 1 ≤ i ≤ m. In this case, the construction of the previous column of the matrix s in lines 5-12 of the Calculate − Matching − Degrees algorithm takes time O(1 + k) and, consequently, the construction of the entire matrix s for a string of length n takes time O(n(1 + k)). As a result, the instruction in line 1 of the Determine − Fuzzy − Regularity algorithm runs in O(n(1 + k)) time. The instruction in line 2 of this algorithm obviously runs in O(1) time, which makes the time complexity of the Determine − Fuzzy − Regularity algorithm equal to O(n(1 + k)). For an n-length string x and an m-state regularity pattern P , the algorithm uses O(mn) extra memory to represent the matrix s[1..m, 1..n + 1] of matching degrees. 4. Conclusion The problem of determining the fuzzy regularity of a crisp string has been considered in this paper, where the concept of fuzzy regularity is formalized by means of a finite automaton with fuzzy properties of alphabet characters on transitions. Using the dynamic programming approach, we propose a solution to this problem with O(n(k + 1)) time complexity, and O(mn) space complexity, where n is the length of the input string; m and k are the number of states and the number of transitions of the automaton, respectively. The proposed algorithm can be used in the field of pattern recognition, data clustering, DNA analysis, etc. Acknowledgments This work was supported by the Ministry of Education, Science, Culture and Sports of the Republic of Armenia, project 21T-1B326. A. Kostanyan 63 References [1] Y. Cao and Y. Ezawa, “Non-determinstic fuzzy automata”, Information Sciences, vol. 191, pp. 86-97, 2012. [2] J. N. Mordeson and D.S. Malik, Fuzzy Automata and Languages: Theory and Applica- tions, Chapman & Hall, CRC, Boca Raton, London 2002. [3] R. K. Singh, A. Rani and M.K. Sachan, ”Fuzzy automata: A quantitative review”, International Journal on Future Revolution in Computer Science & Communication Engineering, vol 3, no. 7, pp. 11-17, 2017. [4] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006. [5] Wang Zhen Zhou, ”Image segmentation by combining the global and local properties”, Expert Systems with Applications, vol. 87, pp. 30-40, 2017. [6] Bezdek, James C, Pattern Recognition with Fuzzy Objective Function Algorithms, 1981. [7] A. Kostanyan, Fuzzy string matching with finite automata, Proc. on 2017 IEEE Con- ference CSIT-2017, Yerevan, Armenia, pp. 25-29. IEEE Press, USA 2018. [8] A. Kostanyan, Fuzzy string matching using prefix table, Transactions of IIAP NAS RA, Mathematical Problems of Computer Science, vol. 54, 2020, pp.116 -121. [9] A. Kostanyan and A. Karapetyan, String Matching in Case of Periodicity in the Pattern, In: Dolinina O., Brovko A., Pechenkin V., Lvov A., Zhmud V., Kreinovich V. (eds) Recent Research in Control Engineering and Decision Making. ICIT 2019. Studies in Systems, Decision and Control, vol 199. Springer, 2019. [10] A. Kostanyan and A. Harmandayan, Segmentation of string to match a fuzzy pattern, In Proc. of Computer Science & Information Technologies (CSIT) Conference, Yerevan, Armenia, September, pp. 17-19, 2019. [11] A. Kostanyan and A. Harmandayan, Mapping a fuzzy pattern onto a string, in Proc. on 2019 IEEE Conference ”2019 Computer Science and Information Technologies (CSIT), Yerevan, Armenia, 23-27 Sep., 2019, IEEE Press, USA, pp. 5-8, 2019. [12] L. A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning-I, Information Sciences, vol. 8, pp. 199-249, 1975. Submitted 04.10.2021, accepted 06.12.2021. 6 4 Determining the Degree of Fuzzy Regularity of a String îáÕÇ áã Ñëï³Ï ϳÝáݳíáñáõÃÛ³Ý ³ëïÇ׳ÝÇ áñáßáõÙ ²ñÙ»Ý Ð. Îáëï³ÝÛ³Ý îî ÏñÃ³Ï³Ý ¨ ѻﳽáï³Ï³Ý Ï»ÝïñáÝ ºñ¨³ÝÇ å»ï³Ï³Ý ѳٳÉë³ñ³Ý e-mail: armhko@gmail.com ²Ù÷á÷áõÙ Ðá¹í³ÍáõÙ ¹Çï³ñÏíáõÙ ¿ ïáÕÇ áã Ñëï³Ï ϳÝáݳíáñáõÃÛ³Ý ã³÷Ç áñáßÙ³Ý ËݹÇñÁ: ºÝó¹ñíáõÙ ¿, áñ áã Ñëï³Ï ϳÝáݳíáñáõÃÛ³Ý Ñ³ëϳóáõÃÛáõÝÁ ýáñÙ³Éǽ³óíáõÙ ¿ í»ñç³íáñ ³íïáÙ³ïÇ (ß³µÉáÝÇ) ÙÇçáóáí, áñÇ ³ÝóáõÙÝ»ñÇÝ í»ñ³·ñí³Í »Ý ³Ûµáõµ»ÝÇ Ýß³ÝÝ»ñÇ áã Ñëï³Ï ѳïÏáõÃÛáõÝÝ»ñ: ²ñ¹ÛáõÝùáõÙ, Ñëï³Ï ïáÕÇ áã Ñëï³Ï ϳÝáݳíáñáõÃÛ³Ý ã³÷Ç áñáßÙ³Ý ËݹÇñÁ ÷á˳ñÇÝíáõÙ ¿ ѳٳå³ï³ëË³Ý ³íïáÙ³ïÇ É»½íÇÝ Ýßí³Í ïáÕÇ å³ïϳݻÉÇáõÃÛ³Ý ã³÷Ç áñáßÙ³Ý Ëݹñáí, áñÇ ÉáõÍÙ³Ý Ñ³Ù³ñ ³é³ç³ñÏíáõÙ ¿ û·ï³·áñÍ»É ¹ÇݳÙÇÏ Íñ³·ñ³íáñÙ³Ý Ù»Ãá¹Á: Îïðåäåëåíèå ñòåïåíè íå÷åòêîé ðåãóëÿðíîñòè ñòðîêè Àðìåí Ã. Êîñòàíÿí Îáðàçîâàòåëüíûé è íàó÷íûé öåíòð èíôîðìàöèîííûõ òåõíîëîãèé Åðåâàíñêèé ãîñóäàðñòâåííûé óíèâåðñèòåò e-mail: armhko@gmail.com Àííîòàöèÿ  ñòàòüå ðàññìàòðèâàåòñÿ çàäà÷à îïðåäåëåíèÿ ñòåïåíè íå÷åòêîé ðåãóëÿðíîñòè äàííîé ñòðîêè. Ïðåäïîëàãàåòñÿ, ÷òî íå÷åòêàÿ ðåãóëÿðíîñòü ôîðìàëèçóåòñÿ ïîñðåäñòâîì ïàòåðíà, ïðåäñòàâëåííîãî â âèäå êîíå÷íîãî àâòîìàòà ñ íå÷åòêèìè ñâîéñòâàìè ñèìâîëîâ àëôàâèòà íà ïåðåõîäàõ.  ðåçóëüòàòå, çàäà÷à îïðåäåëåíèÿ ñòåïåíè íå÷åòêîé ðåãóëÿðíîñòè ñòðîêè çàìåíÿåòñÿ çàäà÷åé îïðåäåëåíèÿ ñòåïåíè åå ïðèíàäëåæíîñòè ÿçûêó íå÷åòêîãî àâòîìàòà, äëÿ ðåøåíèÿ êîòîðîé ïðåäëàãàåòñÿ èñïîëüçîâàòü ìåòîä äèíàìè÷åñêîãî ïðîãðàììèðîâàíèÿ. Ðåøåíèå ðàññìàòðèâàåìîé çàäà÷è ïîçâîëÿåò ôàççèôèöèðîâàòü ìíîæåñòâî ñëîâ â äàííîì àëôàâèòå íà îñíîâå ïàòòåðíà îïðåäåëåíèÿ íå÷åòêîé ðåãóëÿðíîñòè. Äàííàÿ ðàáîòà ÿâëÿåòñÿ ïðîäîëæåíèåì ðÿäà ïðåäûäóùèõ ðàáîò àâòîðà ïî ïîèñêó íå÷åòêîãî ïàòòåðíà â ñòðîêå. Îíà ìîæåò èìåòü ïðèìåíåíèÿ â òàêèõ îáëàñòÿõ, êàê ðàñïîçíàâàíèå ïàòòåðíîâ, êëàñòåðèçàöèÿ äàííûõ, áîèíôîðìàòèêà, è ò. ä. Êëþ÷åâûå ñëîâà: íå÷åòêîå ñîïîñòàâëåíèå ñ îáðàçöîì, ðàñïîçíàâàíèå ïàòòåðíîâ, íå÷åòêèé àâòîìàò. Üßí³Í ËݹñÇ ÉáõÍáõÙÁ Ñݳñ³íáñáõÃÛáõÝ ¿ ï³ÉÇë ϳéáõó»É ïñí³Í ³Ûµáõµ»ÝÇ µ³é»ñÇ áã Ñëï³Ï µ³½ÙáõÃÛáõÝ՝ áã Ñëï³Ï ϳÝáݳíáñáõÃÛ³Ý áñáßÙ³Ý ß³µÉáÝÇ ÑÇÙ³Ý íñ³: îíÛ³É ³ß˳ï³ÝùÁ ß³ñáõݳÏáõÃÛáõÝÝ ¿ Ñ»ÕÇݳÏÇ Ý³Ëáñ¹ ³ß˳ï³ÝùÝ»ñÇ՝ ïáÕáõÙ áã Ñëï³Ï »ÝóïáÕÇ (ß³µÉáÝÇ) áñáÝÙ³Ý áõÕÕáõÃÛ³Ùµ: ²ÛÝ Ï³ñáÕ ¿ ÏÇñ³éí»É ß³µÉáÝÝ»ñÇ ×³Ý³ãÙ³Ý, ïíÛ³ÉÝ»ñÇ ËÙµ³íáñÙ³Ý, µÇáÇÝýáñÙ³ïÇϳÛÇ ¨ ³ÛÉ µÝ³·³í³éÝ»ñáõÙ: ´³Ý³ÉÇ µ³é»ñ՝ ÝÙáõßÇ áã Ñëï³Ï ѳٳ¹ñáõÙ, ß³µÉáÝÝ»ñÇ ×³Ý³ãáõÙ, áã Ñëï³Ï ³íïáÙ³ï: 05_Kostanyan_56 kostanyan