11_Armen Kostanyan's article_116-121


116 

 
Mathematical Problems of Computer Science 54, 116--121, 2020. 

 
    UDC 519.688 

 
Fuzzy String Matching Using a Prefix Table 

 
Armen H. Kostanyan 

IT Educational and Research Center  

Yerevan State University, Armenia 

e-mail: armko@ysu.am 

 
Abstract 

 
The string matching problem (that is, the problem of finding all occurrences of a 

pattern in the text) is one of the well-known problems in symbolic computations with 

applications in many areas of artificial intelligence. The most famous algorithms for 
solving it are the finite state machine method and the Knuth-Morris-Pratt algorithm 

(KMP).  

In this paper, we consider the problem of finding all occurrences of a fuzzy pattern in 

the text. Such a pattern is defined as a sequence of fuzzy properties of text characters. 

To construct a solution to this problem, we introduce a two-dimensional prefix table, 

which is a generalization of the one-dimensional prefix array used in the KMP 

algorithm. 

Keywords: KMP algorithm, Approximate string matching, Fuzzy string matching 
 

1. Introduction 
 
String matching is a well-known computational problem, the earliest references to which date to 
the 1960s. Interest in this problem really grew with the publication of the Boyer-Moor (BM) [1] 
and Knuth-Morris-Pratt (KMP) [2] algorithms in the 1970s. Since then, quite a lot of exact string 
matching algorithms have been proposed using one or another modification of BM or KMP.  
Along with investigations on exact string matching, there have also been investigations on 
approximate string matching that looked at the problem of finding a generic pattern in the text. 
Unlike the regular pattern, specified as a list of characters, the generic pattern is rather some 
description of the substring to be found. Research on approximate string matching has focused 
on distance-based string matching [3], string matching using meta-characters in patterns, and 
more generally, string matching using a regular expression as a pattern [4]. A detailed survey of 
these works is given in the monograph [5]. 

In this paper, we formulate a fuzzy string matching problem, in which the pattern is 
represented as a sequence of fuzzy characters, and the problem of finding all occurrences of such 


A. Kostanyan 117

a pattern in a text with a given accuracy is defined. This problem was investigated in [6], where a 
non-deterministic transition system was constructed to describe the possibilities of processing the  
given text in order to find all occurrences of a fuzzy pattern in it, and an efficient algorithm for 
determining some occurrences was proposed. In contrast, here we propose an efficient algorithm 
to determine all occurrences of a fuzzy pattern in the text, mimicking the KMP algorithm. 

The paper is organized as follows. 
Section 2 presents the problem definitions. Section 3 introduces the concept of a two-

dimensional prefix table, which generalizes the notion of a one dimensional prefix array used in 
the KMP algorithm. The text processing algorithm using the prefix table is presented in Section 
4. The complexity of the proposed algorithm is analyzed in Section 5. Finally, the conclusion 
summarizes the obtained results. 

     
2. The Fuzzy String Matching Problem  

Suppose that (L, ∨, ∧, 0, 1) is a finite lattice with the smallest element 0 and the largest element 
1. According to [7], the fuzzy subset A of the universal set U is defined by the membership 

function µA: U→L that associates with each element x from U the number µA(x) in L, 
representing the grade of membership of x in A. A fuzzy subset A from U can be represented by 
the additive form  

( )
∈

=
Ux

A
xxA µ/ . 

We say that an element x definitely belongs to A if µA(x) = 1, and it definitely does not 
belong to A if µA(x) = 0. On the contrary, if 0 < µA(x) < 1, we say that x belongs to A with degree  
µA(x).  

Let us define a fuzzy symbol α over the alphabet Σ to be a fuzzy subset of Σ. Given a 
character x ∈ Σ, we say that x matches α with the grade µα(x). We define the fuzzy pattern P[1 .. 
m] as a sequence of fuzzy symbols of length m.   

For a given text T[1 .. n], a fuzzy pattern P[1 .. m] and a threshold λ, we formulate the 
fuzzy string matching problem as the problem of finding all positions w (or, valid shifts), 1 ≤ w ≤ 

n-m+1, in the text such so that 

 
µ P[k] (T[w + k - 1])  ≥  λ for all k, 1 ≤ k ≤ m. 
 

3. Prefix Table  

Let x = x[1 .. q] be an array of text symbols of length q and λ ≥ 0. Let us define the x-border of P 
= P[1..m] as a subarray P[1 .. k] (k < q) such that 

 
µ P[i] (x[q – k + i]) ≥ λ for all i, 1 ≤ i ≤ k, 
 

(that is, the last k characters of x match the first k symbols of P with degree of at least λ).    
Let LBP(x) denote the longest x-border of P. 

Example 3.1: Choose Σ = {1, 2, 3, 4, 5}, L = { 0, 0.25, 0.5, 0.75, 1 } and define the fuzzy 
symbols S (small), M (middle) and L (large) as follows: 

S =  1/1 + 2/0.75 + 3/0.5 + 4/0.25 +  5/0, 
M = 1/0 + 2/0.75 + 3/1 +    4/0.75 +  5/0, 
L =  1/0 + 2/0.25 + 3/0.5 + 4/0.75 +  5/1. 


           Fuzzy String Matching Using Prefix Table 118

Then, for x = 14252, P  = SLMLS and λ = 0.75, the following statement holds  
LBP(x) = SLM.  �   

 
Let us define the prefix table as a matrix π[1 .. m, 1 .. n] such that 
 

Claim 1: For all 2 ≤ q ≤ m, 1 ≤ i ≤ n–q+1             

 
Proof:  Follows from the following statements: 

 [|LBP ( T[i .. i+q-1] )| = q-1 ]  [ µ P[s] (T[i+s]) ≥ λ for all s, 1 ≤ s ≤ q-2 ]  
  [|LBP(T[i .. q-1])| = q-2, µ P[q-1] (T[i+q-2]) ≥ λ ]  
  [π [q-1, i] = q-2, µ P[q-1] (T[i+q-2]) ≥ λ ]  [π [q, i] = q-1]. 
 [|LBP ( T[i .. i+q-1] )| < q-1]  [ |LBP (T[i .. i+q-1])| = |LBP ( T[i+1 .. i+q-1])| ]  

  [ π [q, i] = π [q-1, i+1] ]. �   
The formula (1) allows us to construct the prefix table by filling it from top to bottom and 

from left to right. The procedure below provides additional details.  

Compute-Prefix-Table(P, T, λ) 

 
1    m = P.length, n = T.length  

2    let π[1..m, 1.. n] be the prefix table to be filled accordingly 
3    for i = 1 to n 
4        π [1, i] = 0 
5    for q = 2 to m                  //substring length 
6        for i = 1 to  n-q+1    //position in the text 
7            if  π [q-1, i] == q - 2 and µ P[q-1] (T[i+q-1]) ≥ λ 
8                π [q, i] = q - 1 
9            else π [q, i] = π [q-1, i+1]     
10   return π 

 
Example 3.2: Suppose that 

P = MSMSLM, T = 223141325422414251, 

where S, M and L are the fuzzy symbols defined in Example 3.1. The matrix in Fig. 1 presents а 
prefix table built by the Compute-Prefix-Table algorithm based on P, T and λ = 0.75. 
 

                    0,                                  if  q = 1 

π [q, i] =      |LBP(T[i .. i+q-1])|,      if  2 ≤ q ≤ m, 1 ≤ i ≤ n–q+1             
                    undefined,                    if  i+q-1 > n               

                     q - 1,              if  π [q-1, i] = q-2 and µ P[q-1] (T[i+q-1]) ≥ λ                 
π [q, i] =                                                                                                                  (1)  
                    π [q-1, i+1],    otherwise              


A. Kostanyan 119

 
Fig. 1. Prefix table for P = MSMSLM, T = 223141325422414251. 

 
4. Text Processing using a Prefix Table   

The following procedure is an adaptation of the KMP algorithm to solve the fuzzy strings 

matching problem using a prefix table. 

KMP-FUZZY-STRING-MATCHING ( P, T, λ )  

1    m = P.length, n = T.length,  
2  π = Compute-Prefix-Table(P, T, λ) 
3  q = 0                   // number of characters matched 
4  for i = 1 to n 
5      while q > 0 and µ P[q+1](T[i]) < λ  
6          q = π[q, i - q]       
7        if  µ P[q+1](T[i]) ≥ λ 
8         q = q + 1   
9      if  q == m            //the entire pattern matches  
10          Print( i – q +1 )   
11         q = π[q, i – q + 1] 

Example 4.1. Fig. 2 illustrates the fuzzy string matching process for T, P and λ from Example 
3.2 and the prefix table in Fig. 1.  

 
Fig. 2. Fuzzy string matching process. 

        1     2     3     4     5    6    7     8     9   10   11   12   13  14   15  16   17   18   
T:     2   2   3   1   4   1   3   2   5   4   2   2   4   1   4   2   5   1 
        1     2     3     4     5     # 
P:    M  S  M  S   L  M 
                   1     2    3     4    # 
                  M  S  M  S  L 
                                       1    2     3     4    5     6 
                            M  S  M  S  L  M  - matching 
                                                                        1     2    3     # 
                                                     M  S  M  S 
                                                                               1     2    3     4    5     6 
                                                          M  S  M  S  L  M  - matching 
                                                                                            1     2    3     4   5      # 
                                                                    M  S  M  S  L  M 
                                                                                                                             # 

                                                                                                                            M 

          2   2   3   1   4   1   3   2   5   4   2   2   4   1   4   2   5   1 

M      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
 S       1   1   0   1   0   1   1   0   1   1   1   1   0   1   1   0   0   - 
M      1   2   1   2   1   2    0   1   2   2   1   2   1   2   0   0   -   - 
 S       2   3   2   3   2   0   1   2   3   3   2    3   2   0   0   -   -   - 
 L      3   4   3   4   0   1    2   3   3   4   3    4   0   0    -   -   -   - 
M      4   3   4   5   1   2    3   3   4   5   4    5   0   -    -    -   -   - 


           Fuzzy String Matching Using Prefix Table 120

5. Analysis   
 

To estimate the complexity of the KMP-FUZZY-STRING-MATCHING algorithm, let us take 
advantage of the potential method. Define the potential before every execution of the body of the 

for loop equal to q, which is initially 0 and never becomes negative. Note that the while loop has 
an amortized cost of O(1) since the total number of executions of the assignment statement in 

line 6 does not exceed the potential value before this loop is executed. The two subsequent if 
statements obviously have O(1) amortized complexities. Thus, the amortized complexity of the 

execution of the body of the for loop is O(1), which gives O(n) complexity for the processing 
phase. The complexity of the preprocessing phase represented by the Compute-Prefix-Table 
procedure is obviously O(mn). 

The algorithm uses O(mn) extra memory needed to represent the prefix table π. 
 

5. Conclusion  

The problem of finding occurrences of a fuzzy pattern in a given text with a given accuracy has 

been considered in this paper. A generalization of the KMP string matching algorithm is 

proposed to solve this problem. Like the KMP algorithm, the proposed algorithm consists of 

preprocessing and processing phases. The preprocessing phase creates a two-dimensional prefix 

table that is used in text processing.  

For a pattern of length m and a text of length n, the preprocessing phase takes O(mn) time, 

and the processing phase takes O(n) time. The extra space used by the algorithm is O(mn).        

 
References 
 

[1] R. S. Boyer and J. S. Moore, “A fast string matching algorithm,” Association for 

Computing Machinery, vol. 20, no. 10, pp. 762-772, 1977. 

[2] D. Knuth and M. Pratt, “Fast pattern matching in strings,” SIAM J. Comput. vol. 6, no. 2, 

pp. 323-350, 1977. 

[3] G. Landau and U. Vishkin, “Efficient string matching with k mismatches,” TCS, vol.  43, 

pp. 239-249, 1986. 

[4] R. Baeza-Yates and N. Gonzaio, “A faster algorithm for approximate string matching,” in 

Proc. of Seventh Annual Symp. Combinatorial Pattern Matching, Spinger-Verlag, 1996, 

pp. 1-23. 

[5] B. Smyth, “Computing Patterns in Strings”, Addison-Wesley UK, 2003. 

[6] A. Kostanyan, “Fuzzy String Matching with Finite Automata”, in Proceedings on 2017 

IEEE Conference CSIT-2017, Yerevan, Armenia, pp. 25-29.  IEEE Press, USA 2018. 

[7] L. A. Zadeh, “The concept of a linguistic variable and its application to approximate 

reasoning-I,” Information Sciences, vol. 8, pp. 199-249, 1975. 

 
Submitted 16.07.20, accepted 12.11.20.  
 

A. Kostanyan 121

Տողում ենթատողի ոչ հստակ որոնում նախածանցների 
աղյուսակի օգտագործմամբ 

 
Արմեն Հ․ Կոստանյան 

ՏՏ կրթական և հետազոտական կենտրոն 
Երևանի պետական համալսարան, Հայաստան 

e-mail: armko@ysu.am   
 

Ամփոփում 
 

Տողում ենթատողի որոնման խնդիրը արհեստական բանականության տարբեր 
բնագավառներոմ լայն կիրառում ունեցող խնդիրներից է։ Նշված խնդրի լուծման 
հանրահայտ ալգորիթմներն են՝ վերջավոր ավտոմատների մեթոդը և Կնուտի-
Մորիսի-Պրատի (ԿՄՊ) ալգորիթմը։ 

Տվյալ հոդվածում դիտարկվում է հստակ տողում (տեքստում) ոչ հստակ 
ենթատողի (շաբլոնի) որոնման խնդիրը։ Ոչ հստակ ենթատողը սահմանվում է որպես 
տեքստի սիմվոլների ոչ հստակ հատկությունների հաջորդականություն։ Խնդրի 
լուծումը կառուցվում է ԿՄՊ ալգորիթմում օգտագործվող նախածանցների միաչափ 
աղյուսակի երկչափ ընդլայնման եղանակով։   

Բանալի բառեր` ԿՄՊ ալգորիթմ, տողում ենթատողի մոտավոր որոնում, տողում 
ենթատողի ոչ հստակ որոնում: 

 
Нечеткий поиск подстроки в строки с использованием таблицы 
префиксов 

 
Армен Г. Костанян 

Образовательный и научный центр ИТ 

 Ереванский государственный университет, Армения 

e-mail: armko@ysu.am 
 

Аннотация 

 
Задача поиска подстроки в строке одна из часто встречающихся задач в области 

символических вычислений, имеющая многочисленные применения в задачах 

искусственного интеллекта. К числу наиболее известных алгоритмов для ее решения 

относятся метод конечных автоматов и алгоритм Кнута-Морриса-Прата (алгоритм КМП).  

В настоящей статье рассматривается задача поиска всех вхождений нечеткой 

подстроки в строке. При этом нечеткая подстрока определяется как последовательность 

нечетких свойств символов строки (текста). Поставленная задача решается с помощью 

двумерной таблицы префиксов, являющейся обобщением одномерной таблицы 

префиксов, используемой в алгоритме  КМП. 

Ключевые слова: алгоритм КМП, приближенный поиск подстроки в строке, 
нечеткий поиск подстроки в строке.