.


37 http://journals.cihanuniversity.edu.iq/index.php/cuesj CUESJ 2019, 3 (2): 37-41

ReseaRch aRticle

Measuring the Score Matching of the Pairwise Deoxyribonucleic 
Acid Sequencing using Neuro-Fuzzy
Safa A. Hameed*, Raed I. Hamed

Department of Computer Science, College of Engineering and Science, University of Bayan, Erbil, Iraq

ABSTRACT

The proposed model for getting the score matching of the deoxyribonucleic acid (DNA) sequence is introduced; the Neuro-Fuzzy 
procedure is the strategy actualized in this paper; it is used the collection of biological information of the DNA sequence performing 
with global and local calculations so as to advance the ideal arrangement; we utilize the pairwise DNA sequence alignment to gauge the 
score of the likeness, which depend on information gathering from the pairwise DNA series to be embedded into the implicit framework; 
an adaptive neuro-fuzzy inference system model is reasonable for foreseeing the matching score through the preparation and testing in 
neural system and the induction fuzzy system in fuzzy logic that accomplishes the outcome in elite execution.

Keywords: Component, dynamic programming, matching, neuro-fuzzy, sequence alignment

INTRODUCTION

Deoxyribonucleic acid (DNA) sequence matching is an essential area and more approaching nearby in computational biological data.[1] DNA sequence analysis 
is an imperative exploration topic in bioinformatics. Assessing 
the similarity between sequences, this is important for sequence 
analysis, because similarity proves congruence.[2] The DNA atom 
contains biological, physical, and chemical data; it has turned 
out to be essential to examine DNA sequences statistically.[3] 
String matching is a strategy to find a design from the predefined 
info string.[4] Similarities between DNA sequences may emerge 
due to the functional, structural, or transformative relationship 
among them.[5] Sequence alignment of two biological sequences 
may be called pairwise sequence alignment, also in the event, 
more than two sequences are involved; it may be called multiple 
sequence alignment.[6] The dynamic programming is the 
method to implement the DNA alignment using the Needleman–
Wunsch[7] and Smith–Waterman algorithms.[8]

Here, in this article, we use the pairwise sequence 
alignment in a global and local algorithms and examined the 
measure of the matching based on the collected data for DNA 
alignment. The Neuro-Fuzzy model[9] is used in the Matlab 
tool that implemented by the data set files of measure score 
matching of DNA sequences that deal with the set of biological 
data. This tool is efficient and fast to evaluate the scoring 
measure of matching the DNA sequences.

LITERATURE REVIEW

The study of a biological sequence has been growing 
exponentially, while the applications of the sequence 

alignment cover the wide range in bioinformatics. The 
previous research work has been studied to provide new 
Algorithms with the main purpose for the requirements of 
matching sequences; the techniques have been used all the 
latest with providing fast and efficient sequence alignment 
algorithms. Bhukya and Somayajulu[1] suggested a new 
pattern for matching technique defined as exact multiple 
pattern-matching algorithms that utilize DNA sequence. 
The current method is used to avoid unneeded comparisons 
in the DNA sequence. Gill and Singh[6] proposed a multiple 
sequence alignment algorithm which performs fuzzy logic 
to measure the similarity of sequences based on the fuzzy 
parameters. Nasser et al.[10] suggested the fuzzy logic model 
for approximate matching of DNA subsequences. Kim et al.[11] 
suggested a DNA sequence alignment, which uses quality 
information and a fuzzy inference implementation developed 
based on the features of DNA parts and a fuzzy logic system. 
Chai et al.[12] explained how to perform pairwise sequence 
alignments utilizing the biostrings bundle using the pairwise 
alignment function. Hameed and Hamed[13] discussed how to 

Cihan University-Erbil Scientific Journal (CUESJ)

Corresponding Author: 
Safa A. Hameed, Department of Computer Science, College 
of Engineering and Science, University of Bayan, Erbil, Iraq. 
E-mail:safa.hamid@bnu.edu.iq

Received: Mar 21, 2019 
Accepted: Apr 24, 2019 
Published: Aug 20, 2019

DOI: 10.24086/cuesj.v3n2y2019.pp37-41

Copyright © 2019 Safa A. Hameed, Raed I. Hamed. This is an open-access 
article distributed under the Creative Commons Attribution License.


Hameed and Hamed: Measuring the score matching of DNA using NF

38 http://journals.cihanuniversity.edu.iq/index.php/cuesj CUESJ 2019, 3 (2): 37-41

implement the pairwise alignment technique to get the score 
of similarity for a pair of characters. In our work, we use the 
Neuro-Fuzzy model that utilizes the biological dataset files for 
matching DNA and measures the score of matching the DNA 
sequences with global and local alignment.

SEQUENCE ALIGNMENT

DNA matching is a significant venture in the sequence 
alignment. Since sequence alignment is a discretionary 
matching process, there is a need for better algorithms.[10] 
DNA sequence alignment algorithms over computational 
biological science have been enhanced eventually by different 
techniques:[11] the (Needleman-Wunsch) global, the (Smith-
Waterman) local, and (ends-free) cover pairwise sequence 
alignment issues.[12] Pairwise alignment is a technique for 
scoring the similarity of a pair of characters. It decides the 
correspondences between the substrings in the sequences 
like the similarity score is amplified.[13] For it is a large 
portion basic form, known as pairwise sequence alignment, 
we provided for two sequences A and B and discover their 
best alignment (either global or local).[12] Aligned sequences 
represented as rows in a grid. Gaps (“−“) need aid embedded 

between the characters with the goal.[6] The ways we use it 
to perform the alignment are global and local alignment, 
these algorithms uses the proposed matrix to measure the 
similarity of bases in the two sequences. For the Needleman–
Wunsch algorithm, a scoring matrix is ascertained for those 
two provided for sequences A and B, by setting one sequence 
along column side, furthermore on the turn sequence side. 
It is additionally frequently referred as optimal matching 
algorithm and the global alignment technique.[7] The Smith–
Waterman algorithm, which is the method used to perform 
the local sequence alignment, local alignment algorithms find 
the sections of the highest similarity between two sequences 
and create the alignment to abroad from there, that is, 
identify the most similar portion comparable subregion 
imparted between two successions.[8]

THE PROPOSED METHOD

We use the Nero-Fuzzy technique in Matlab tool. The 
Neuro-Fuzzy model is very well established approach and 
has a tremendous potentiality to outcome results with high 
accuracy ratio and the efficiency with biological data to 
determine the measure score of matching DNA sequencing. 
Those recommended sequence-matching algorithm utilize the 
three input variables – match score (match), mismatch score 
(mismatch), and gaps, as shown in “Figure 1.”

These three inputs would then fuzzified utilizing the 
following membership functions (MFs) equations and giving 
the calculated resulting score:

Matching =

0 if there is no similarity

1 if there is highest simmilarity 100%

1,0 matching score / lenseq   










 (1)

Mismatch =

0 if there is no mismatch

1 if there is no similarityy

1,0 mismatching score / lenseq   










 (2)

Gap =
0 if there is no need to put a gap

1, 0 gaps score / lens  eeq 





 (3)

Figure 1: The three input variables and the output

Figure 2: The membership function and the training testing phase for the lowest possible error. (a) Input membership function. (b) The testing 
data

ba


Hameed and Hamed: Measuring the score matching of DNA using NF

39 http://journals.cihanuniversity.edu.iq/index.php/cuesj CUESJ 2019, 3 (2): 37-41

Score =

0 if the resulting score £ 0

1 if the resulting perfect

11, 0 resulting score / perfect score   










 (4)

The variable “lenseq” mean the entire length of the sequence.

 Score = match + (− mismatch) + (− gabs) (5)

THE SIMULATION RESULTS

In our work, we perform the Neuro-Fuzzy model by an 
adaptive neuro-fuzzy inference system (ANFIS) tool in Matlab, 
using the data set about 600 samples for a matching measure 
score of DNA sequencing; these data divide into two data files 
for the training and testing, for the training step, we use the 
data set about 450 samples, and for the testing step, we use 
the data about 150 samples.[14]

We use these dataset files in ANFIS system in the 
range value in equation 1, 2, 3, and 4 and output the result 
according to the equation 5. We use different processing 
systems to implement the matching process, and each 
system has different results with convergent values, and that 
for choosing the most suitable one with less error percentage 
and depends on it to calculate the matching score. “Figure 

2” shows the chosen attempt that gives the results with high 
accuracy.

“Figure 2” explains the most suitable ANFIS system with 
the lowest average testing error; Table 1 shows the different 
processing systems; we implement it with the details. In this 
table the most suitable system which has chosen is the system 
that has the following: trapezoidal MFs that has three MFs, 
constant MFs output, backpropegation train Fuzzy inference 
system method and the number of epochs which are 500, this 
system has the lowest average training and testing error which 
are 0.016572 and 0.01657 respectively, and gives the result 
with high performance, thus we use it to get the score matching 
of the numeric data from the DNA sequence alignment, show 
Tables 2 and 3.

In Table 3, we aligned the sequence using the local 
algorithm; in this method, the algorithm takes the most similar 
part of the pair sequence, not must in the order and not need 
to input the gap; the resulting score is the perfect, there are no 
mismatch and no gaps, and is 100% identical.

DISCUSSION

Sequence alignment is a necessary condition for analyzing 
DNA sequencing; in our method, we use the pairwise sequence 
alignment; it is applied using the global and local alignment 
algorithm method. In this method, we use the numeric 
biological data for sequence alignment using ANFIS system in 
MATLAB; this system implements several processing systems 
to get the most suitable results as shown in Table 1; the most 
suitable system is the trapezoidal with three MF for each 
input and 500 epochs; it has the lowest average testing error. 
We use several different sequences to be aligned, as shown 
in Table 2; we aligned the sequences in the global alignment 
algorithm. In this method, we insert the gaps when the base in 
the sequence is not similar with the other in the same order in 
this pair, and shifted the character and input the gap in order 
to be identical; we use the number of times for (matching), 
mismatching and gaps as the input in the ANFIS view tool, 
and output the score matching, as shown in “Figure 3” we can 
compute the percentage similarity of the alignment by using 
this code in Matlab [Score, Alignment] = nwalign(‘S1’,’S2’); 
showalignment(Alignment); in order to display a pairwise 
sequence alignment, as shown in “Figure 4.” This use the 

Table 1: The various ANFIS testing results

Sequences Global alignment Identities (%) Score

AGGTTGC AGGTTGC 7-May 0.149

AGGTC AGGT--C −71%

GTAGGCTTAAGGTTA GTAGGCTTAAGGTTA 15-May 0

TAGATC A- - - T-C- TAG - - - - −33%

AGTCCA A – GTCCA 7-May 0.149

ATGTCC ATGTCC- −71%

CGGGA CGGGA- 6-Feb 0

ATTGAC ATTGAC 33%

CTATCCG CTATC CG 7-May 0.425

CTAGTCG CTAGTCG −71%

ANFIS: Adaptive neuro-fuzzy inference system

Table 2: The score matching for global alignment of ANFIS tool

Sequences Local 
alignment

Identities (%) Score

AGGTTGC AGGT 100 1

AGGTC AGGT

GTAGGCTTAAGGTTA TAG 100 1

TAGATC TAG

AGTCCA GTCC 100 1

ATGTCC GTCC

CGGGA GA 100 1

ATTGAC GA

CTATCCG CTA 100 1

CTAGTCG CTA

ANFIS: Adaptive neuro-fuzzy inference system


Hameed and Hamed: Measuring the score matching of DNA using NF

40 http://journals.cihanuniversity.edu.iq/index.php/cuesj CUESJ 2019, 3 (2): 37-41

number of times similar in the sequence alignment divided on 
the entire length of the alignment sequence.

In Table 3, we aligned the sequence using the local 
algorithm, in this method the algorithm take the most 
similar part of the pair sequence, not must in the order 

and not need to input the gap, the resulting score is the 
perfect there is no mismatch and no gaps, and have 100% 
identical.

CONCLUSION

The proposed technique, here, is used to obtain the similarity 
measure of the pairwise DNA sequence alignment; the pattern 
matching is an essential task of example the disclosure process 
in this day and age for finding the basic and utilitarian conduct 
in the DNA sequencing. In spite of the fact that, the example of 
matching is commonly utilized as a part of computer science 
and information processing. In this paper the proposed 
algorithms are used that are Global and Local alignment to 
measure the score matching, which are utilized; it as a method 
to be align the two DNA sequences. The Neuro-Fuzzy model 
is used to evaluate the score similarity by the ANFIS tool in 

Table 3: The score matching for local alignment of ANFIS tool

MF type 
input

The number of 
MF for the inputs

The MF 
output

Train FIS 
method

The number 
of epochs

The average 
training error

The average 
testing error

Triangular MF 3 3 3 Constant Backpropagation 50 0.04657 0.06078

Triangular MF 5 5 5 Constant Backpropagation 250 0.01109 0.04306

Trapezoidal MF 3 3 3 Constant Backpropagation 100 0.01898 0.0453

Trapezoidal MF 3 3 3 Constant Backpropagation 500 0.01657 0.01657

Gaussian MF 5 5 5 Constant Backpropagation 350 0.01436 0.04436

Gaussian2 MF 5 5 5 Constant Backpropagation 500 0.02804 0.05415

MF: Membership functions, ANFIS: Adaptive neuro-fuzzy inference system, FIS: Fuzzy inference system

Figure 3: View rules adaptive neuro-fuzzy inference system

Figure 4: The identical alignment


Hameed and Hamed: Measuring the score matching of DNA using NF

41 http://journals.cihanuniversity.edu.iq/index.php/cuesj CUESJ 2019, 3 (2): 37-41

Matlab; we implement the method in several processing 
systems and depend on the most suitable system with the 
lowest average testing error; we obtain the score matching 
result for several patterns of DNA sequencing; this model 
presented the matching implementation in fast and efficient.

REFERENCES
1. R. Bhukya and D. V. L. Somayajulu. “Exact multiple pattern 

matching algorithm using DNA sequence and pattern pair”. 
International Journal of Computer Applications, vol. 17, no. 8, 
pp. 32-38, 2011.

2. X. Xie, J. Guan and S. Zhou. “Similarity evaluation of DNA 
sequences based on frequent patterns and entropy”. BMC 
Genomics, vol. 16, no. 3, p. S5, 2015.

3. W. Deng and Y. Luan. “Analysis of similarity/dissimilarity of DNA 
sequences based on chaos game representation”. Abstract and 
Applied Analysis, vol. 2013, p. 926519, 2013.

4. P. Pandiselvam, T. Marimuthu and R. Lawrance. “A Comparative 
Study on String Matching Algorithms of Biological Sequences”. In: 
International Conference on Intelligent Computing, pp. 1-5, 2014.

5. T. Chakrabarti, S. Saha and D. Sinha. “DNA multiple sequence 
alignment by a hidden markov model and fuzzy levenshtein 
distance based genetic algorithm”. International Journal of 
Computer Applications, vol. 73, no. 16, pp. 26-30, 2013.

6. N. Gill and S. Singh. “Biological sequence matching using fuzzy 
logic”. International Journal of Scientific and Engineering Research, 

vol. 2, no. 7, pp. 1-5, 2011.
7. S. B. Needleman and C. D. Wunsch. “A general method applicable 

to the search for similarities in the amino acid sequence of two 
proteins”. Journal of Molecular Biology, vol. 48, no. 3, pp. 443-
453, 1970.

8. T. F. Smith and M. S. Waterman. “Identification of common 
molecular subsequences”. Journal of Molecular Biology, vol. 147, 
no. 1, pp. 195-197, 1981.

9. D. Nauck, F. Klawonn and R. Kruse. “Foundations of Neuro-Fuzzy 
Systems”. John Wiley and Sons, Inc., New York, 1997.

10. S. Nasser, G. L. Vert, M. Nicolescu and A. Murray. “Multiple 
Sequence Alignment Using Fuzzy Logic. In: 2007 IEEE 
Symposium on Computational Intelligence and Bioinformatics 
and Computational Biology, IEEE, pp. 304-311, 2007.

11. K. Kim, M. Kim and Y. Woo. “A DNA sequence alignment algorithm 
using quality information and a fuzzy inference method”. Progress 
in Natural Science, vol. 18, no. 5, pp. 595-602, 2008.

12. N. Chai, L. R. Swem, M. Reichelt, H. Chen-Harris, E. Luis, S. Park 
and J. McBride. “Two escape mechanisms of influenza a virus to 
a broadly neutralizing stalk-binding antibody”. PLoS Pathogens, 
vol. 12, no. 6, p. e1005702, 2016.

13. S. A. Hameed and R. I. Hamed. “Analysing the score matching of 
dna sequencing using an expert system of neurofuzzy”. Journal 
of Theoretical and Applied Information Technology, vol. 95, no. 6, 
pp. 1255-1262, 2017.

14. DNA Matching Data Base-NCBI”. https://www.ncbi.nlm.nih. 
gov/nucleotide.