D:\sbornik\...\RGevorgyan3.DVI


Mathematical Problems of Computer Science 23, 2004, 134{143.

A Dynamic P r ogr amming Appr oach for Computing

Similar ity of the P r otein Sequences B ased on

Continuous Functions Compar ison

R o b e r t K . Ge vo r g ya n

Yerevan State University, Dep. of Applied Mathematics and Informatics.

e-mail robertg@ysu.am

Abstract

This paper introduces a dynamic programming approach for computing "contin-
uous" similarity of the two protein sequences. The discrete dynamic programming
method considers items of each comparable sequence independently; meantime there
is a strong interrelation between them. To overcome this disadvantage a "continuous"
sequence comparison method is developed. Particularly, a certain continuous func-
tion is correlated to each comparable protein sequence, and then the comparison is
made between those functions. Through compressions and expansions the comparable
functions are brought to the most similar representation in the meaning of a certain
similarity function. By this approach the sequence comparison problem is reduced to
a functional maximization problem, which is numerically solved using dynamic pro-
gramming method. Finally some practical results are presented with the application
of described method.

Refer ences

[1 ] A le xe e v V .M., Tikh o m ir o v V .M. a n d Fo m in S .V ., Optimal Control., N a u ka , 1 9 7 9 .

[2 ] A ls t c h u l S .F., Glis h W ., Mille r W ., Mye r s E .W . a n d L ip m a n D .J. B asic local alignment
search tool. J. Mo l. B io l. 2 1 5 , 4 0 3 -4 1 0 , 1 9 9 0 .

[3 ] B a ir o c h A . a n d A p we ile r R . The SW ISS-P R OT protein sequence data bank and its
supplement TrE M B L in 1999. N u c le ic A c id R e s ., 2 7 , 4 9 -5 4 , 1 9 9 9 .

[4 ] B a ld i P . a n d B r u n a k S . B ioinformatics, The M achine L earning Apprach. MIT P r e s s ,
2 0 0 1 .

[5 ] B a t e m a n A ., B ir n e y E ., Ce r r u t i L ., D u r b in R ., E t wille r L ., E d d y S .R ., Gr i± t h s -Jo n e s S .,
H o we K .L ., Ma r s h a ll M. a n d S o n n h a m m e r L .L .E . The P fam protein families database.
N u c le ic A c id s R e s e a r c h , vo l. 3 0 , n o . 1 , 2 7 6 -2 8 0 , 2 0 0 2 .

[6 ] B e llm a n R . D ynamic P rogramming. P r in s t o n U n iv. P r e s s , 1 9 5 7 .

1 3 4


R. K. Gevorgyan 1 3 5

[7 ] D u r b in R . E d d y S .R ., K r o g h A ., Mit c h is o n G. B iological Sequence analysis. Ca m b r id g e
U n ive r s it y P r e s s , 1 9 9 8 .

[8 ] E d d y S .R . P ro¯le hidden M arkov models. B io in fo r m a t ic s , vo l. 1 4 , n o . 9 , 7 5 5 -7 6 3 , 1 9 9 8 .

[9 ] Ge vo r g ya n R .K ., A Continuous M ethod for E valuating D issimilarity of the P rotein Se-
quences. P r o c e e d in g s o f t h e IS A A C Co n fe r e n c e o n A n a lys is , Y e r e va n , A r m e n ia , 2 9 -4 0 ,
2 0 0 4 .

[1 0 ] Gu s ¯ e ld D . Algorithms on Strings, Trees, and Sequences. Ca m b r id g e U n ive r s it y P r e s s ,
1 9 9 7 .

[1 1 ] H e ym a n n S ., Ga b r ie lya n O. R ., Gh a z a r ya n H . a n d o t h e r s . Towards a M etrical Space
of B iological Sequences. P r o c e e d in g s o f t h e IS A A C Co n fe r e n c e o n A n a lys is , Y e r e va n ,
A r m e n ia , 1 -1 8 , 2 0 0 4 .

[1 2 ] H o r s t R ., P a r d a lo s P .M. a n d Th o a i N .V . Introduction to global optimization. K lu we r
A c a d e m ic P u b lis h e r s , 1 9 9 5 .

[1 3 ] K a n t o r o vic h L .V . a n d R u b in s t e in G.S . On a function space and certain extremum prob-
lem. D o kl.A ka d . N a u k S S S R , N 5 , 1 1 5 , 1 0 5 8 -1 0 6 1 , 1 9 5 7 .

[1 4 ] L e ve n s t e in V .I. B inary codes capable of correcting insertions and reversals. S o v. P h ys .
D o kl., 1 0 :7 0 7 -7 1 0 , 1 9 6 6 .

[1 5 ] Mikh a le vic h V .S . Sequential optimization algorithms and their applications., K ib e r -
n e t ika , N 1 , 2 , 1 9 6 5 .

[1 6 ] Mo is e e v N . N . Calculus of approximations in the theory of optimal tasks. N a u ka ,
Mo s c o w, 1 9 7 1 .

[1 7 ] N e e d e lm a n S .B . a n d W u n s c h C.D . A general method applicable to the search for simi-
larities in the amino acid sequences of two proteins. J. Mo l. B io l., 4 8 , 4 4 3 -4 5 3 , 1 9 7 0 .

[1 8 ] P e a r s o n W .R . a n d L ip m a n D .J. Improved tools for biological sequence comparison. P r o c .
N a t . A c a d S c i. U S A , 8 5 , 2 4 4 4 -2 4 4 8 , 1 9 8 8 .

[1 9 ] R a b in e r R . a n d Ju a n g B .-H . F undamentals of speech recognition. P r e n t ic e H a ll P TR
E n g le wo o d Cli®s , N e w Je r s e y 0 7 6 3 2 , 1 9 9 3 .

[2 0 ] S a n ko ® D . a n d K r u s ka l J.B . Time W arps, String E dits and M acromolecules. CS L I
P u b lic a t io n s , 1 9 9 7 , IS B N 1 -5 7 5 8 6 -2 1 7 -4 .

[2 1 ] S e t u b a l J.C. a n d Me id a n is J. Introduction to computational molecular biology. P W S
P u b lis h in g c o m p a n y, 1 9 9 7 .

[2 2 ] 2 1 . S m it h T.F. a n d W a t e r m a n M.S . Identi¯cation of common molecular subsequences.
Jo u r n a l o f Mo le c u la r B io lo g y, 1 4 7 : 1 9 5 -1 9 7 , 1 9 8 1 .

[2 3 ] 2 2 . W a s s e r s t e in L .N . M arkov processes over denumerable products of spaces describing
large systems of automata. P r o b le m s o f In fo r m a t io n Tr a n s is s io n 5 , 4 7 -5 2 , 1 9 6 9 .

[2 4 ] 2 3 . W a t e r m a n M. Introduction to computational biology. Ch a p m a n a n d H a ll, 1 9 9 5 .


1 3 6 A Dyn. Progr. Approach for Comp. Similarity of the Protein Seq. Based on Contin. Func. Comparison

êåÇï³Ïáõó³ÛÇÝ Ñ³çáñ¹³Ï³ÝáõÃÛáõÝÝ»ñÇª ³ÝÁÝ¹Ñ³ï ýáõÝÏóÇ³Ý»ñÇ íñ³
ÑÇÙÝí³Í ÝÙ³ÝáõÃÛ³Ý Ñ³ßí³ñÏáõÙÁ ¹ÇÝ³ÙÇÏ Íñ³·ñ³íáñÙ³Ý Ù»Ãá¹áí:

è. Î. ¶¨áñ·Û³Ý

²Ù÷á÷áõÙ

²Ûë Ñá¹í³ÍáõÙ Ý»ñÏ³Û³óí³Í ¿ Ï»Ýë³µ³Ý³Ï³Ý Ñ³çáñ¹³Ï³ÝáõÃÛáõÝÝ»ñÇ Ñ³Ù»-
Ù³ïÙ³Ý ÙÇ Ù»Ãá¹: Ð³ÛïÝÇ ¹ÇëÏñ»ï ¹ÇÝ³ÙÇÏ Íñ³·ñ³íáñÙ³Ý ÙáÃá¹Á Ûáõñ³ù³ÝãÛáõñ
Ñ³Ù»ÙïíáÕ Ñ³çáñ¹³Ï³ÝáõÃÛáõÝÝ»ñÇ ³Ý¹³ÙÝ»ñÁ ¹Çï³ñÏáõÙ ¿ Çñ³ñÇó ³ÝÏ³Ë,
³ÛÝÇÝã ¹ñ³Ýó ÙÇç¨ Ï³Ý áñáß³ÏÇ Ï³å»ñ: ²Û¹ Ã»ñáõÃÛáõÝÁ Ñ³ÕÃ³Ñ³ñ»Éáõ Ñ³Ù³ñ
¹Çï³ñÏíáõÙ ¿ Ñ³çáñ¹³Ï³ÝáõÃÛáõÝÝ»ñÇ Ñ³Ù»Ù³ïÙ³Ý ÙÇ þ³ÝÁÝ¹Ñ³ïþ Ù»Ãá¹: ²ÛÝ
¿ª Ûáõñ³ù³ÝãÛáõñ Ñ³Ù»Ù³ïíáÕ Ñ³çáñ¹³Ï³ÝáõÃÛ³ÝÁ Ñ³Ù³å³ï³ëË³ÝáõÃÛ³Ý Ù»ç
¿ ¹ñíáõÙ ÙÇ ³ÝÁÝ¹Ñ³ï ýáõÝÏóÇ³ ¨ ³å³ Ñ³Ù»Ù³ïáõÃÛáõÝÁ Ï³ï³ñíáõÙ ¿ ³Û¹
ýáõÝÏóÇ³Ý»ñÇ ÙÇç¨: ê»ÕÙáõÙÝ»ñÇ ¨ Ó·áõÙÝ»ñÇ ÙÇçáóáí Ñ³Ù»Ù³ïíáÕ ýáõÏóÇ³Ý»ñÁ
µ»ñíáõÙ »Ý ³Ù»Ý³ÝÙ³Ý ï»ëùÇª ïñí³Í ÝÙ³ÝáõÃÛ³Ý ýáõÝÏóÇ³ÛÇ ÇÙ³ëïáí: ²Ûë
Ùáï»óÙ³Ý ÙÇçáóáí ËÝ¹ÇñÁ µ»ñíáõÙ ¿ ýáõÝÏóÇáÝ³ÉÇ ûåïÇÙÇ½³óÇ³ÛÇ ËÝ¹ñÇ, áñÁ
Ãí³å»ë ÉáõÍ»Éáõ Ñ³Ù³ñ Ý»ñÏ³Û³óí³Í ¿ ÙÇ ¹ÇÝ³ÙÇÏ Íñ³·ñ³íáñÙ³Ý Ù»Ãá¹:
²ßË³ï³ÝùáõÙ µ»ñí³Í »Ý Ý³¨ áñáß åñ³ÏïÇÏ ³ñ¹ÛáõÝùÝ»ñª Ý»ñÏ³Û³óí³Í Ù»Ãá¹Ç
ÏÇñ³éÙ³Ùµ: