D:\sbornik\...\RGevorgyan3.DVI Mathematical Problems of Computer Science 23, 2004, 134{143. A Dynamic P r ogr amming Appr oach for Computing Similar ity of the P r otein Sequences B ased on Continuous Functions Compar ison R o b e r t K . Ge vo r g ya n Yerevan State University, Dep. of Applied Mathematics and Informatics. e-mail robertg@ysu.am Abstract This paper introduces a dynamic programming approach for computing "contin- uous" similarity of the two protein sequences. The discrete dynamic programming method considers items of each comparable sequence independently; meantime there is a strong interrelation between them. To overcome this disadvantage a "continuous" sequence comparison method is developed. Particularly, a certain continuous func- tion is correlated to each comparable protein sequence, and then the comparison is made between those functions. Through compressions and expansions the comparable functions are brought to the most similar representation in the meaning of a certain similarity function. By this approach the sequence comparison problem is reduced to a functional maximization problem, which is numerically solved using dynamic pro- gramming method. Finally some practical results are presented with the application of described method. Refer ences [1 ] A le xe e v V .M., Tikh o m ir o v V .M. a n d Fo m in S .V ., Optimal Control., N a u ka , 1 9 7 9 . [2 ] A ls t c h u l S .F., Glis h W ., Mille r W ., Mye r s E .W . a n d L ip m a n D .J. B asic local alignment search tool. J. Mo l. B io l. 2 1 5 , 4 0 3 -4 1 0 , 1 9 9 0 . [3 ] B a ir o c h A . a n d A p we ile r R . The SW ISS-P R OT protein sequence data bank and its supplement TrE M B L in 1999. N u c le ic A c id R e s ., 2 7 , 4 9 -5 4 , 1 9 9 9 . [4 ] B a ld i P . a n d B r u n a k S . B ioinformatics, The M achine L earning Apprach. MIT P r e s s , 2 0 0 1 . [5 ] B a t e m a n A ., B ir n e y E ., Ce r r u t i L ., D u r b in R ., E t wille r L ., E d d y S .R ., Gr i± t h s -Jo n e s S ., H o we K .L ., Ma r s h a ll M. a n d S o n n h a m m e r L .L .E . The P fam protein families database. N u c le ic A c id s R e s e a r c h , vo l. 3 0 , n o . 1 , 2 7 6 -2 8 0 , 2 0 0 2 . [6 ] B e llm a n R . D ynamic P rogramming. P r in s t o n U n iv. P r e s s , 1 9 5 7 . 1 3 4 R. K. Gevorgyan 1 3 5 [7 ] D u r b in R . E d d y S .R ., K r o g h A ., Mit c h is o n G. B iological Sequence analysis. Ca m b r id g e U n ive r s it y P r e s s , 1 9 9 8 . [8 ] E d d y S .R . P ro¯le hidden M arkov models. B io in fo r m a t ic s , vo l. 1 4 , n o . 9 , 7 5 5 -7 6 3 , 1 9 9 8 . [9 ] Ge vo r g ya n R .K ., A Continuous M ethod for E valuating D issimilarity of the P rotein Se- quences. P r o c e e d in g s o f t h e IS A A C Co n fe r e n c e o n A n a lys is , Y e r e va n , A r m e n ia , 2 9 -4 0 , 2 0 0 4 . [1 0 ] Gu s ¯ e ld D . Algorithms on Strings, Trees, and Sequences. Ca m b r id g e U n ive r s it y P r e s s , 1 9 9 7 . [1 1 ] H e ym a n n S ., Ga b r ie lya n O. R ., Gh a z a r ya n H . a n d o t h e r s . Towards a M etrical Space of B iological Sequences. P r o c e e d in g s o f t h e IS A A C Co n fe r e n c e o n A n a lys is , Y e r e va n , A r m e n ia , 1 -1 8 , 2 0 0 4 . [1 2 ] H o r s t R ., P a r d a lo s P .M. a n d Th o a i N .V . Introduction to global optimization. K lu we r A c a d e m ic P u b lis h e r s , 1 9 9 5 . [1 3 ] K a n t o r o vic h L .V . a n d R u b in s t e in G.S . On a function space and certain extremum prob- lem. D o kl.A ka d . N a u k S S S R , N 5 , 1 1 5 , 1 0 5 8 -1 0 6 1 , 1 9 5 7 . [1 4 ] L e ve n s t e in V .I. B inary codes capable of correcting insertions and reversals. S o v. P h ys . D o kl., 1 0 :7 0 7 -7 1 0 , 1 9 6 6 . [1 5 ] Mikh a le vic h V .S . Sequential optimization algorithms and their applications., K ib e r - n e t ika , N 1 , 2 , 1 9 6 5 . [1 6 ] Mo is e e v N . N . Calculus of approximations in the theory of optimal tasks. N a u ka , Mo s c o w, 1 9 7 1 . [1 7 ] N e e d e lm a n S .B . a n d W u n s c h C.D . A general method applicable to the search for simi- larities in the amino acid sequences of two proteins. J. Mo l. B io l., 4 8 , 4 4 3 -4 5 3 , 1 9 7 0 . [1 8 ] P e a r s o n W .R . a n d L ip m a n D .J. Improved tools for biological sequence comparison. P r o c . N a t . A c a d S c i. U S A , 8 5 , 2 4 4 4 -2 4 4 8 , 1 9 8 8 . [1 9 ] R a b in e r R . a n d Ju a n g B .-H . F undamentals of speech recognition. P r e n t ic e H a ll P TR E n g le wo o d Cli®s , N e w Je r s e y 0 7 6 3 2 , 1 9 9 3 . [2 0 ] S a n ko ® D . a n d K r u s ka l J.B . Time W arps, String E dits and M acromolecules. CS L I P u b lic a t io n s , 1 9 9 7 , IS B N 1 -5 7 5 8 6 -2 1 7 -4 . [2 1 ] S e t u b a l J.C. a n d Me id a n is J. Introduction to computational molecular biology. P W S P u b lis h in g c o m p a n y, 1 9 9 7 . [2 2 ] 2 1 . S m it h T.F. a n d W a t e r m a n M.S . Identi¯cation of common molecular subsequences. Jo u r n a l o f Mo le c u la r B io lo g y, 1 4 7 : 1 9 5 -1 9 7 , 1 9 8 1 . [2 3 ] 2 2 . W a s s e r s t e in L .N . M arkov processes over denumerable products of spaces describing large systems of automata. P r o b le m s o f In fo r m a t io n Tr a n s is s io n 5 , 4 7 -5 2 , 1 9 6 9 . [2 4 ] 2 3 . W a t e r m a n M. Introduction to computational biology. Ch a p m a n a n d H a ll, 1 9 9 5 . 1 3 6 A Dyn. Progr. Approach for Comp. Similarity of the Protein Seq. Based on Contin. Func. Comparison êåÇï³Ïáõó³ÛÇÝ Ñ³çáñ¹³Ï³ÝáõÃÛáõÝÝ»ñǪ ³ÝÁݹѳï ýáõÝÏódzݻñÇ íñ³ ÑÇÙÝí³Í ÝÙ³ÝáõÃÛ³Ý Ñ³ßí³ñÏáõÙÁ ¹ÇݳÙÇÏ Íñ³·ñ³íáñÙ³Ý Ù»Ãá¹áí: è. Î. ¶¨áñ·Û³Ý ²Ù÷á÷áõÙ ²Ûë Ñá¹í³ÍáõÙ Ý»ñϳ۳óí³Í ¿ Ï»Ýë³µ³Ý³Ï³Ý ѳçáñ¹³Ï³ÝáõÃÛáõÝÝ»ñÇ Ñ³Ù»- Ù³ïÙ³Ý ÙÇ Ù»Ãá¹: гÛïÝÇ ¹ÇëÏñ»ï ¹ÇݳÙÇÏ Íñ³·ñ³íáñÙ³Ý ÙáÃá¹Á Ûáõñ³ù³ÝãÛáõñ ѳٻÙïíáÕ Ñ³çáñ¹³Ï³ÝáõÃÛáõÝÝ»ñÇ ³Ý¹³ÙÝ»ñÁ ¹Çï³ñÏáõÙ ¿ Çñ³ñÇó ³ÝϳË, ³ÛÝÇÝã ¹ñ³Ýó ÙÇç¨ Ï³Ý áñáß³ÏÇ Ï³å»ñ: ²Û¹ ûñáõÃÛáõÝÁ ѳÕóѳñ»Éáõ ѳٳñ ¹Çï³ñÏíáõÙ ¿ ѳçáñ¹³Ï³ÝáõÃÛáõÝÝ»ñÇ Ñ³Ù»Ù³ïÙ³Ý ÙÇ þ³ÝÁݹѳïþ Ù»Ãá¹: ²ÛÝ ¿ª Ûáõñ³ù³ÝãÛáõñ ѳٻٳïíáÕ Ñ³çáñ¹³Ï³ÝáõÃÛ³ÝÁ ѳٳå³ï³ë˳ÝáõÃÛ³Ý Ù»ç ¿ ¹ñíáõÙ ÙÇ ³ÝÁݹѳï ýáõÝÏódz ¨ ³å³ ѳٻٳïáõÃÛáõÝÁ ϳï³ñíáõÙ ¿ ³Û¹ ýáõÝÏódzݻñÇ ÙÇç¨: ê»ÕÙáõÙÝ»ñÇ ¨ Ó·áõÙÝ»ñÇ ÙÇçáóáí ѳٻٳïíáÕ ýáõÏódzݻñÁ µ»ñíáõÙ »Ý ³Ù»Ý³ÝÙ³Ý ï»ëùǪ ïñí³Í ÝÙ³ÝáõÃÛ³Ý ýáõÝÏódzÛÇ ÇÙ³ëïáí: ²Ûë Ùáï»óÙ³Ý ÙÇçáóáí ËݹÇñÁ µ»ñíáõÙ ¿ ýáõÝÏóÇáݳÉÇ ûåïÇÙǽ³ódzÛÇ ËݹñÇ, áñÁ Ãí³å»ë ÉáõÍ»Éáõ ѳٳñ Ý»ñϳ۳óí³Í ¿ ÙÇ ¹ÇݳÙÇÏ Íñ³·ñ³íáñÙ³Ý Ù»Ãá¹: ²ß˳ï³ÝùáõÙ µ»ñí³Í »Ý ݳ¨ áñáß åñ³ÏïÇÏ ³ñ¹ÛáõÝùÝ»ñª Ý»ñϳ۳óí³Í Ù»Ãá¹Ç ÏÇñ³éٳٵ: