IBN AL-HAITHAM J. FOR PURE & APPL. SCI. VOL. 23 (2) 2010

On Comparison Between Radial Basis Function and Wavelet Basis Function Neural Networks

L.N.M. Tawfiq, T.A.M. Rashid
Department of Mathematics, College of Education Ibn Al-Haitham, University of Baghdad

Abstract
In this paper we study and design two feed forward neural networks. The first approach uses a radial basis function network and the second approach uses a wavelet basis function network to approximate the mapping from the input to the output space. The trained networks are then used in a conjugate gradient algorithm to estimate the output. These neural networks are then applied to solve differential equations. Results of applying these algorithms to several examples are presented.

1. Introduction
Neural networks are connectionist models proposed in an attempt to mimic the function of the human brain. A neural network (ANN) consists of a large number of simple processing elements called neurons (or nodes) [1], [2]. Neurons implement simple functions and are massively interconnected by means of weighted interconnections. These weights, determined by means of a training process, determine the functionality of the neural network. The training process uses a training database to determine the network parameters (weights). The functionality of the neural network is also determined by its topology. Most networks have a large number of neurons, with the neurons arranged in layers. In addition to input and output layers, there are usually layers of neurons that are not directly connected to either the input or the output, called hidden layers. The corresponding nodes are referred to as hidden nodes. Hidden layers give the network the ability to approximate complex, nonlinear functions.
The advantages of using neural networks are numerous: neural networks are learning machines that can learn any arbitrary functional mapping between input and output, they are fast machines, and they can be implemented in parallel, either in software or in hardware [3]. In fact, the computational complexity of ANNs is polynomial in the number of neurons used in the network. Parallelism also brings with it the advantages of robustness and fault tolerance. Efficient learning algorithms ensure that the network can learn mappings to any arbitrary precision in a short amount of time. Furthermore, the input-output mapping is explicitly known in a neural network, and gradient descent procedures can be used advantageously to perform the inversion process.

2. Radial Basis Function Neural Networks
Radial basis function neural networks (RBFNN) are a class of networks that are widely used for solving multivariate function approximation problems [4], [5]. An RBFNN consists of an input and output layer of nodes and a single hidden layer. Each node in the hidden layer implements a basis function, and the number of hidden nodes is equal to the number of points in the training database. The RBFNN approximates the unknown function that maps the input to the output in terms of a basis function expansion, with the functions G(x, xi) as the basis functions. The input-output relation for the RBFNN is given by:

F(x) = Σ_{j=1}^{H} W_j exp( −‖x − c_j‖² / 2σ_j² )

where H is the number of basis functions used, y = (y1, y2, …, yM)ᵀ is the output of the RBFNN, x is the test input, c_j is the center of the j-th basis function, and W_j are the expansion coefficients or weights associated with each basis function. Each training data sample is selected as the center of a basis function. Basis functions G(x, xi) that are radially symmetric are called radial basis functions. Commonly used radial basis functions include the Gaussian and inverse multiquadrics.
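To make the expansion concrete, the following is a minimal sketch (not the paper's experiment) of a Gaussian RBFNN in Python. The toy target sin(x), the spread σ = 0.5, and the number of samples are all assumptions chosen only for illustration; the weights are obtained, as described above, by solving the interpolation system in which every training sample also serves as a center.

```python
import numpy as np

def gaussian_rbf(x, centers, sigma):
    # Matrix of Gaussian basis-function values exp(-||x - c||^2 / 2 sigma^2).
    d2 = (x[:, None] - centers[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Training data: every sample is also a basis-function center, as in the text.
x_train = np.linspace(0.0, 2.0 * np.pi, 25)
y_train = np.sin(x_train)                      # toy target function (assumed)
sigma = 0.5                                    # assumed spread

G = gaussian_rbf(x_train, x_train, sigma)      # H x H interpolation matrix
W = np.linalg.solve(G, y_train)                # expansion coefficients W_j

# Evaluate F(x) = sum_j W_j exp(-||x - c_j||^2 / 2 sigma^2) on new inputs.
x_test = np.linspace(0.0, 2.0 * np.pi, 200)
y_pred = gaussian_rbf(x_test, x_train, sigma) @ W
print("max abs error:", np.abs(y_pred - np.sin(x_test)).max())
```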
3. Wavelet Basis Function Neural Networks (WBFNN)
The wavelet transform is a time-frequency transform that provides both frequency and time localization in the form of a multiresolution decomposition of the signal [6]. Consider a square-integrable function F(x) and let Vm be the vector space containing all possible projections of F at the resolution m, where 2^m is the sampling interval at that resolution [7]. Obviously, as m increases, the number of samples at that resolution decreases and the approximation gets coarser. Now consider all approximations of F at all resolutions. The associated vector spaces are nested as follows:

… ⊂ V2 ⊂ V1 ⊂ V0 ⊂ V−1 ⊂ V−2 ⊂ …

due to the fact that the finer resolutions contain all the information required to compute the coarser approximations of the function F. It is also obvious that as the resolution decreases, the approximation gets coarser and contains less and less information. In the limit, it converges to zero:

lim_{m→∞} Vm = ∩_{m∈Z} Vm = {0}.

On the other hand, as the resolution increases, the approximation has more information and eventually converges to the original signal:

lim_{m→−∞} Vm = ∪_{m∈Z} Vm is dense in L²(R).

If ф(x) denotes the scaling function, then Vm = linear span {ф_{m,k}, k ∈ Z}, where

ф_{m,k}(x) = 2^{−m/2} ф(2^{−m} x − k) , (m, k) ∈ Z²

is the translated and dilated version of ф(x). Since the family of functions {ф_{m,k}(x) | (m, k) ∈ Z²} forms an orthonormal basis for Vm, F can be written as:

F_m(x) = Σ_k s_{m,k} ф_{m,k}(x) , where s_{m,k} = ∫_{−∞}^{∞} F(x) ф_{m,k}(x) dx

is the projection of F onto the orthonormal basis functions ф_{m,k}(x). Further, suppose Wm is the orthogonal complement of Vm in Vm−1. Then

V_{m−1} = V_m ⊕ W_m , with V_m ⊥ W_m ……. (1)

The (m−1)-th approximation can be written as the sum of the projections of F onto Vm and Wm. Equivalently, the difference in information (called the detail) between the m-th and (m−1)-th approximations is given by the projection of F onto Wm. Mallat [8] shows that there exists a unique function, called the wavelet function, whose translates and dilates form an orthonormal basis for the space Wm. In other words, the detail of F at the m-th resolution is given by

D_m F(x) = Σ_k d_{m,k} ψ_{m,k}(x)

where ψ(x) is the wavelet, ψ_{m,k}(x) = 2^{−m/2} ψ(2^{−m} x − k), (m, k) ∈ Z², are the translates and dilates of ψ(x), and d_{m,k} = ∫_{−∞}^{∞} F(x) ψ_{m,k}(x) dx are the projections of F onto Wm. Further, from (1) we get

F_{m−1}(x) = F_m(x) + Σ_k d_{m,k} ψ_{m,k}(x).

Since the V-spaces form a nested set of subspaces, F can be written as:

F(x) = Σ_k s_{k,−∞} ф_{k,−∞}(x) + Σ_l Σ_k d_{l,k} ψ_{l,k}(x) ……. (2)

where l indexes over the different resolutions. In practice, the limits of summation are chosen to be finite.
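To make the decomposition in (1) and (2) concrete, the following is a small numerical sketch assuming the orthonormal Haar scaling function and wavelet, chosen here purely for simplicity (the paper does not fix a wavelet family at this point). It performs one approximation/detail split per resolution level and verifies that the coarse approximation plus the details reconstruct the original samples.

```python
import numpy as np

def haar_analysis(s):
    # One MRA step: split samples of F_{m-1} into approximation (V_m) and
    # detail (W_m) coefficients, cf. equation (1).
    a = (s[0::2] + s[1::2]) / np.sqrt(2.0)   # s_{m,k} coefficients
    d = (s[0::2] - s[1::2]) / np.sqrt(2.0)   # d_{m,k} coefficients
    return a, d

def haar_synthesis(a, d):
    # Inverse of one step: F_{m-1} = F_m + detail at resolution m.
    s = np.empty(2 * a.size)
    s[0::2] = (a + d) / np.sqrt(2.0)
    s[1::2] = (a - d) / np.sqrt(2.0)
    return s

# Samples of F at the finest resolution (length is a power of two here).
x = np.linspace(0.0, 1.0, 64, endpoint=False)
f = np.sin(2 * np.pi * x) + 0.2 * np.sin(16 * np.pi * x)   # toy signal (assumed)

# Decompose over three resolution levels: coarse coefficients plus details.
details, approx = [], f.copy()
for _ in range(3):
    approx, d = haar_analysis(approx)
    details.append(d)

# Reconstruct from the coarsest level upward to check F = coarse + details.
rec = approx
for d in reversed(details):
    rec = haar_synthesis(rec, d)
print("max reconstruction error:", np.abs(rec - f).max())
```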
The architecture of the network consists of an input and an output layer with a single hidden layer of nodes [9]. The hidden layer nodes are grouped by resolution level: we have as many groups as resolution levels, each group containing the basis functions used at that resolution. The input-output relation is given by:

y_l = Σ_{j=1}^{H1} w_{lj} ф_j(x, c_j) + Σ_{n=1}^{L} Σ_{k=1}^{Kn} w_{lkn} ψ_{nk}(x, c_{nk}) ……. (3)

where L is the total number of resolutions, H1 is the number of scaling functions used at the coarsest resolution, Kn is the number of wavelet functions used at resolution n, c_j and c_{nk} are the centers of the corresponding basis functions, and w_{lj} is the weight of the interconnection connecting the j-th hidden node to the l-th output node. The weights are determined in a similar manner to the weights in the RBFNN described earlier.
The primary advantage of using wavelet basis functions is orthonormality. Orthonormality of wavelets ensures that the number of basis functions required to approximate the function F is minimal. The second advantage is that wavelets are local basis functions (the localization property of wavelets). The multiresolution approximation (MRA) using wavelets allows the distribution of basis functions based on the resolution required in different parts of the input space. In addition, the ability to add details at higher resolutions as more data become available allows the network to learn in an incremental fashion and allows the user to control the degree of accuracy of the approximation.
Equation (2), formulated for scalar inputs, can be extended to multidimensional inputs. The corresponding multidimensional scaling functions and wavelets are formed by tensor products of the one-dimensional scaling functions and wavelets. Consider the two-dimensional case with x = (x1, x2)ᵀ. Denoting the 1-D scaling function by ф(x) and the 1-D wavelet by ψ(x), one can show that the two-dimensional scaling function is given by:

ф(x1, x2) = ф(x1) ф(x2).

Similarly, the corresponding wavelet functions are given by:

ψ¹(x1, x2) = ф(x1) ψ(x2)
ψ²(x1, x2) = ψ(x1) ф(x2)
ψ³(x1, x2) = ψ(x1) ψ(x2)

For an accurate approximation, all four basis functions must be used at each hidden node. Kugarajah and Zhang [9] have shown that, under certain conditions, a radial basis scaling function ф(‖x − xi‖) and wavelet ψ(‖x − xi‖) constitute a frame, and that these functions can be used in place of the entire N-dimensional basis, resulting in savings in storage and execution time while minimally affecting the accuracy of the approximation.
The operation of the WBFNN is summarized in the following steps:
Step 1. Basis Function Selection: A significant issue in wavelet basis function neural networks is the selection of the basis functions. The wavelet family used in the WBFNN depends on the form of the function F that must be reconstructed. Even though this function is usually unknown, some important details may be obtained by inspecting the problem at hand. For instance, classification usually calls for a discontinuous or quantized function F, where all the input data are to be mapped onto one of a few classes. In such cases, discontinuous wavelets may be used. Continuous wavelets may be used to approximate smoother functions.
Step 2. Center Selection: The location and number of basis functions are important since they determine the architecture of the neural network. Centers at the first (or coarsest) resolution are selected by using the K-means algorithm. Each center at successive resolutions is computed as the mean of two centers at a lower resolution.
Step 3. Training: Training the network involves determining the expansion coefficients associated with each resolution level. These coefficients are determined by using a matrix inversion operation, similar to the operation performed in the RBFNN. The centers can also be dynamically varied during the training process until the error in the network prediction falls below a predetermined level. Over-fitting by the network can be avoided by pruning the centers one by one until the network performs at an acceptable level on a blind test database. In this study, however, no optimization is performed after center selection.
Step 4. Generalization: In this step, the trained WBFNN is used to predict the output for a new test signal using (3).
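The following is a minimal one-dimensional sketch of Steps 2 to 4. It assumes the Gaussian scaling function and the Mexican-hat-style wavelet quoted later in Section 4.2 (whose exact form is partly garbled in the source and is reconstructed here), a toy target function, and arbitrarily chosen values for σ, the number of centers, and the number of resolution levels; it illustrates the procedure rather than reproducing the paper's implementation.

```python
import numpy as np

def kmeans_1d(x, k, iters=50, seed=0):
    # Plain K-means clustering used for the coarsest-resolution centers (Step 2).
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return np.sort(centers)

def scaling(x, c, sigma):
    # Gaussian scaling function, cf. Section 4.2.
    return np.exp(-(x[:, None] - c[None, :]) ** 2 / (2.0 * sigma ** 2))

def wavelet(x, c, sigma, m):
    # Mexican-hat-style wavelet with dilation controlled by m (reconstructed reading).
    r2 = 2.0 ** m * (x[:, None] - c[None, :]) ** 2
    return (1.0 - r2 / (2.0 * sigma ** 2)) * np.exp(-r2 / (2.0 * sigma ** 2))

x_train = np.linspace(0.0, 1.0, 64)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * np.sin(8 * np.pi * x_train)  # toy target
sigma = 0.1                                                                # assumed spread

c0 = kmeans_1d(x_train, 10)        # coarsest-resolution centers (Step 2)
c1 = 0.5 * (c0[:-1] + c0[1:])      # finer-level centers = means of adjacent pairs

# Design matrix stacking one scaling-function group and one wavelet group, cf. (3);
# training then reduces to a linear least-squares solve (Step 3).
Phi = np.hstack([scaling(x_train, c0, sigma), wavelet(x_train, c1, sigma, m=0)])
w, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)

# Generalization (Step 4): evaluate the trained expansion on new inputs.
x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test) + 0.3 * np.sin(8 * np.pi * x_test)
Phi_test = np.hstack([scaling(x_test, c0, sigma), wavelet(x_test, c1, sigma, m=0)])
print("test RMSE:", np.sqrt(np.mean((Phi_test @ w - y_test) ** 2)))
```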
4. Applications
4.1. FFNN Results Using RBFNN
The RBFNN was tested using the first-order Bessel function J1(t), i.e. the solution of

t² y″ + t y′ + (t² − 1) y = 0 , t ∈ [0, 20].

4.2. FFNN Results Using WBFNN
Two resolution levels, with 10 centers at the coarsest resolution, were selected using the K-means clustering algorithm. No optimization was performed after center selection to reduce the number of basis functions used. The scaling function used was a Gaussian function:

ф(x, c) = exp( −‖x − c‖² / 2σ² )

where c and σ are the center and spread of the scaling function, respectively. The wavelet function was:

ψ(x, c) = ( 1 − 2^m ‖x − c‖² / 2σ² ) exp( −2^m ‖x − c‖² / 2σ² )

where c and σ are the center and spread of the wavelet function, respectively, and m is a parameter controlling the dilation of the wavelet, whose value depends on the resolution level. Figure 2 shows the performance of the WBFNN as a forward model. Comparing the results in Figures 1 and 2, we see that the WBFNN is a better forward model than the RBFNN; the corresponding error surfaces are illustrated in Figure 3.
4.3. Comparison of the Performance of the RBFNN and the WBFNN
The FFNN was tested using the van der Pol equation:

x″ + (x² − 1) x′ + x = 0 , x1(0) = 1 , x2(0) = 1

(where x1 = x and x2 = x′), using the RBFNN and the WBFNN with σ = 0.01; the corresponding results and errors are shown in Figures 4, 5 and 6.
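For completeness, the following sketch generates reference data for the two test problems above, assuming SciPy is available. The number of sample points and the integration window [0, 10] for the van der Pol equation are assumptions made only for illustration (the paper does not state them); either of the networks described earlier can then be fitted to these samples exactly as in the previous sketches.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import jv

# Section 4.1: Bessel's equation of order 1 on [0, 20]; its bounded solution
# is the first-order Bessel function J1(t).
t_bessel = np.linspace(0.0, 20.0, 200)
y_bessel = jv(1, t_bessel)

# Section 4.3: van der Pol equation x'' + (x^2 - 1) x' + x = 0, written as a
# first-order system with x1 = x, x2 = x', and x1(0) = x2(0) = 1.
def van_der_pol(t, state):
    x1, x2 = state
    return [x2, (1.0 - x1 ** 2) * x2 - x1]

t_vdp = np.linspace(0.0, 10.0, 200)   # assumed time window (not stated in the paper)
sol = solve_ivp(van_der_pol, (0.0, 10.0), [1.0, 1.0], t_eval=t_vdp, rtol=1e-8)
x_vdp = sol.y[0]

print("J1 range:", y_bessel.min(), y_bessel.max())
print("van der Pol x(t) range:", x_vdp.min(), x_vdp.max())
```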
5. Conclusions and Future Work
This study proposed the use of ANN-based forward models in iterative algorithms for solving multivariate function approximation problems. Two different types of neural networks, RBFNN and WBFNN, were used to represent the forward model. These forward models were used, in an iterative scheme or in combination with an inverse model in a feedback configuration, to solve the inverse problem. This type of FFNN offers several advantages over numerical models in terms of both the implementation of gradient calculations in the parameter updates and the overall computational cost. One drawback of these approaches is that the forward models are not accurate when the input signals are not similar to those used in the training database. From the above study, the results of applying the FFNN to one- and two-dimensional problems were presented, and the comparison between the RBFNN and the WBFNN was studied in order to obtain better results. In general, our numerical results show that it is difficult to specify which algorithm will converge faster.
1- For the large problems we treat, we recommend trying the RBFNN first.
2- Our numerical results show that, in approximating the numerical solution of an ODE or PDE, the WBFNN gives better accuracy than the RBFNN for small-dimensional problems.
3- The approximation of functions offers the alternative approach of adopting a general-purpose optimization method to solve the relevant nonlinear approximation problem in feed forward propagation procedures.
4- For high-dimensional problems, RBFNN are potentially valuable, since they may be based on a limited number of centers, which do not have to be placed on a grid throughout the domain.
Another issue that needs to be examined in future work is related to the sampling of the grid points that are used for training. In the above experiments, the grid was constructed in a simple way by considering equidistant points. It is expected that better results will be obtained if the grid density varies during training according to the corresponding error values; this means that it is possible to consider more training points in regions where the error values are higher (a sketch of this idea is given below).
Developing an upper bound for the error of the form ‖t(x) − a(x)‖ ≤ O(m^{−d}), where m is the number of basis functions and d is the dimension of the domain, requires further study. As the dimensionality increases, the number of training points becomes large. This fact becomes a serious problem for methods that consider local functions around each grid point, since the required number of parameters becomes excessively large and, therefore, both memory and computation time requirements become extremely high; this also requires further study.
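As a rough illustration of the error-driven sampling idea mentioned above, the following sketch (an assumption-laden illustration, not something implemented in the paper) starts from an equidistant grid, fits a Gaussian RBF model, and then inserts new training points midway between existing points in the interval where the monitored error is currently largest. The target function, spread, grid sizes, and number of refinement passes are all assumed values.

```python
import numpy as np

def gaussian_rbf(x, centers, sigma):
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * sigma ** 2))

def fit_and_eval(x_train, y_train, x_eval, sigma=0.15):
    # Fit the RBF expansion by least squares (more robust than a direct solve
    # once newly inserted points lie close together) and evaluate it on x_eval.
    G = gaussian_rbf(x_train, x_train, sigma)
    w, *_ = np.linalg.lstsq(G, y_train, rcond=None)
    return gaussian_rbf(x_eval, x_train, sigma) @ w

def target(t):
    # Toy target with a slow and a fast component, so refinement has work to do.
    return np.sin(3 * t) + 0.5 * np.sin(15 * t)

x_dense = np.linspace(0.0, np.pi, 400)   # grid on which the error is monitored
x_train = np.linspace(0.0, np.pi, 15)    # equidistant starting grid

for step in range(4):
    pred = fit_and_eval(x_train, target(x_train), x_dense)
    err = np.abs(pred - target(x_dense))
    # Insert a new training point at the midpoint of the interval containing
    # the largest monitored error, i.e. more points where the error is higher.
    worst = x_dense[np.argmax(err)]
    idx = np.clip(np.searchsorted(x_train, worst), 1, x_train.size - 1)
    new_point = 0.5 * (x_train[idx - 1] + x_train[idx])
    x_train = np.sort(np.append(x_train, new_point))
    print(f"pass {step}: max error {err.max():.4f}, added point at {new_point:.3f}")
```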
References
1. Atalla, J. (1996). Model Updating Using Neural Networks, Ph.D. Thesis, Virginia Polytechnic Institute and State University.
2. Cheney, E. W. and Light, W. (2000). A Course in Approximation Theory, Brooks/Cole Publishing Company.
3. AL-Mosawi, A. K. (2009). On Training of Feedforward Neural Networks for Approximation Problems, M.Sc. Thesis, Department of Mathematics, College of Education Ibn Al-Haitham, University of Baghdad.
4. Yegnanarayana, B. (2000). Artificial Neural Networks, New Delhi.
5. Tawfiq, L. N. M. and Eqhaar, Q. H. (2007). On Radial Basis Function Neural Networks, Al-Qadisiya Journal for Science, 12(3).
6. Wajdi, B., Chokri, B. A. and Alimi, A. M. (2006). Comparison between Beta Wavelets Neural Networks, RBF Neural Networks and Polynomial Approximation for 1D, 2D Functions Approximation, Proceedings of World Academy of Science, Engineering and Technology, 13, ISSN 1307-6884.
7. Bakshi, B. R. and Stephanopoulos, G. (1993). Wave-net: a multiresolution neural network with localized learning, AIChE Journal, 39(1), pp. 57-81.
8. Mallat, S. G. (1989). A theory for multiresolution signal decomposition: the wavelet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), pp. 674-693.
9. Kugarajah, T. and Zhang, Q. (1995). Multidimensional wavelet frames, IEEE Transactions on Neural Networks, 6(6), pp. 1552-1556.

Figure 1. RBFNN forward model: (a) performance of the RBFNN; (b) results of the iterative RBFNN.
Figure 2. WBFNN forward model: (a) performance of the WBFNN; (b) results of the iterative WBFNN.
Figure 3. Comparison between the RBFNN and the WBFNN from Figures 1 and 2.
Figure 4. Performance of the RBFNN: (a) result; (b) error.
Figure 5. Result and error of the WBFNN.
Figure 6. Comparison between the RBFNN and the WBFNN: (a) results; (b) error.

On Comparison Between Radial Basis Function and Wavelet Basis Function Neural Networks
L. N. M. Tawfiq and T. A. M. Rashid
Department of Mathematics, College of Education Ibn Al-Haitham, University of Baghdad

Abstract (translated from Arabic)
This research includes the study and design of two feed forward artificial neural networks. The first uses an RBFNN and the second uses a WBFNN to approximate a mapping from the input space to the output space. We then trained the networks using a backpropagation training algorithm of the conjugate gradient (CG) type to obtain the required outputs, applied these networks to several examples for solving differential equations, and finally presented the results of these applications.