IBN AL-HAITHAM J. FOR PURE & APPL. SCI. VOL. 23 (3) 2010

The Effect of the Number of Training Samples for Artificial Neural Networks

L. N. M. Tawfiq, M. N. M. Tawfiq
Department of Mathematics, College of Education - Ibn Al-Haitham, University of Baghdad

Abstract
In this paper we study the effect of the number of training samples on artificial neural networks (ANNs), a quantity that must be chosen for the training process of a feed-forward neural network. We design 5 ANNs and train 41 ANNs, which illustrate how well the training samples represent the actual function.

1. Introduction
Artificial Neural Networks (ANNs) are relatively crude electronic models based on the neural structure of the brain. The brain basically learns from experience. It is a natural proof that some problems beyond the scope of current computers are indeed solvable by small, energy-efficient packages. This brain modelling also promises a less technical way to develop machine solutions. This new approach to computing also provides more graceful degradation during system overload than its more traditional counterparts. Advances in biological research promise an initial understanding of the natural thinking mechanism. This research shows that brains store information as patterns. Some of these patterns are very complicated and give us the ability to recognize individual faces from many different angles. This process of storing information as patterns, utilizing those patterns, and then solving problems encompasses a new field in computing. This field, as mentioned before, does not utilize traditional programming but involves the creation of massively parallel networks and the training of those networks to solve specific problems. This field also uses words very different from traditional computing, words like behave, react, self-organize, learn, generalize, and forget [1].

2. Artificial Neurons and Their Work
The fundamental processing element of an artificial neural network (ANN) is a neuron. A neuron is an information-processing unit that is fundamental to the operation of a neural network. Figure (1) shows the model of a neuron, which forms the basis for ANNs. The artificial neurons we use to build our neural networks are truly primitive in comparison to those found in the brain. An artificial neuron has several inputs but only one output. Here we identify three basic elements of the neuronal model [2], [3]:
1. Synapses or connecting links, each characterized by a weight or strength of its own. Specifically, a signal xj at the input of synapse j connected to a neuron is multiplied by the synaptic weight wj.
2. An activation function for limiting the amplitude of the output of a neuron.
3. An external bias, denoted by b, which has the effect of increasing or lowering the net input of the activation function.
In mathematical terms, we may describe a neuron by:

y = \varphi\Big( \sum_{j=1}^{N} w_j x_j + b \Big)        (1)

where x1, ..., xN are the input signals, w1, ..., wN are the synaptic weights of the neuron, b is the bias, \varphi is the activation function, and y is the output signal of the neuron.
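As a concrete illustration of equation (1), the following minimal Python/NumPy sketch computes the output of a single artificial neuron. The hyperbolic tangent is used here as a stand-in for the 'tansig' activation discussed later in the paper; the variable names and example values are purely illustrative, not taken from the paper.

```python
import numpy as np

def neuron_output(x, w, b):
    """Compute y = phi(sum_j w_j * x_j + b) for a single neuron, as in equation (1).

    x : input signals (x_1, ..., x_N)
    w : synaptic weights (w_1, ..., w_N)
    b : bias
    phi : activation function, here tanh (a bounded, monotonically
          increasing, differentiable sigmoid)
    """
    net_input = np.dot(w, x) + b   # weighted sum of the inputs plus the bias
    return np.tanh(net_input)      # activation function limits the output amplitude

# example: a neuron with three inputs
x = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
b = 0.2
print(neuron_output(x, w, b))
```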
3. Training of ANNs
Basically, training is the process of determining the weights, which are the key elements of an ANN. The knowledge learned by a network is stored in the nodes in the form of weights and node biases. In this paper the network training is supervised: the desired response of the network (target value) for each input pattern (example) is always available. The training input data is in the form of vectors of input variables, or training patterns. Corresponding to each element in an input vector is an input node in the network input layer; hence the number of input nodes is equal to the dimension of the input vectors. The total available data is usually divided into a training set (in-sample data) and a test set (out-of-sample data). The training set is used for estimating the weights, while the test set is used for measuring the generalization ability of the network.

The training process is usually as follows. First, examples of the training set are entered into the input nodes. The activation values of the input nodes are weighted and accumulated at each node in the first hidden layer. The total is then transformed by an activation function into the node's activation value. It in turn becomes an input to the nodes in the next layer, until eventually the output activation values are found. The training algorithm is used to find the weights that minimize some overall error measure such as the sum of squared errors (SSE) or the mean squared error (MSE). Hence network training is actually an unconstrained nonlinear minimization problem [4].

4. Feed-Forward Neural Networks
A Feed-Forward Neural Network (FFNN) has a layered structure. Each layer consists of units which receive their input from units in the layer directly below and send their output to units in the layer directly above; there are no connections within a layer. The Ni inputs are fed into the first hidden layer of Nh,1 hidden units. The input units are merely 'fan-out' units; no processing takes place in them. The activation of a hidden unit is a function Fi of the weighted inputs plus a bias, as given in equation (1). The output of the hidden units is distributed over the next layer of Nh,2 hidden units, and so on until the last layer of hidden units, whose outputs are fed into a layer of No output units. Although back-propagation can be applied to networks with any number of layers, in most applications a feed-forward network with a single layer of hidden units is used, with a sigmoid activation function (bounded, monotonically increasing and differentiable) for the units, satisfying the boundary conditions

\lim_{x \to \infty} \sigma(x) = 1, \qquad \lim_{x \to -\infty} \sigma(x) = 0   [5], [6].

This activation function is also called a transfer function or, mathematically, a basis function. In this paper we rely on the results in [7] for choosing the most suitable transfer function, which is 'tansig'.

5. An Example
A feed-forward network can be used to approximate a function from data. Suppose we have a system (for example a chemical process or a financial market) whose characteristics we want to know. The input of the system is given by the two-dimensional vector (x, y) and the output is given by the one-dimensional value z. We want to estimate the relationship z = f(x, y) from 80 data points {(x, y), z}, as depicted in figure (2) (top left). An ANN was programmed with two input units, 10 hidden units with a 'logsig' activation function, and an output unit with a linear activation function.
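A minimal sketch of the forward pass of such a single-hidden-layer feed-forward network is shown below, assuming the logistic sigmoid for 'logsig' and small random initial weights. The weight shapes, the random data, and all names are illustrative assumptions; this is not the authors' implementation.

```python
import numpy as np

def logsig(a):
    """Logistic sigmoid: bounded in (0, 1), monotonically increasing, differentiable."""
    return 1.0 / (1.0 + np.exp(-a))

def ffnn_forward(X, W1, b1, W2, b2):
    """Forward pass of a feed-forward network with one hidden layer.

    X  : (P, 2)  input vectors (x, y), one row per sample
    W1 : (2, 10) input-to-hidden weights;  b1 : (10,) hidden biases
    W2 : (10, 1) hidden-to-output weights; b2 : (1,)  output bias
    Returns the (P, 1) network estimate of z.
    """
    H = logsig(X @ W1 + b1)   # hidden activations, equation (1) applied per hidden unit
    return H @ W2 + b2        # linear output unit

# small random initial weights and 80 two-dimensional inputs, as in the example
rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.standard_normal((2, 10)), np.zeros(10)
W2, b2 = 0.1 * rng.standard_normal((10, 1)), np.zeros(1)
X = rng.uniform(-1.0, 1.0, size=(80, 2))
z_hat = ffnn_forward(X, W1, b1, W2, b2)
```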
The network weights are initialized to small values and the network is trained for 5000 training iterations with the back-propagation (gradient descent) training rule. The relationship between (x, y) and z as represented by the network is shown in figure (2) (top right), while the function which generated the training samples is given in figure (2) (bottom left). The approximation error is depicted in figure (2) (bottom right). We see that the error is higher at the edges of the region within which the training samples were generated: the network is considerably better at interpolation than at extrapolation.

6. How Good Are Multi-Layer Feed-Forward Networks?
From the example shown in figure (2) it is clear that the approximation of the network is not perfect. The resulting approximation error is influenced by:
1. The training algorithm and number of iterations: this determines how well the error on the training set is minimized.
2. The number of training samples: this determines how well the training samples represent the actual function.
3. The number of hidden units: this determines the "expressive power" of the network. For "smooth" functions only a few hidden units are needed (2N + 1) [7]; for wildly fluctuating functions more hidden units are required.
In this paper we particularly address the effect of the number of training samples. We first have to define an adequate error measure. All neural network training algorithms try to minimize the error over the set of training samples which are available for training the network. The average error per training sample is defined as the training error rate:

E_{learning} = \frac{1}{P_{learning}} \sum_{p=1}^{P_{learning}} E^{p},

in which E^{p} is the difference between the desired output value and the actual network output for training sample p:

E^{p} = \frac{1}{2} \sum_{o=1}^{n} \big( d_{o}^{p} - y_{o}^{p} \big)^{2}.

This is the error which is measurable during the training process. It is obvious that the actual error of the network will differ from the error at the locations of the training samples. The difference between the desired output value and the actual network output should be integrated over the entire input domain to give a more realistic error measure. This integral can be estimated if we have a large set of samples, the test set, and we define the test error rate as the average error over the test set:

E_{test} = \frac{1}{P_{test}} \sum_{p=1}^{P_{test}} E^{p}.

In the following section we will see how these error measures depend on the training set size and the number of hidden units.

7. The Effect of the Number of Training Samples
Four problems are used as examples: a function y_i = f_i(x), i = 1, 2, 3, 4, has to be approximated with an ANN. For the first three problems a neural network is created with one input unit, 3 hidden units with a 'tansig' activation function and a linear output unit; for the fourth problem the network has two input units, 5 hidden units with a 'tansig' activation function and a linear output unit. Suppose we have L training samples, where L varies from 1 to 100. The networks are trained with these samples; training is stopped when the error no longer decreases, and the number of test samples is 100 - L. The training samples and the test samples of the network are shown in the same figure. We see that for L in the intervals [1, 20] and [90, 100], E_{learning} is small (the network output goes perfectly through the training samples) but E_{test} is large: the test error of the network is large.
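The split of the 100 available samples into L training samples and 100 - L test samples, together with the error rates E_{learning} and E_{test} defined above, can be sketched as follows. The sin(x) target and the stand-in "network output" are illustrative assumptions used only to make the example self-contained; they are not the paper's actual networks.

```python
import numpy as np

def error_rate(d, y):
    """Average error per sample: E = (1/P) * sum_p E_p,
    where E_p = 0.5 * sum_o (d_o^p - y_o^p)^2."""
    E_p = 0.5 * np.sum((d - y) ** 2, axis=1)   # per-sample error E_p
    return E_p.mean()                          # average over the P samples

# 100 samples of y1 = f1(x) = sin(x); the first L are used for training,
# the remaining 100 - L for testing
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 100))
d = np.sin(x)
L = 20
y_hat = d + 0.05 * rng.standard_normal(100)    # stand-in for the network output

E_learning = error_rate(d[:L, None], y_hat[:L, None])   # training error rate
E_test = error_rate(d[L:, None], y_hat[L:, None])       # test error rate
print(E_learning, E_test)
```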
This experiment was carried out with other training set sizes, where for each training set size the experiment was repeated 10 times. The average training and test error rates as a function of the training set size are given in figure (3). Note that the training error increases with increasing training set size, while the test error decreases with increasing training set size. A low training error on a (small) training set is no guarantee of good network performance! With an increasing number of training samples the two error rates converge to the same value. This value depends on the representational power of the network: given the optimal weights, how good is the approximation. This error depends on the number of hidden units [8] and the activation function. If the training error rate does not converge to the test error rate, the training procedure has not found a global minimum.
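The experimental procedure described above (for each training set size L, train a small 'tansig' network, evaluate it on the remaining samples, repeat 10 times, and average the error rates) could be reproduced along the following lines. This is a hedged sketch, not the authors' code: the target f1(x) = sin(x), the 1-3-1 architecture, the learning rate, and the iteration count are illustrative assumptions.

```python
import numpy as np

def train_and_evaluate(L, n_hidden=3, iters=5000, lr=0.01, rng=None):
    """Train a 1-n_hidden-1 'tansig' network on L samples of f1(x) = sin(x) by
    gradient descent and return (E_learning, E_test) on the remaining 100 - L samples."""
    if rng is None:
        rng = np.random.default_rng()
    x = rng.uniform(0.0, 2.0 * np.pi, 100)
    d = np.sin(x)
    idx = rng.permutation(100)
    tr, te = idx[:L], idx[L:]                       # training / test indices
    X, D = x[:, None], d[:, None]
    # small random initial weights
    W1, b1 = 0.1 * rng.standard_normal((1, n_hidden)), np.zeros(n_hidden)
    W2, b2 = 0.1 * rng.standard_normal((n_hidden, 1)), np.zeros(1)
    for _ in range(iters):
        H = np.tanh(X[tr] @ W1 + b1)                # hidden 'tansig' activations
        Y = H @ W2 + b2                             # linear output unit
        err = Y - D[tr]                             # output error
        # back-propagation (gradient descent) updates
        gW2, gb2 = H.T @ err / L, err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)          # derivative of tanh
        gW1, gb1 = X[tr].T @ dH / L, dH.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
    def E(rows):
        """Average error per sample, E = (1/P) * sum_p 0.5 * (d^p - y^p)^2."""
        Y = np.tanh(X[rows] @ W1 + b1) @ W2 + b2
        return float(np.mean(0.5 * np.sum((D[rows] - Y) ** 2, axis=1)))
    return E(tr), E(te)

# repeat each training-set size 10 times and average, as in figure (3)
rng = np.random.default_rng(2)
for L in (5, 10, 20, 40, 80):
    runs = [train_and_evaluate(L, rng=rng) for _ in range(10)]
    E_learn = np.mean([r[0] for r in runs])
    E_test = np.mean([r[1] for r in runs])
    print(f"L = {L:3d}:  E_learning = {E_learn:.4f}   E_test = {E_test:.4f}")
```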
References
1. Jaya, G.P., Prakash and TRB Staff Representative, (1999), "Use of Artificial Neural Networks in Geomechanical and Pavement Systems", Transportation Research Circular, Number E-C012, December.
2. Csáji, B.C., (2001), "Approximation with Artificial Neural Networks", MSc thesis, Faculty of Sciences, Eötvös Loránd University, Hungary.
3. Alldrin, N., Smith, A. and Turnbull, D., (2001), "A Three-Unit Network is All You Need to Discover Females", Neurocomputing.
4. Pinkus, A., (1999), "Approximation Theory of the MLP Model in Neural Networks", Acta Numerica, pp. 143-195.
5. Zhang, G., Patuwo, B.E. and Hu, Y., (1998), "Forecasting with Artificial Neural Networks: The State of the Art", International Journal of Forecasting, 14, pp. 35-62.
6. Dijkstra, E.W., (2001), "Approximation with Artificial Neural Networks", MSc thesis, Faculty of Mathematics and Computing Science, Eindhoven University of Technology, The Netherlands.
7. Hatim, Q., (2007), "Comparison of Ridge and Radial Basis Functions Neural Networks for Interpolation Problems", MSc thesis, University of Baghdad, College of Education - Ibn Al-Haitham.
8. Tawfiq, L.N.M. and Eqhaar, Q.H., (2009), "On Multilayer Neural Networks and Its Application for Approximation Problem", 3rd Scientific Conference of the College of Science, University of Baghdad, 24-26 March.

Fig 1: Nonlinear model of a neuron.

Fig 2: Example of function approximation with an ANN. Top left: the original training samples; top right: the approximation by the network; bottom left: the function which generated the training samples; bottom right: the error in the approximation.

The four test functions:
y1 = f1(x) = sin(x)
y2 = f2(x) = 2x^5 + 6x^3 - 9x^2 + 1
y3 = f3(x) = 3x (x - 0.6) (x + 1.17)
y4 = f4(x1, x2) = (x1 - x2)^3 18 x1 (x2 - x1)^7 x2

Fig 3: Effect of the training set size on the error rate. The average training error rate and the average test error rate as functions of the number of training samples, with L training samples and 100 - L test samples, for the four problems.

Arabic abstract (translated)
The Effect of the Number of Training Samples on Artificial Neural Networks
Luma Naji Mohammed Tawfiq, Mohammed Naji Mohammed Tawfiq
Department of Mathematics, College of Education - Ibn Al-Haitham, University of Baghdad
In this paper we studied the effect of the number of training samples on artificial neural networks, which is necessary in the training process for this type of network. We designed 5 artificial neural networks and trained 41 neural networks, which illustrate how a good choice of the number of training samples leads to a good approximation or representation of the actual function.