AUTOREGRESSIVE INTEGRATED ADAPTIVE NEURAL NETWORKS CLASSIFIER FOR EEG-P300 SIGNAL CLASSIFICATION

Demi Soetraprawata*, Arjon Turnip
Technical Implementation Unit for Instrumentation Development Division - LIPI
Kompleks LIPI Gd. 30, Jalan Sangkuriang, Bandung, 40135, Indonesia
* Corresponding author. Tel: +62-22-2503053; E-mail: {demi001, arjo001}@lipi.go.id

Received 09 October 2012; received in revised form 22 February 2013; accepted 27 February 2013. Published online 30 July 2013.

Abstract

The Brain Computer Interface (BCI) has the potential to be applied to mechatronic apparatus and vehicles in the future. Among the available recording techniques, EEG is the most preferred for BCI designs. In this paper, a new adaptive neural network classifier of different mental activities from EEG-based P300 signals is proposed. To overcome the over-training caused by noisy and non-stationary data, the EEG signals are filtered and their features extracted using autoregressive models before being passed to the adaptive neural networks classifier. To test the improvement in EEG classification performance with the proposed method, comparative experiments were conducted using Bayesian Linear Discriminant Analysis. The experimental results show that all subjects achieve a classification accuracy of 100%.

Keywords: brain computer interface, feature extraction, classification accuracy, autoregressive, adaptive neural networks, EEG-based P300, transfer rate.

I. INTRODUCTION

A Brain Computer Interface (BCI) is a device that allows users to communicate with the world without utilizing voluntary muscle activity. BCI systems utilize what is known about brain signals to detect the message that a user chose to communicate. These systems rely on the finding that the brain reacts differently to different stimuli, based on the level of attention given to the stimulus and the specific processing that it triggers. Brain activity must therefore be monitored, and several techniques are available for this. Among them, EEG is the most preferred for BCI designs because of its non-invasiveness, cost effectiveness, ease of implementation, and excellent temporal resolution [1, 2]. EEGs are usually analyzed in two ways: (i) as free-running EEG; (ii) as event-related potentials (ERPs), e.g., the P300, slow cortical potentials (SCPs), readiness potentials (RPs), and steady-state visual evoked potentials (SSVEPs) [3]. Around 1964, Chapman and Bragdon, as well as Sutton et al., independently discovered a wave peaking at approximately 300 ms after task-relevant stimuli [4]. This component is known as the P300. While the P300 is evoked by many types of paradigms, the most common factors that influence it are stimulus frequency and task relevance [5]. The presence, magnitude, topography, and timing of the response signals are often used as metrics of cognitive function in decision-making processes. The P300 has been shown to be fairly stable in locked-in patients, re-appearing even after severe brainstem injuries. Farwell and Donchin (1988) first showed that this signal may be successfully used in a BCI [6].
Using a broad cognitive signal like the P300 has the benefit of enabling control through a variety of modalities, as the P300 supports discrete control in response to both auditory and visual stimuli. As a cognitive component, however, the P300 is known to change in response to the subject's fatigue [5]. One of the most important tasks in designing a BCI is extracting relevant features from the EEG signals, which are naturally noisy and stochastic. Conventional approaches based on trial averaging and artifact removal suffer from computational complexity and poor generalization, and they need a large number of training trials to achieve the desired accuracy and communication rate. To avoid these drawbacks, a new adaptive neural network classifier (ANNC) of different mental activities is proposed. To overcome the over-training caused by noisy and non-stationary data, the features of the EEG-based P300 signals are extracted using the autoregressive (AR) method before being passed to the proposed classifier algorithm. In order to examine the performance improvement of the proposed classification method, comparative experiments were conducted using Bayesian Linear Discriminant Analysis (BLDA). The contributions of this paper are as follows:
a. Enhancement and strengthening of the EEG signal, given the small amplitude of the EEG-based P300, which is naturally noisy and stochastic.
b. Driving the tracking error to converge to a small value around zero while closed-loop stability is guaranteed.
c. The introduction of the AR method and the application of the proposed classifier, which improve the classification accuracy and the transfer rate of a BCI even when the subjects are in a fatigued condition.
The structure of the paper is as follows. In Section 2, the EEG data set and pre-processing are described. Feature extraction and classification, using the AR method and adaptive neural networks respectively, are explained in Section 3. Results and discussion are presented in Section 4. Conclusions are drawn in Section 5.

II. DATA SET AND EEG PRE-PROCESSING

In order to examine the performance improvement of the proposed EEG classification method, the EEG-based P300 data used in this paper were obtained previously by Hoffmann et al. (2008), who used the following procedure [5]. The data were recorded according to the 10-20 standard from a 32-electrode configuration. In this study, however, only the signals from an eight-electrode configuration were used. Each recorded signal has a length of 820 samples at a sampling rate of 2,048 Hz. A six-choice signal paradigm was tested using a population of five disabled and four able-bodied subjects. The subjects were asked to silently count the number of times a prescribed image flashed on a screen. Four seconds after a warning tone, six different images (a television, a telephone, a lamp, a door, a window, and a radio) were flashed in random order. Each flash of an image lasted for 100 ms, and for the following 300 ms no image was flashed (i.e., the inter-stimulus interval was 400 ms). Each subject completed four recording sessions, and each session consisted of six runs, with one run for each of the six images.
The duration of one run was approximately one minute, and the duration of one session, including setup of the electrodes and short breaks between runs, was approximately 30 minutes. One session comprised on average 810 trials, and the entire data set for one subject therefore comprised on average 3,240 trials. Our goal is to discriminate all possible combinations of pairs of mental activities from each other using the corresponding EEG signals. The EEG signals are processed in segments (EEG trials) in which the BCI attempts to recognize the mental activities. Before classification and validation are performed, several pre-processing operations, including filtering and down-sampling, were applied to the data. A 6th-order forward-backward Butterworth band-pass filter with cut-off frequencies of 1 Hz and 12 Hz was used to filter the data. The EEG was down-sampled from 2,048 Hz to 32 Hz by selecting every 64th sample of the band-pass-filtered data.

III. FEATURE EXTRACTION AND CLASSIFICATION

A. Feature Extraction

In this section, the feature extraction, which focuses on the estimation of statistical measurements from the perturbation-free EEG trials delivered by the pre-processing module, is explored. The features computed on a given EEG trial are grouped into a vector, called the feature vector, which is sent to the pattern recognition module; this module evaluates the likelihood that the EEG trial was produced during the execution of each mental activity. The autoregressive (AR) method is built on the hypotheses of stationarity, ergodicity, absence of coupling between the univariate components, and existence of a linear prediction model [7].

Let $\mathbf{Y}$ be an $N_e$-dimensional multivariate stochastic EEG signal of length $N_s$, composed of the random vectors $\{\mathbf{Y}(k) = (y_1(k), \ldots, y_{N_e}(k))^T \mid k = 0, \ldots, N_s - 1\}$, where $y_1, \ldots, y_{N_e}$ are the univariate components of $\mathbf{Y}$. The AR model can be generated by a linear prediction model of the form [7]:

$$\mathbf{Y}(k) = -\sum_{i=1}^{Q} \mathbf{A}(k,i)\,\mathbf{Y}(k-i) + e_1(k), \qquad (1)$$

where $Q$ is the model order, the $\mathbf{A}(k,i)$ are $N_e \times N_e$ matrices ($N_e$ denoting the number of electrodes and $N_s$ the number of temporal samples per EEG channel), and $e_1(k)$ is the prediction error, a zero-mean random vector. Since the coupling between the channels is ignored, equation (1) can be split into linear prediction models corresponding to each univariate component. Thus, the $n$-th univariate component of $\mathbf{Y}$ can be written in the form:

$$y_n(k) = -\sum_{i=1}^{Q_n} a_n(k,i)\,y_n(k-i) + e_n(k), \qquad (2)$$

where the $a_n(k,i)$ are the AR coefficients, $Q_n$ is the AR order corresponding to $y_n$, and $e_n$ is the $n$-th prediction error process. The indexes $n$ and $k$ reference the electrode and the time sample, respectively.
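To make the pre-processing of Section II and the per-channel AR feature extraction concrete, the following minimal Python sketch band-pass filters and down-samples one EEG channel and then fits fixed AR coefficients by ordinary least squares (the stationary form adopted in the remainder of this subsection). The function names, the AR order, and the simulated trial are illustrative assumptions, not taken from the original implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(raw, fs=2048, band=(1.0, 12.0), decim=64):
    """Band-pass filter (6th-order Butterworth, applied forward-backward)
    and down-sample one EEG channel by keeping every `decim`-th sample."""
    b, a = butter(6, band, btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, raw)      # zero-phase forward-backward filtering
    return filtered[::decim]            # 2,048 Hz -> 32 Hz for decim = 64

def fit_ar_coefficients(y, order):
    """Least-squares estimate of stationary AR coefficients a_n(i) in
    y(k) = -sum_i a_n(i) y(k-i) + e(k); samples before y(0) are taken as zero."""
    y = np.asarray(y, dtype=float)
    padded = np.concatenate([np.zeros(order), y])
    # Each row of X holds the `order` past samples preceding y(k).
    X = np.column_stack([padded[order - i: order - i + len(y)]
                         for i in range(1, order + 1)])
    coeffs, *_ = np.linalg.lstsq(X, -y, rcond=None)
    return coeffs                       # a_n(1), ..., a_n(Q_n)

# Example: AR feature vector for one simulated 820-sample trial of one channel.
rng = np.random.default_rng(0)
trial = rng.standard_normal(820)
features = fit_ar_coefficients(preprocess(trial), order=6)
print(features.shape)                   # (6,) AR coefficients used as features
```

In this sketch the estimated AR coefficients of a channel serve as its features; per the text above, the features computed on a trial are then grouped into a single feature vector for the classifier.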
Furthermore, as stationarity and ergodicity are assumed, the AR model for the $n$-th channel becomes:

$$y_n(k) = -\sum_{i=1}^{Q_n} a_n(i)\,y_n(k-i) + e_n(k). \qquad (3)$$

The coefficients $a_n(1), \ldots, a_n(Q_n)$ can be determined by minimizing the averaged squared prediction error:

$$\varepsilon(Q_n) = \frac{1}{N_s}\sum_{k=0}^{N_s-1} e_n^2(k) = \frac{1}{N_s}\sum_{k=0}^{N_s-1}\left[y_n(k) + \sum_{i=1}^{Q_n} a_n(i)\,y_n(k-i)\right]^2. \qquad (4)$$

In this relation, the samples prior to $y_n(0)$ are assumed to be zero.

B. Adaptive Neural Networks Classifier

Artificial neural networks have been proposed in the fields of information and neural sciences following research into the mechanisms and structures of the brain. This has led to the development of new computational models for solving complex problems such as pattern recognition, rapid information processing, learning and adaptation, classification, identification and modelling, speech, vision, and control systems [8-14]. In this paper, we are only concerned with the adaptive classification of the EEG-based P300, represented by nonlinear discrete-time systems that can be transformed into the state-space description [15]:

$$\begin{aligned} x_1(k+1) &= x_2(k),\\ x_2(k+1) &= x_3(k),\\ &\;\;\vdots\\ x_n(k+1) &= f(x(k)) + g(x(k))\,u(k),\\ y_k &= x_1(k), \end{aligned} \qquad (5)$$

where $x(k) = [x_1(k), x_2(k), \ldots, x_n(k)]^T \in R^n$, $u(k) \in R$, and $y(k) \in R$ are the state variables, the system input, and the output, respectively; $f(x(k))$ and $g(x(k))$ are unknown functions which may not be linearly parameterized. The classifier attempts to make the plant output match the target output $y_d(k)$ asymptotically, so that $\lim_{k \to \infty}\|y_d(k) - y_k\| \le \varepsilon$ for some specified constant $\varepsilon \ge 0$. If $f(x(k))$ and $g(x(k))$ were known, the following classifier could be used, and the system would exactly track the target output $y_d(k)$:

$$u(k) = g^{-1}(x(k))\left[y_d(k) - f(x(k))\right]. \qquad (6)$$

Since $f(x(k))$ and $g(x(k))$ are unknown, neural networks can be used to learn to approximate these functions and generate suitable classifiers. Although the function $g(x(k))$ is not known, it can be assumed that its sign is known along system trajectories and that there exist two constants $g_0, g_1 > 0$ such that $g_0 \le |g(x(k))| \le g_1$, $\forall x \in \Omega \subset R^n$, with the compact subset $\Omega$ containing the origin. This assumption implies that the function $g(x(k))$ is strictly either positive or negative. From this point forward, therefore, without loss of generality, we shall assume $g(x(k)) > 0$.
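To illustrate the role of the exact inversion (6), the toy sketch below simulates a second-order instance of (5) with known, illustrative choices of f and g (not from the paper) and applies (6); the output then reproduces the target after the n-step propagation through the chain of delayed states.

```python
import numpy as np

# Toy instance of system (5) with n = 2; f and g are illustrative choices
# (g bounded away from zero and positive, as assumed in the text).
def f(x):
    return 0.5 * np.sin(x[0]) + 0.3 * x[1]

def g(x):
    return 2.0 + np.cos(x[0])

n, K = 2, 50
y_d = np.sin(0.2 * np.arange(K))            # target trajectory y_d(k)
x = np.zeros(n)                             # state [x_1, x_2]
y = np.zeros(K + 1)                         # recorded output y_k = x_1(k)

for k in range(K):
    u = (y_d[k] - f(x)) / g(x)              # ideal classifier of Eq. (6)
    x = np.append(x[1:], f(x) + g(x) * u)   # shift chain; x_n(k+1) = y_d(k)
    y[k + 1] = x[0]

# After the initial transient, the output reproduces the target with an
# n-step delay: y[k] == y_d[k - n].
print(np.max(np.abs(y[n:K] - y_d[:K - n])))  # ~0 up to floating-point rounding
```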
Neural networks are general modelling tools that can approximate any continuous or discrete nonlinear function to any desired accuracy over a compact set [9-11, 16, 17]. In this work, a new adaptive neural network classifier is developed for the nonlinear system (5) using high order neural networks, so that the mental activities corresponding to the given stimuli can be extracted and classified with high accuracy. It should be noted that, although the new states $x_2, x_3, \ldots, x_n$ are not available in practice, we can predict them, as detailed in the following discussion.

Let $x_d = [y_d(k), y_d(k+1), \ldots, y_d(k+n-1)]^T$ denote the target system states, and define the error $e(k) = x(k) - x_d(k)$. The dynamics of $e(k)$ can be written as:

$$\begin{aligned} e_1(k+1) &= e_2(k),\\ e_2(k+1) &= e_3(k),\\ &\;\;\vdots\\ e_n(k+1) &= f(x(k)) + g(x(k))\,u(k) - y_d(k+n). \end{aligned} \qquad (7)$$

In order to develop the output feedback classifier clearly, define the following new variables: $\bar{y}(k) = [y_{k-n+1}, \ldots, y_{k-1}, y_k]^T$, $\bar{u}_{k-1}(k) = [u_{k-1}, \ldots, u_{k-n+1}]^T$, and $\bar{z}(k) = [\bar{y}^T(k), \bar{u}_{k-1}^T(k)]^T \in \Omega_{\bar{z}} \subset R^{2n-1}$. According to the definition of the new states, $\bar{y}(k) = [x_1(k-n+1), \ldots, x_1(k-1), x_1(k)]^T$, and from Eq. (5) the following equation is obtained:

$$y_{k+1} = x_2(k) = x_3(k-1) = \cdots = x_n(k-n+2) = f(\bar{y}(k)) + g(\bar{y}(k))\,u_{k-n+1} = \phi_2(\bar{z}(k)), \qquad (8)$$

in which $x_2(k)$ is a function of $\bar{y}(k)$ and $u_{k-n+1}$. From (5), the following equations are obtained similarly:

$$\begin{aligned} y_{k+2} &= f(\bar{y}(k+1)) + g(\bar{y}(k+1))\,u_{k-n+2} = \phi_3(\bar{z}(k)),\\ &\;\;\vdots\\ y_{k+n-1} &= f(\bar{y}(k+n-2)) + g(\bar{y}(k+n-2))\,u_{k-1} = \phi_n(\bar{z}(k)). \end{aligned} \qquad (9)$$

This shows that $x_n(k)$ is a function of $\bar{z}(k)$. Substituting the predicted states into the last equation in (5), we obtain:

$$y_{k+n} = x_n(k+1) = f(\bar{z}(k)) + g(\bar{z}(k))\,u(k), \qquad (10)$$

where

$$f(\bar{z}(k)) = f\!\left([x_1(k), \phi_2(\bar{z}(k)), \ldots, \phi_n(\bar{z}(k))]^T\right), \qquad (11)$$

$$g(\bar{z}(k)) = g\!\left([x_1(k), \phi_2(\bar{z}(k)), \ldots, \phi_n(\bar{z}(k))]^T\right). \qquad (12)$$
Define the tracking error as $e_y(k) = y_k - y_d(k)$. The tracking error dynamics are given by:

$$e_y(k+n) = -y_d(k+n) + f(\bar{z}(k)) + g(\bar{z}(k))\,u(k). \qquad (13)$$

Supposing that the nonlinear functions $f(\bar{z}(k))$ and $g(\bar{z}(k))$ are known exactly, a desired classifier, such that the output $y_k$ follows the target trajectory $y_d(k)$, can be written as:

$$u^*(k) = -\frac{1}{g(\bar{z}(k))}\left[f(\bar{z}(k)) - y_d(k+n)\right]. \qquad (14)$$

Substituting the desired classifier (14) into the error dynamics (13), i.e., setting $u(k) = u^*(k)$, drives the error dynamics to zero. This means that after $n$ steps we have $e_y(k) = 0$; therefore, $u^*(k)$ is an $n$-step dead-beat classifier. Moreover, the desired classifier $u^*(k)$ can be expressed as:

$$u^*(k) = \bar{u}^*(\bar{z}(k)), \qquad (15)$$

where $\bar{z}(k) = [\bar{z}^T(k), y_d(k+n)]^T \in \Omega_{\bar{z}} \subset R^{2n}$, with the compact set $\Omega_{\bar{z}}$ defined as

$$\Omega_{\bar{z}} = \left\{\left(\bar{y}(k), \bar{u}_{k-1}(k), y_d\right) \mid \bar{u}_{k-1}(k) \in \Omega_u,\; y_k, y_d \in \Omega_y\right\}. \qquad (16)$$

When the nonlinear functions $f(\bar{z}(k))$ and $g(\bar{z}(k))$ are unknown, the nonlinearity $u^*(k)$ is not available. In the following, high order neural networks (HONNs) are introduced to construct the unknown nonlinear functions $f(\bar{z}(k))$ and $g(\bar{z}(k))$ and thereby approximate the desired feedback signal $u^*(k)$. Under certain conditions, it has been proven that neural networks have function approximation abilities, and they have been frequently used as function approximators, including linearly and nonlinearly parameterized networks. Consider the following HONNs [16, 18]:

$$\varphi(W, z) = W^T S(z), \qquad W, S(z) \in R^l, \qquad (17)$$

$$S(z) = [s_1(z), s_2(z), \ldots, s_l(z)]^T, \qquad (18)$$

$$s_i(z) = \prod_{j \in I_i} \left[s(z_j)\right]^{d_j(i)}, \qquad i = 1, 2, \ldots, l, \qquad (19)$$

where $z = [z_1, z_2, \ldots, z_m]^T \in \Omega_z \subset R^m$; the positive integer $l$ indicates the number of neural network nodes; $I_1, \ldots, I_l$ are index subsets of $\{1, 2, \ldots, m\}$; the $d_j(i)$ are non-negative integers; $W$ is an adjustable synaptic weight vector; and $s(z_j)$ is chosen as the hyperbolic tangent function:
๐‘ ๐‘ (๐‘ง๐‘ง๐‘—๐‘— ) = ๐‘’๐‘’๐‘ง๐‘ง๐‘—๐‘— โˆ’ ๐‘’๐‘’โˆ’๐‘ง๐‘ง๐‘—๐‘— ๐‘’๐‘’๐‘ง๐‘ง๐‘—๐‘— + ๐‘’๐‘’โˆ’๐‘ง๐‘ง๐‘—๐‘— (20) According to Girosi and Poggio (1989) [19], there exist ideal weight ๐‘Š๐‘Šโˆ—such that the function ๐œ‘๐œ‘(๐‘ง๐‘ง) can be approximated by an ideal neural network on a compact set ฮฉ๐‘ง๐‘ง โŠ‚ ๐‘…๐‘…๐‘š๐‘š : ๐œ‘๐œ‘(๐‘ง๐‘ง) = ๐‘Š๐‘Šโˆ—๐‘‡๐‘‡๐‘†๐‘†(๐‘ง๐‘ง) + ๐œ€๐œ€๐‘ง๐‘ง, (21) where ๐œ€๐œ€๐‘ง๐‘ง is called the neural network approximation error. It is representing the minimum possible deviation of the ideal approximator ๐‘Š๐‘Šโˆ—๐‘‡๐‘‡๐‘†๐‘†(๐‘ง๐‘ง) from the unknown function ๐œ‘๐œ‘(๐‘ง๐‘ง). In general, the ideal neural network weight ๐‘Š๐‘Šโˆ— is not known and needs to be estimated. In this paper, there exist an integer ๐‘™๐‘™โˆ— and an ideal constant weight vector ๐‘Š๐‘Šโˆ—, such that for all ๐‘™๐‘™ โ‰ฅ ๐‘™๐‘™โˆ—, ๐‘ข๐‘ข๏ฟฝโˆ—(๐‘ง๐‘งฬ…(๐‘˜๐‘˜)) = ๐‘Š๐‘Šโˆ—๐‘‡๐‘‡๐‘†๐‘†(๐‘ง๐‘งฬ…(๐‘˜๐‘˜)) + ๐œ€๐œ€๐‘ง๐‘งฬ…, โˆ€๐‘ง๐‘งฬ… โˆˆ ฮฉ๐‘ง๐‘งฬ…, (22) where ๐œ€๐œ€๐‘ง๐‘งฬ… is the neural network estimation error satisfying |๐œ€๐œ€๐‘ง๐‘งฬ…| < ๐œ€๐œ€0 . Based on Lyapunov technique, it has been proven in Ge et al., 2003 [17] that the adaptive classifier law and the updating law can be chosen as: D. Soetraprawata and A. Turnip / Mechatronics, Electrical Power, and Vehicular Technology 04 (2013) 1-8 5 ๐‘ข๐‘ข(๐‘˜๐‘˜) = ๐‘Š๐‘Š๏ฟฝ ๐‘‡๐‘‡๐‘†๐‘†(๐‘ง๐‘งฬ…(๐‘˜๐‘˜)), (23) ๐‘Š๐‘Š๏ฟฝ (๐‘˜๐‘˜ + 1) = ๐‘Š๐‘Š๏ฟฝ (๐‘˜๐‘˜1 ) + ฮ“[๐‘†๐‘†๏ฟฝ๐‘ง๐‘งฬ…(๐‘˜๐‘˜1 )๏ฟฝ(๐‘ฆ๐‘ฆ๐‘˜๐‘˜+1 โˆ’๐‘ฆ๐‘ฆ๐‘‘๐‘‘ (๐‘˜๐‘˜ + 1)) + ๐œŒ๐œŒ๐‘Š๐‘Š๏ฟฝ (๐‘˜๐‘˜)], (24) where ๐‘˜๐‘˜1 = ๐‘˜๐‘˜ โˆ’ ๐‘›๐‘› + 1 , diagonal gain matrix ฮ“ > 0, and ๐œŒ๐œŒ > 0. Therefore, the tracking error converges to a small neighborhood of zero by increasing the approximation accuracy of the neural networks and the closed-loop stability is guaranteed. Figure 1 shows the structure of the pre- processing, feature extraction, and the ANNC algorithm. In Figure 1, )(tx indicates a non-pre processed (raw) EEG signal; )(kx indicates a filtered signal; )(ky indicates an extracted signals in which the artifact was removed; )(kyd and ky indicate a target and classified signal, respectively. IV. RESULT AND DISCUSSION In this study, a new method using adaptive neural networks for the classification of the EEG- based P300 signals is proposed. This method is supported by the AR model to extract the features and reduce the artifact that is contained within the EEG signals. The methods mentioned above were applied to the training of eight subjects who participated in four training sessions with six runs for each session. Figure 2 and 3 are the pre- processed EEG-based P300 signals using Butterworth band pass filter and after applying the AR method as feature extractor and artifacts remover, respectively. Although we can notice some improvement, at Figure 3, it is still difficult to classify the signals with respect to the P300 component. Thus, a new adaptive neural networks classifier is proposed. The tracking error graph with and without applying the AR model approach is shown in Figure 4. The curves show that a level of accuracy is attained after about 250 iterations by applying the AR model approach. On the other hand, the same level of accuracy is attained after 1,800 iterations if the proposed feature extraction method is not applied. 
IV. RESULTS AND DISCUSSION

In this study, a new method using adaptive neural networks for the classification of EEG-based P300 signals is proposed. The method is supported by the AR model, which extracts the features and reduces the artifacts contained within the EEG signals. The methods described above were applied to the data of eight subjects, each of whom participated in four training sessions with six runs per session. Figures 2 and 3 show the EEG-based P300 signals after pre-processing with the Butterworth band-pass filter and after applying the AR method as feature extractor and artifact remover, respectively. Although some improvement can be noticed in Figure 3, it is still difficult to classify the signals with respect to the P300 component; this is what the proposed adaptive neural networks classifier addresses. The tracking error with and without the AR model approach is shown in Figure 4. The curves show that a given level of accuracy is attained after about 250 iterations when the AR model approach is applied, whereas the same level of accuracy is attained only after about 1,800 iterations when the proposed feature extraction method is not applied. This means that the introduction of the AR method helps to accelerate the training process. The tracking error converges to a small value around zero while closed-loop stability is guaranteed; moreover, the tracking error with the AR model converged faster. Following [5], the data set of subject 5 was not included in the simulation, since the subject had misunderstood the instructions given before the experiment.

Figure 1. Structure of the feature extraction and classification algorithms
Figure 2. Pre-processed EEG signals using the Butterworth band-pass filter
Figure 3. Extracted EEG signals using the AR model
Figure 4. Network performance in terms of mean squared error

Comparative plots of the classification accuracies and transfer rates (obtained with BLDA, ANNC, and the combination of the AR model and ANNC, averaged over the four sessions) for the disabled subjects (subjects 1-4) and the able-bodied subjects (subjects 6-9) are shown in Figures 5 and 6, respectively.

Figure 5. Comparison of classification accuracy and transfer rate obtained with BLDA, ANNC, and the combination of the AR model and ANNC for the disabled subjects
Figure 6. Comparison of classification accuracy and transfer rate obtained with BLDA, ANNC, and the combination of the AR model and ANNC for the able-bodied subjects

With the combination of the AR model and ANNC, all of the subjects except subjects 6 and 9 achieved an average classification accuracy of 100% after five blocks of stimulus presentations were averaged (i.e., 14 seconds). Subjects 6 and 9, in contrast with the BLDA results, still achieved an average classification accuracy of 100% after nine and ten
blocks of stimulus presentations were averaged, respectively. These results are a significant improvement over those presented in [5], in which subjects 6 and 9 did not achieve an average classification accuracy of 100%. This means that the introduction of the AR method and the application of the proposed classifier enable the BCI to extract and classify the information, in terms of classification accuracy, even from a fatigued subject. Thus, fatigue, which was mentioned in Hoffmann et al. [5] as one of the reasons for the poorer performance of subject 9, can be compensated for by the proposed method.

V. CONCLUSIONS

The results presented in this study show that, in terms of classification accuracy, a P300-based BCI system can communicate at rates of 31.2 bits/min and 36.7 bits/min for the disabled and able-bodied subjects, respectively. The classification accuracies and transfer rates obtained with the ANNC supported by the AR model approach are found to be far superior to those of the BLDA approach, and the proposed method is therefore better suited for BCI applications.

ACKNOWLEDGEMENT

This work is part of a thematic research project funded by UPT BPI LIPI (DIPA No. 3425.01.011), fiscal year 2012. The authors would like to thank the Deputy Chairman for Scientific Services, Dr. Fatimah Zulfah S. Padmadinata, for supporting the publication of this paper.

REFERENCES

[1] B. E. Hillner, et al., "Impact of positron emission tomography/computed tomography and positron emission tomography (PET) alone on expected management of patients with cancer: initial results from the National Oncologic PET Registry," J Clin Oncol, vol. 26, pp. 2155-61, May 2008.
[2] F. Jouret, et al., "Single photon emission-computed tomography (SPECT) for functional investigation of the proximal tubule in conscious mice," Am J Physiol Renal Physiol, vol. 298, pp. F454-60, Feb 2010.
[3] E. Niedermeyer and F. L. D. Silva, Electroencephalography, 4th ed. Baltimore: Lippincott, Williams & Wilkins, 1999.
[4] R. M. Chapman and H. R. Bragdon, "Evoked responses to numerical and non-numerical visual stimuli while problem solving," Nature, vol. 203, pp. 1155-1157, 1964.
[5] U. Hoffmann, et al., "An efficient P300-based brain-computer interface for disabled subjects," Journal of Neuroscience Methods, vol. 167, pp. 115-125, 2008.
[6] L. A. Farwell and E. Donchin, "Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials," Electroencephalogr Clin Neurophysiol, vol. 70, pp. 510-23, Dec 1988.
[7] W. D. Penny, et al., "EEG-based communication: a pattern recognition approach," IEEE Trans Rehabil Eng, vol. 8, pp. 214-5, Jun 2000.
Li, "Semisupervised multicategory classification with imperfect model," IEEE Trans Neural Netw, vol. 20, pp. 1594-603, Oct 2009. [9] S. S. Ge, et al., "Nonlinear adaptive control using neural networks and its application to CSTR systems," Journal of Process Control, vol. 9, pp. 313-323, 1999. [10] S. S. Ge, et al., "Adaptive MNN control for a class of non-affine NARMAX systems with disturbances," Systems & Control Letters, vol. 53, pp. 1-12, 2004. [11] S.-J. Liu, et al., "Adaptive output- feedback control for a class of uncertain stochastic non-linear systems with time delays," International Journal of Control, vol. 81, pp. 1210-1220, August 1 2008. [12] S. Jaiyen, et al., "A very fast neural learning for classification using only new incoming datum," IEEE Trans Neural Netw, vol. 21, pp. 381-92, Mar 2010. [13] S. Ozawa, et al., "A multitask learning model for online pattern recognition," IEEE Trans Neural Netw, vol. 20, pp. 430-45, Mar 2009. [14] Y. Washizawa, "Feature extraction using constrained approximation and suppression," IEEE Trans Neural Netw, vol. 21, pp. 201-10, Feb 2010. [15] A. Isidori, Nonlinear control systems, 2nd ed. Berlin: Springer-Verlag, 1989. [16] S. S. Ge, et al., Stable Adaptive Neural Network Control, 1st ed. Norwell: Kluwer Academic, 2001. [17] S. S. Ge, et al., "Adaptive NN control for a class of strict-feedback discrete-time D. Soetraprawata and A. Turnip / Mechatronics, Electrical Power, and Vehicular Technology 04 (2013) 1-8 8 nonlinear systems," Automatica, vol. 39, pp. 807-819, 2003. [18] E. B. Kosmatopoulos, et al., "High-order neural network structures for identification of dynamical systems," IEEE Trans Neural Netw, vol. 6, pp. 422- 31, March 1995. [19] F. Girosi and T. Poggio, "Networks and the best approximation property," Biological Cybernetics, vol. 63, pp. 169- 176, July 1 1990. Feature Extraction Adaptive Neural Networks Classifier