A Review on Seizure Detection Systems with Emphasis on Multi-domain Feature Extraction and Classification using Machine Learning

Dattaprasad Torse
Department of Electronics and Communication Engineering, KLS Gogte Institute of Technology, Jnana Ganga, Khanapur Road, Udyambag, Belagavi, Karnataka 590008, India
Tel.: +91 831 240 5500
datorse@git.edu

Veena Desai
Department of Electronics and Communication Engineering, KLS Gogte Institute of Technology, Jnana Ganga, Khanapur Road, Udyambag, Belagavi, Karnataka 590008, India
Tel.: +91 831 240 5500
veenadesai@git.edu

Rajashri Khanai
Department of Electronics and Communication Engineering, KLE Dr. M. S. Sheshgiri College of Engineering and Technology, Udyambag, Angol Main Road, Belgaum, Karnataka 590008, India
Tel.: +91 831 244 0322
rajashri.khanai@gmail.com

Abstract
At present, manual observation of electroencephalogram (EEG) signals is the prime method for diagnosing epileptic seizure disorders. The method is time consuming and error prone, because continuous monitoring of nonlinear and nonstationary EEG signals causes fatigue. Epilepsy affects approximately 1% of the world's population, and more than 25% of these patients cannot be treated correctly because of erroneous diagnosis. An automated seizure detection system can make the process faster and more reliable. This paper reviews multi-domain feature extraction and machine learning classification techniques used in automated seizure detection systems. To analyse subtle variations in EEG, signal decomposition algorithms have been used in the time, frequency, joint time-frequency, and nonlinear domains. Statistical and entropy parameters are the key features for discerning normal from seizure EEG signals. Machine learning plays a critical role in obtaining meaningful information from the extracted features.
The paper also evaluates the performance of Multilayer Perceptron Neural Network, naïve Bayes, Least Square Support Vector Machine, k nearest neighbour, and random forest classifiers using sensitivity, specificity and accuracy metrics. A seizure detection technique is developed by decomposing the EEG signals by means of the Tunable-Q Wavelet Transform (TQWT). TQWT proves effective for quantifying the complexity of the individual sub-bands of biomedical signals, because its Q-factor can be varied to suit signals of both oscillatory and non-oscillatory nature. The highest accuracy of 97.3% is obtained using the random forest classifier for the combination of spectral, Shannon entropy and Kraskov entropy features. The paper compares the performance of feature extraction and classification techniques for the implemented system. The comparison explores the possibility of a hardware implementation of a real-time seizure detection scheme.

Keywords: Seizure Detection, Tunable-Q Wavelet Transform, Shannon entropy, Kraskov entropy, Least Square Support Vector Machine, Random Forest

BRAIN – Broad Research in Artificial Intelligence and Neuroscience, Volume 8, Issue 4 (December, 2017), ISSN 2067-8957

1. Introduction
Epilepsy is a widespread brain disorder which affects a variety of mental and physical actions. When more than two episodes of seizures occur in the lifespan of a person, that person is categorized as a seizure patient. Epileptic seizures are provoked by a group of nerve cells and affect a person's normal behavior. This sudden change in brain signals is life threatening in a few cases, as it can cause physical injury to the affected person. In the form of partial and generalized seizures, the abnormal brain activity poses a very important health concern to the patient. Partial seizures start in a specific area of the brain, usually called the epileptic focus. Partial seizures may or may not affect the consciousness of a person.
Generalized seizures involve seizure signals originating from most parts of the brain and cause loss of mental alertness and muscle spasms. The process of 'epileptogenesis' is highly unpredictable and the risk of injury is very high (D. Buck, 1997). The seizure disorder has several causes, such as birth asphyxia, stroke, traumatic brain injury or brain infections. Seizure disorders are not preventable and in some cases not completely curable, but with the help of anticonvulsant drugs life-threatening seizures can be controlled in the majority of cases (Englander J., 2014).

An episode of epileptic seizure occurs when the brain's controlled neuronal firing circuit malfunctions and a group of nerve cells in the brain cortex produces an excessive electrical discharge. This process is sudden and unpredictable. Depending on which of the four lobes of the cortex (frontal, parietal, occipital or temporal) originates the abnormal signals, the abnormalities in motor control result in tonic-clonic movements of muscles and joints. The discharge of electrical energy in normal brain cells is controlled and produces variations within normal magnitude ranges. However, an abrupt and large transient rush of energy in the brain cells results in epileptic seizures. An epileptic seizure can show variation in the properties of brain waves, with effects ranging from a short-term muscle movement to severe convulsions. These variations mainly depend on the area of the brain from which the energy is generated, the level of electrical energy discharged and the total area over which this energy extends during the abnormal activity (Acharya U., 2013). The working of the brain and the properties that cause epileptic activity are still a mystery.
When a person experiences epileptic activity, the possible observable signs are sudden movement of body parts, loss of concentration, involuntary muscle movement, disturbance of the visual and auditory senses, and mood disorder. A person suffering from a mild to severe epileptic attack can also undergo several changes that are beyond the range of normal observation. When seizures occur in children, who have limited knowledge of the situation they experience, it becomes difficult to notice the seizure onset. These pre-seizure behavior changes in children are linked to behavioral disorder. Hence, children with epileptic disorder need continuous monitoring, and epilepsy observation is therefore a continuous process. To make the process fully automated, with indication of seizure occurrence, many signal processing algorithms need to be considered and analysed in detail. To detect epilepsy from EEG signals using automated Computer Assisted Diagnostic (CAD) techniques, understanding the physiological aspects of the seizure signal class is essential (Sanei, Saeid).

In this work, a Tunable-Q Wavelet Transform (TQWT) (IW Selesnick, 2011) based seizure detection system is proposed which uses spectral and entropy based features to test the performance of five classification algorithms. Figure 1 shows the block diagram of the proposed seizure classification system based on spectral and entropy features of TQWT sub-bands. As shown in the figure, two TQWT sub-bands, namely sub-band 1 and sub-band 16, are considered for feature extraction from normal and seizure EEG signals. The oscillatory information contained in the signal is reflected in the TQWT sub-bands, with the low frequency content represented in the first sub-band and the high frequency oscillation represented in the last sub-band.
The EEG signal decomposition technique quantifies the sub-band spectral and entropy features for low and high frequencies, and with an appropriate choice of the Q parameter it can serve as a general method to detect seizures in other EEG recordings. The efficacy of the feature extraction and classification algorithms is tested in terms of computational complexity and the suitability of the proposed technique for hardware implementation. The performance of the designed algorithm is tested on real-time data recorded from epileptic and normal patients at a local hospital. The efficacy of the tested method motivates building an automated seizure detection system to assist neurologists in diagnosing epilepsy and related disorders with improved accuracy.

Figure 1. Flow diagram of the structure representing a step-by-step idea of the proposed method
[Figure 1 block labels: Single Channel EEG (Set-N, Set-S); Feature Extraction (Spectral Features, Kraskov Entropy, Shannon Entropy); P-value; Wrapper test; Classifiers (MLPNN, LS-SVM, random forest, KNN, Naïve Bayes); Mean Square Error (MSE); Computational Time; Classification Accuracy; Seizure / Non Seizure]

The remaining part of the paper is arranged as follows. Section 2 reviews the methods currently used to detect epilepsy and the state-of-the-art algorithms for EEG feature extraction and classification in seizure detection systems. The results obtained using the TQWT feature extraction and classification techniques are illustrated in Section 3, and a discussion comparing the various methods is presented in Section 4. Section 5 concludes the paper.

2. Literature Review
This paper reviews seizure detection systems that have been developed using the EEG database from the University of Bonn, Germany (R. G. Andrzejak, 2001). The decomposition of EEG signals has been implemented in the time, frequency, joint time-frequency, and nonlinear domains. Statistical and entropy based parameters are used as the key features in machine learning algorithms to separate the signals.

2.1. Datasets
The dataset used in this work consists of five subsets, of which two, namely normal (set N) and seizure (set S), were considered for the system developed in this work. The EEG signals of subset N were recorded during normal signal intervals from 5 patients (subjects) in the epileptogenic region. Subset S is composed of EEG signals with seizure activity, recorded from all the electrodes showing seizure activity. The biopotentials of subsets N and S were recorded intracranially. A 128-channel amplification system with an average common reference was used to record the signals. To record the signals, depth electrodes were inserted symmetrically into the hippocampal formations. Strip electrodes were inserted in the basal and lateral regions of the neocortex. EEG recorded during epileptic seizures is termed ictal activity. Figures 2 and 3 show sample recordings of datasets N and S respectively. Each dataset contains 100 single-channel recordings, each with 4097 samples at a sampling rate of 173.6 Hz.

Figure 2. Normal EEG signal sample

Figure 3. Epileptic EEG signal sample

Further clinical data was collected from patients receiving routine EEG examinations at Dr. Mohire's Neurology Research Centre, Kolhapur, India. In total, routine EEG data from 30 participants (15 men and 15 women), whose ages varied from 12 to 25 years, was included. The recorded EEG data contains both normal and seizure activity. The dataset includes signals of patients who suffered from headache but whose diagnosis did not reveal any kind of epilepsy.
Written permission was obtained from the patients and the neurologist to approve the research work. To assess the potential of the technique in real-time use, the samples were divided into training and test sets of 70% and 30% respectively.

2.2. Pre-processing
Recordings made with the 10-20 EEG system can suffer from contamination by various noise sources and thus yield ambiguous signals. Although vigilant design of the recording scheme and apt recording procedures can minimize the noise, a number of electrophysiological signals, mainly the electrooculogram (EOG), need careful elimination using signal processing techniques. The main cause of the electrical spike generated during an eye blink is the electrical dipole in the human eye, which results from a positive cornea and a negative retina. Independent Component Analysis (ICA) was used independently to filter artifacts from EEG signals by (Maarten Mennes et al., 2010, Manousos A. Klados et al., 2011, Nadia Mammone, 2012). The combination of the Discrete Wavelet Transform (DWT) and ICA is studied by (Mingai Li, 2012, Nadia Mammone et al., 2014). Another technique, adaptive noise cancellation using DWT, was tested as an online predictive tool for ocular artifacts (Hong Peng, 2013). The method in (Qinglin Zhao, 2014) eliminates ocular artifacts by using Adaptive Predictive Filter (APF) techniques to recover the true EEG by finding the amplitudes of the eye movement artifact signal. An automated online filter has been developed using a combination of wavelet decomposition, ICA, and thresholding. Adaptive filtering has been compared with ICA and PCA based methods in one study to find a computationally efficient method (Dattaprasad A. Torse, 2016).
The use of joint time-frequency domain approaches to process EEG signals while preserving the useful information is explored by means of Empirical Mode Decomposition (EMD) (Gang Wang, 2016). The EMD method outperforms the wavelet based methods because in EMD the signal is sifted for the predefined levels. The selection of the preprocessing stage depends on the application, the type of dataset used and the complexity of the algorithm in terms of computation time.

2.3. Feature Extraction
Many research papers use the University of Bonn, Germany, database to categorize normal and epileptic EEG signals by means of supervised and unsupervised pattern classifiers. In (İnan Güler, 2005, Pari Jahankhani, 2006, Hojjat Adeli, 2007), wavelet based coefficients were used as features and classifiers were tested for classification of EEG into normal and seizure states. Most of the studies have aimed at implementation using software tools rather than at the development of low cost, computationally efficient hardware for seizure detection. The combination of wavelets with PCA and Independent Component Analysis (ICA) and SVM classifiers was studied to classify EEG signals (Subasi and Gursoy, 2010). Epileptic EEG was analyzed with entropy features using a Principal Component Analysis (PCA) enhanced Radial Basis Function (RBF) neural network and a Support Vector Machine (SVM) classifier by (Ghosh-Dastidar et al., 2008, Oliver Faust et al., 2010) respectively. The Discrete Wavelet Transform (DWT) has been used extensively to decompose the EEG signal into different sub-bands to find detail and approximation coefficients (Tapan Gandhi, 2011, Rami J. Oweis, 2011, Esma Sezer, 2012). The kNN classifier was tested with DWT based variance features (Shufang Li, 2013). The 1-NN classifier achieved an overall accuracy of 99%, comparable to that of the more complicated SVM classifier. The DWT-entropy feature set resulted in 100% accuracy in another study by (Yantindra Kumar, 2014).
The performance of DWT and entropy features is assessed using multiple classifiers in (Oliver Faust, 2015). In another study on DWT (Jiang-Ling Song, 2016), automated detection of epileptic EEG is presented using an extreme learning machine and novel fusion features which characterize the similarity between signals. The LS-SVM, feed forward multi-layer perceptron neural network and kNN classifiers have been considered to identify epileptic seizures using Wavelet Packet Decomposition (WPD). Five mental tasks were classified using SVM for categorization of epileptic seizures using WPD, approximate entropy (ApEn) and sample entropy (SampEn) (Yong Zhang et al., 2015). The effect of WPD based log energy entropy is studied by (Raghu, Sriraam, 2015).

2.4. Tunable Q Wavelet Transform
The existing time-frequency domain analysis techniques in the literature provide a constant Q-factor that cannot be adapted to the signal. TQWT, on the other hand, can tune the wavelet's Q-factor in accordance with the signal, so that the EEG signal under study can be decomposed into sub-bands with different properties. TQWT is thus a flexible and fully discrete wavelet transform that is particularly suitable for analyzing oscillatory signals (IW Selesnick, 2011). It achieves flexibility through its adjustable input parameters: the Q-factor (Q), the rate of over-sampling or redundancy (r), and the number of levels of decomposition (J). The parameter 'r' controls unwanted excessive ringing in order to localize the wavelet temporally without affecting its shape, while 'Q' controls the number of oscillations of the wavelet. TQWT decomposes an input signal x(n) into (J+1) sub-bands for 'J' levels of decomposition by applying two-channel filter banks iteratively. At each level, the two-channel filter bank is applied to the low-pass sub-band of the previous level.
In every stage, x(n) is decomposed into a low-pass sub-band $l_0(n)$ and a high-pass sub-band $h_0(n)$. The low-pass and high-pass sub-bands have sampling frequencies of $\alpha f_s$ and $\beta f_s$ respectively, where $\alpha$ and $\beta$ are the scaling factors and $f_s$ is the sampling frequency of x(n). The low-pass filter $H_0(\omega)$ and the high-pass filter $H_1(\omega)$, with low-pass scaling $\alpha$ and high-pass scaling $\beta$, are applied to produce $l_0(n)$ and $h_0(n)$. Perfect reconstruction without excessive redundancy is ensured when $\alpha$ and $\beta$ satisfy $0 < \alpha < 1$, $0 < \beta \le 1$ and $\alpha + \beta > 1$. $H_0^{(J)}(\omega)$ and $H_1^{(J)}(\omega)$ are the equivalent frequency responses generated after J levels for the low-pass and high-pass sub-bands respectively, and are represented as:

$$H_0^{(J)}(\omega) = \begin{cases} \prod_{m=0}^{J-1} H_0\!\left(\dfrac{\omega}{\alpha^{m}}\right), & |\omega| \le \alpha^{J}\pi \\ 0, & \alpha^{J}\pi < |\omega| \le \pi \end{cases} \tag{1}$$

$$H_1^{(J)}(\omega) = \begin{cases} H_1\!\left(\dfrac{\omega}{\alpha^{J-1}}\right) \prod_{m=0}^{J-2} H_0\!\left(\dfrac{\omega}{\alpha^{m}}\right), & (1-\beta)\alpha^{J-1}\pi \le |\omega| \le \alpha^{J-1}\pi \\ 0, & \text{for other } \omega \in [-\pi, \pi] \end{cases} \tag{2}$$

where, in the transition band $(1-\beta)\pi < |\omega| < \alpha\pi$,

$$H_0(\omega) = \theta\!\left(\frac{\omega + (\beta - 1)\pi}{\alpha + \beta - 1}\right), \qquad H_1(\omega) = \theta\!\left(\frac{\alpha\pi - \omega}{\alpha + \beta - 1}\right)$$

Here $\theta(\omega)$ is the frequency response of the Daubechies filter with two vanishing moments, given by:

$$\theta(\omega) = 0.5\,(1 + \cos\omega)\sqrt{2 - \cos\omega}, \qquad |\omega| \le \pi \tag{3}$$

The scaling factors are related to the 'r' and 'Q' parameters as follows:

$$r = \frac{\beta}{1 - \alpha}, \qquad Q = \frac{2 - \beta}{\beta} \tag{4}$$

TQWT is selected over other time-frequency techniques because of several advantages. Firstly, the analysis of a signal with little or no oscillatory activity, e.g. EEG, demands a wavelet transform with a low Q-factor. In contrast, oscillatory signals demand a relatively high Q-factor. However, the majority of wavelet transforms are unable to tune their Q-factor for signals with varied oscillatory behavior. TQWT solves this difficulty by adjusting the Q-factor.
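As a small illustration of Eq. (3) and Eq. (4), the relations between (Q, r) and the scaling factors can be inverted to obtain $\beta = 2/(Q+1)$ and $\alpha = 1 - \beta/r$. The sketch below is only an assumption-laden example: the function names are hypothetical, the values Q = 1, r = 3 and J = 15 are illustrative and are not stated in the paper, and the sub-band rates follow from applying the scalings $\alpha$ and $\beta$ level by level.

```python
import math

def tqwt_scaling(Q, r):
    """Invert Eq. (4): return (alpha, beta) for a Q-factor Q and redundancy r."""
    if Q < 1 or r <= 1:
        raise ValueError("this sketch assumes Q >= 1 and r > 1")
    beta = 2.0 / (Q + 1.0)      # from Q = (2 - beta) / beta
    alpha = 1.0 - beta / r      # from r = beta / (1 - alpha)
    return alpha, beta

def theta(w):
    """Transition function of Eq. (3), defined for |w| <= pi."""
    return 0.5 * (1.0 + math.cos(w)) * math.sqrt(2.0 - math.cos(w))

def subband_rates(Q, r, J, fs):
    """Sampling rates of the J high-pass sub-bands and the final low-pass sub-band."""
    alpha, beta = tqwt_scaling(Q, r)
    rates = [beta * alpha ** (j - 1) * fs for j in range(1, J + 1)]
    rates.append(alpha ** J * fs)   # low-pass residual after J levels
    return rates

# Illustrative values only (not taken from the paper).
alpha, beta = tqwt_scaling(Q=1, r=3)
rates = subband_rates(Q=1, r=3, J=15, fs=173.6)
print(alpha, beta, len(rates))
```

With r > 1 the condition $\alpha + \beta > 1$ holds automatically, since $\alpha = 1 - \beta/r > 1 - \beta$.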
Secondly, TQWT has been applied to the analysis of various biomedical signals (Hasan Ahnaf, 2016, Ram Bilas Pachori, 2016, Abhijit Bhattacharyya, 2017, Shivnarayan Patidar, 2017). The "rational transfer functions" of the filters in TQWT enhance computational efficiency and enable perfect reconstruction of the wavelet transform. These advantages motivate the use of TQWT for decomposing the EEG signal into sub-bands for further processing in the proposed scheme. In this paper, the computational complexity of the algorithm is analyzed in order to improve the Information Transfer rate of current BCI based seizure detection systems.

2.5. Spectral and Entropy Features
The idea of the paper is to test the classification performance of five classifiers, namely Multilayer Perceptron Neural Network (MLPNN), Least Square Support Vector Machine (LS-SVM), Naïve Bayes (NB), k Nearest Neighbour (kNN), and Random Forest (RF). For this purpose, spectral domain features are used to build the primary feature space by combining the minimum and maximum value, mean, median and standard deviation of the EEG signals. This primary feature subset is combined with entropy features to build a robust feature vector. The minimum and maximum values of the EEG signals decomposed using TQWT are stored separately to create the first feature vector. The second feature vector is built from the mean frequency and median frequency of the sub-bands. In the third set, only the standard deviation of the absolute value of the coefficients is computed for each TQWT sub-band. The standard deviation is an effective frequency domain feature as it represents the average deviation of a signal of random nature (Phinyomark A. et al., 2012). The Mean Frequency (MF) is an average frequency, computed as the sum of the products of the EEG power spectrum and the frequency, divided by the total sum of the power spectrum (Phinyomark et al., 2012). The MF is also called the central frequency (fc) in (Du & Vuskovic, 2004).
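The feature sets described in this paragraph can be sketched for a single sub-band as follows. This is a minimal example under stated assumptions: the power spectrum is taken as already computed and passed in as parallel lists of frequencies and power values, the median frequency uses the half-power definition of Eq. (6), and all function names and sample values are hypothetical.

```python
import statistics

def mean_frequency(freqs, power):
    """Power-weighted average frequency, as in Eq. (5)."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)

def median_frequency(freqs, power):
    """First frequency bin at which the cumulative power reaches half the total."""
    half = 0.5 * sum(power)
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]

def subband_features(coeffs, freqs, power):
    """Collect min/max, std of absolute coefficients, and the two spectral features."""
    abs_coeffs = [abs(c) for c in coeffs]
    return {
        "min": min(coeffs),
        "max": max(coeffs),
        "std_abs": statistics.stdev(abs_coeffs),  # N - 1 in the denominator
        "mean_freq": mean_frequency(freqs, power),
        "median_freq": median_frequency(freqs, power),
    }

# Toy sub-band (values are illustrative, not EEG data).
coeffs = [-2.0, 1.0, 3.0, -4.0]
freqs = [1.0, 2.0, 3.0, 4.0]
power = [1.0, 3.0, 4.0, 2.0]
feats = subband_features(coeffs, freqs, power)
print(feats)
```

Because `statistics.stdev` divides by N - 1, it matches the sample standard deviation convention used in this section.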
The definition of mean frequency (MF) is given by:

$$MF = \frac{\sum_{i=1}^{N} f_i P_i}{\sum_{i=1}^{N} P_i} \tag{5}$$

where $f_i$ is the frequency of the EEG power spectrum at frequency bin i, $P_i$ is the EEG power spectrum at frequency bin i, and N is the number of frequency bins. In the study of EEG signals in the time domain, N is usually defined as the next power of 2 above the length of the EEG data. The Median Frequency (MFR) is the frequency at which the EEG power spectrum is divided into two regions of equal power (Phinyomark A., et al., 2012a). The MFR is therefore the frequency at which half of the total power is reached. It is defined as follows:

$$\sum_{i=1}^{MFR} P_i = \sum_{i=MFR}^{N} P_i = \frac{1}{2}\sum_{i=1}^{N} P_i \tag{6}$$

In the standard deviation, the averaging is computed with power and not with amplitude. It is calculated by first squaring each of the deviations before computing the average. In the final step, the square root is taken to compensate for the initial squaring. The standard deviation $\sigma$ is given by:

$$\sigma^2 = \frac{1}{N-1}\sum_{i=0}^{N-1} (x_i - \mu)^2 \tag{7}$$

where μ is the mean.

2.5.1 Shannon Entropy
For a given discrete probability distribution, the Shannon entropy (ShEn) is a measure of how much information is needed, on average, to identify random samples from the distribution. ShEn gives the average information present in the EEG signal. It uses a non-normalized method for estimating the entropy (Shannon, 1948), based on the energy content of the EEG. The non-normalized Shannon entropy is given by (Shannon, 1948):

$$ShEn = H(x) = -\sum_{x} p(x) \log p(x) \tag{8}$$

As shown in Figure 4, the ShEn feature values are negative, because the difference in the entropy parameters is negative.

2.5.2. Kraskov Mutual Information
Normally, seizure detection is carried out by extracting a set of analytic features from EEG signals.
These features need to be similar for signals of the same class and to vary between different classes. The hidden nonlinearity and non-stationarity of EEG signals often cause variations when these signals are analyzed for feature extraction. Time-frequency signal representations using wavelet transform based features are a suitable method for analyzing nonlinear and non-stationary EEG signals. Recently, Kraskov entropy based nonlinear features were developed and found applicable in EEG signal analysis (K.A. Veselkov, et al., 2010, A. Kraskov, et al., 2008). Many authors have employed the Kraskov entropy for measuring and characterizing the nonlinearities of EEG signals. It measures the Shannon entropy, or differential statistical entropy, of the EEG signals using the kNN samples with a distance measure such as Euclidean or Hamming.

Figure 4. Plot of the Shannon entropy values for normal and seizure signal's high and low frequency sub bands

Figure 5. Plot of the Kraskov entropy values for normal and seizure signal's high and low frequency sub bands

In the continuous domain, the differential statistical entropy of a d-dimensional random variable X with unknown density function f(x) is defined as:

$$H(X) = -\int f(x) \log f(x)\, dx \tag{9}$$

The density function f(x) can be estimated from the probability distribution of the distance between $x_i$ and its k-nearest neighbor samples. Applying this kNN estimate to the above equation results in the Kraskov entropy. It can be expressed as follows (Kraskov, et al., 2008):

$$\hat{H}(X) = -\psi(k) + \psi(N) + \log(C_d) + \frac{d}{N}\sum_{i=1}^{N} \log \varepsilon(i) \tag{10}$$

where $(x_1, x_2, x_3, \ldots, x_N)$ are N random samples of the d-dimensional random variable X, $\psi(\cdot)$ denotes the digamma function, and $C_d$ represents the volume of the d-dimensional unit ball, which depends on the sample space.
The term $\varepsilon(i)$ is the distance between sample $x_i$ and its kNN sample points in the d-dimensional sample space. A more detailed explanation of the mathematical aspects and other applications is available in (Kraskov, et al., 2008). In Figure 5, the KraEn feature values are plotted for the low and high frequency sub-bands of normal and seizure signals.

2.6. Classification
In machine learning, classification of EEG signals deals with the task of identifying to which of a set of classes a new reading belongs, on the basis of a training set of EEG features containing instances whose class membership is known. Based on the spectral and entropy features of the TQWT sub-bands, the classification of EEG signals is tested using five classifiers (Bishop, Christopher M., 2006).

2.6.1 Multilayer Perceptron Neural Network (MLPNN)
MLPNNs, with their ability to learn and generalize, are the most commonly used classifiers in EEG analysis and seizure detection. They need a smaller training set and work fast, with less complexity in implementation (Haykin, 2001). In the MLPNN, every neuron q in the hidden layer sums its input signals $x_p$ after multiplying them by the strengths of the individual connection weights $w_{qp}$. The output $y_q$ is then computed as a function of this sum:

$$y_q = f\left(\sum_{p} w_{qp} x_p\right) \tag{11}$$

where f is the activation function. The activation function is selected depending on the application, for example a sigmoid or a radial basis function. The squared differences between the desired and actual values of the output neurons are summed as:

$$Sum = \frac{1}{2}\sum_{q} (y_{dq} - y_q)^2 \tag{12}$$

where $y_{dq}$ is the desired value of output neuron q and $y_q$ is the actual output of that neuron. Each weight $w_{qp}$ is tuned to decrease Sum as fast as possible. How the value of $w_{qp}$ is set depends on the training algorithm employed (Haykin, 2001).
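Equations (11) and (12) can be illustrated with a short sketch. It assumes a sigmoid activation and a plain gradient-descent update for a single layer; the learning rate `lr`, the function names and the toy numbers are hypothetical choices for illustration and do not describe the training setup used in the paper.

```python
import math

def sigmoid(z):
    """Logistic activation, one possible choice for f in Eq. (11)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_output(weights, inputs, activation=sigmoid):
    """Eq. (11): y_q = f(sum_p w_qp * x_p) for every neuron q."""
    return [activation(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def squared_error(desired, actual):
    """Eq. (12): Sum = 0.5 * sum_q (y_dq - y_q)^2."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(desired, actual))

def gradient_step(weights, inputs, desired, lr=0.5):
    """One gradient-descent update of the weights of a single sigmoid layer."""
    outputs = layer_output(weights, inputs)
    updated = []
    for row, y, d in zip(weights, outputs, desired):
        delta = (y - d) * y * (1.0 - y)   # derivative of Sum w.r.t. the neuron's net input
        updated.append([w - lr * delta * x for w, x in zip(row, inputs)])
    return updated

# Toy example: one neuron, two inputs, target output 1.0.
x = [1.0, 2.0]
target = [1.0]
w0 = [[0.0, 0.0]]
err_before = squared_error(target, layer_output(w0, x))
w1 = gradient_step(w0, x, target)
err_after = squared_error(target, layer_output(w1, x))
print(err_before, err_after)
```

A full MLPNN would propagate the same error signal back through the hidden layers; the single layer here only shows how one update reduces Sum.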
ANN model development primarily focuses on the training algorithm. A suitable training algorithm can result in a better model for prediction. An optimized training algorithm can also reduce the training time and provide better accuracy. Several training algorithms are used to train an MLPNN, and the most often used is the backpropagation algorithm (Haykin, 2001). The backpropagation algorithm, in which an error surface is searched using gradient descent, is relatively easy to implement. However, the search can remain trapped in a local minimum indefinitely, which is a major challenge of the backpropagation algorithm. Long training sessions are another major concern. Therefore, many variations that improve the convergence of backpropagation have been proposed in the literature.

2.6.2 Least Square Support Vector Machine (LS-SVM)
The focus in this paper is on binary classification, and LS-SVMs, the least squares versions of SVMs, are well suited to the application. LS-SVMs are a set of related supervised learning techniques that analyze data and recognize patterns, and are used for classification and regression analysis. The method is based on a quadratic error criterion with equality constraints, as an alternative to the convex quadratic programming (QP) problem of classical SVMs. Least squares SVM classifiers were proposed by (Suykens, 2002). LS-SVMs are a class of kernel-based learning methods. The SVM is a promising classifier that reduces the error and maximizes the margin of the hyperplane that separates the data. For a two-class SVM, the decision function is as follows (Suykens, 2002):

$$f(x) = \mathrm{sign}\left[w^{T} g(x) + b\right] \tag{13}$$

where w is the d-dimensional weight vector, b is the bias, and g(x) is a function that maps x into the d-dimensional space.
To obtain the values of w and b, the following optimization problem is formulated. Minimize

$$J(w, b, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^{2} \tag{14}$$

subject to the equality constraints

$$y_i\left[w^{T} g(x_i) + b\right] = 1 - e_i, \qquad i = 1, 2, 3, \ldots, N \tag{15}$$

where $x_i$ and $y_i$ are the N input/output pairs, $e_i$ are the error variables and $\gamma$ is the regularization parameter.

2.6.3. Naïve Bayes
The Bayesian classifier is a supervised learning technique and a statistical method for classification. It is one of the classification algorithms that apply density estimation to the data to solve diagnostic and predictive problems. This classifier uses Bayes' theorem and naively assumes that the predictors are conditionally independent given the class. Although this assumption is usually violated in practice, the classifier yields posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) (Hastie, T., R., 2008). The naive Bayes assumption is that all the features are conditionally independent given the class label:

$$P(\mathbf{x} \mid y = c) = \prod_{i=1}^{D} p(x_i \mid y = c) \tag{16}$$

These classifiers assign observations to the most probable class. The algorithm works as follows:

1. Estimate the densities of the predictors within each class.

2. Model the posterior probabilities according to Bayes' theorem, i.e., for all k = 1, ..., K,

$$\hat{P}(Y = k \mid X_1, \ldots, X_P) = \frac{\pi(Y = k)\prod_{j=1}^{P} P(X_j \mid Y = k)}{\sum_{k=1}^{K} \pi(Y = k)\prod_{j=1}^{P} P(X_j \mid Y = k)} \tag{17}$$

where Y is the random variable corresponding to the class index of a sample, $X_1, \ldots, X_P$ are the random predictors, and π(Y = k) is the prior probability of class index k.

3. Classify a sample by computing the posterior probability for every class and assigning the sample to the class with the highest posterior probability (Hastie, T., R., 2008).

2.6.4. k Nearest Neighbour
kNN is a non-parametric supervised learning algorithm (D. T. Larose, 2004) that stores all available labelled cases and classifies new data based on a similarity (distance function) measure. For a new test sample, the k training samples closest to it are found, and the class that is most common among these k nearest neighbours is assigned to the test sample. In this paper, the number of nearest neighbours k was varied from 2 to 6, and the highest accuracy was achieved for k = 2. The distance was calculated by means of the Euclidean distance similarity measure. kNN is an example of instance-based supervised learning: the training set is memorized, and the number of nearest neighbours k governs the classification. The classifier decides the class label based on the distances computed in the previous step. In this work, the combination of standard deviation and Kraskov entropy features for the first and sixteenth sub-bands of the TQWT decomposed EEG signals was used to form the feature vector for the kNN classifier.

2.6.5. Random Forest
A significant improvement in classification accuracy results from growing an ensemble of trees and letting them vote for the most relevant class. Random forest is an ensemble tool that takes a subset of features and a subset of class variables to construct each decision tree. Multiple such decision trees are merged to obtain a more precise and stable classification. The bagging technique is used to develop the ensembles by repeatedly producing random vectors that govern the growth of every tree in the ensemble (Breiman, 1996).
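The bagging idea described above, bootstrap resampling followed by a majority vote, can be sketched with a deliberately simplified ensemble. This is not a full random forest: each "tree" is a one-feature threshold rule (a decision stump), there is no random feature selection because only one feature is used, and the data, function names and parameter values are hypothetical.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Fit a threshold rule to (value, label) pairs with labels 0 and 1."""
    zeros = [v for v, y in samples if y == 0]
    ones = [v for v, y in samples if y == 1]
    if not zeros or not ones:
        # Degenerate bootstrap sample with one class only: constant predictor.
        label = 0 if zeros else 1
        return lambda v: label
    mean0 = sum(zeros) / len(zeros)
    mean1 = sum(ones) / len(ones)
    threshold = 0.5 * (mean0 + mean1)      # midpoint between the class means
    high = 1 if mean1 > mean0 else 0       # class found above the threshold
    return lambda v: high if v > threshold else 1 - high

def bagged_ensemble(samples, n_trees=25, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        boot = [rng.choice(samples) for _ in samples]
        trees.append(fit_stump(boot))
    return trees

def vote(trees, value):
    """Majority vote of the ensemble for one feature value."""
    return Counter(tree(value) for tree in trees).most_common(1)[0][0]

# Toy, well separated one-feature data (illustrative values only).
data = [(0.1, 0), (0.3, 0), (0.2, 0), (0.4, 0), (0.5, 0),
        (1.1, 1), (1.3, 1), (1.2, 1), (1.5, 1), (1.4, 1)]
trees = bagged_ensemble(data)
print(vote(trees, 0.2), vote(trees, 1.3))
```

In practice a library implementation with full decision trees and per-split feature sampling would be used; the sketch only shows how resampling and voting combine.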
Another technique used to build the ensemble is to select the split at random from the k best splits (Dietterich, 1998). To optimize RFs, new training sets were produced by randomizing the outputs in the original training set (Breiman, 1999). In several papers on "the random subspace" method, a technique was presented that randomly selects a subset of features to grow each tree (Ho, 1998). RF is a well-suited classification algorithm in practice because:
• RFs are non-parametric and can model arbitrarily complex relations between inputs and outputs without any prior assumption;
• RFs can classify nonlinear and nonstationary EEG data;
• RFs are robust to errors in class labels and are easily interpretable.

Given an ensemble of classifiers $h_1(x), h_2(x), \ldots, h_K(x)$, and with the training set drawn at random from the distribution of the random vector Y, X, the margin function is defined as

$$mg(X, Y) = \mathrm{av}_k\, I(h_k(X) = Y) - \max_{j \neq Y} \mathrm{av}_k\, I(h_k(X) = j) \qquad (18)$$

where I(·) is the indicator function. The margin measures the extent to which the average number of votes at X, Y for the right class exceeds the average vote for any other class. The larger the margin, the more confident the classification. The generalization error is given by

$$PE^{*} = P_{X,Y}\left(mg(X, Y) < 0\right) \qquad (19)$$

where the subscripts X, Y indicate that the probability is over the X, Y space. In random forests, $h_k(X) = h(X, \Theta_k)$.

3. Results and discussions
The automated seizure detection algorithm was developed and tested using two subsets of the Bonn University dataset. The efficacy of the designed algorithms was also verified using EEG data recorded at a local hospital. Sample normal and epileptic EEG signals of 23.6 seconds duration are shown in Figures 2 and 3, respectively.

Figure 6. 16 sub-bands of sample normal EEG signal decomposed using TQWT

Figure 7. 16 sub-bands of sample epileptic EEG signal decomposed using TQWT

The selection of the most suitable values of Q and J is a significant step in signal decomposition using TQWT. To select the optimum values of Q and J, a number of experiments were performed using only the Shannon and Kraskov entropy features, with maximum classification accuracy as the selection criterion. Further, the values of J were varied while keeping Q = 3. It was noted that the highest possible value of J with Q = 1 is 15. Hence, the value of J was varied from 3 to 15, and the Shannon and Kraskov entropies were computed for each sub-band. Then, the classification was performed for the varied values of J. The 16 sub-bands of normal and ictal EEG data for a sample recording are plotted in Figure 6 and Figure 7, respectively. It can be noticed that at J = 15, maximum variation in the low- and high-frequency sub-bands was obtained, which resulted in more relevant entropy parameters and improved classification accuracy. Therefore, J = 15 was selected for further experiments. After selecting J = 15, the value of Q was varied in steps from 1 to 3, and the entropies were computed from the 16 sub-bands for each value of Q. The frequency responses for the normal and seizure sample EEG signals are depicted in Figure 8. Increasing r while keeping Q unchanged increases the overlap between adjacent frequency responses. The parameter r has no effect on the general shape of the wavelets or of the frequency responses, as these are governed by Q. With r > 3, the number of levels J needs to be increased to cover the same frequency range as a result of the increased overlap. The figures show the frequency responses on a log-frequency scale with r = 3.
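The dependence of the admissible number of levels on Q and r can be checked numerically. The sketch below assumes the scaling relations of the TQWT formulation cited in the references (Selesnick, 2011), namely β = 2/(Q + 1), α = 1 − β/r and J_max = ⌊log(βN/8) / log(1/α)⌋. The function name `tqwt_max_levels` is hypothetical, and the segment length N = 4097 samples is an assumption that is not stated in this section.

```python
import math


def tqwt_max_levels(q, r, n):
    """Largest number of TQWT levels for Q-factor q, redundancy r, signal length n."""
    beta = 2.0 / (q + 1.0)   # high-pass scaling factor
    alpha = 1.0 - beta / r   # low-pass scaling factor
    return math.floor(math.log(beta * n / 8.0) / math.log(1.0 / alpha))


N = 4097  # assumed number of samples per EEG segment
for q in (1, 2, 3):
    print(f"Q={q}, r=3 -> J_max={tqwt_max_levels(q, 3, N)}")
```

Under these assumptions, Q = 1 with r = 3 gives J_max = 15, which agrees with the limit noted above. A larger Q or a larger r brings α closer to 1, so each level removes a narrower band and more levels fit before the signal becomes too short to decompose further.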
Figure 8. Frequency response of TQWT sub-bands for (a) normal, (b) seizure EEG signal sample

Figure 9 shows the wavelets for sub-bands 3-15 for r = 3. The figures display the wavelets and the frequency responses when Q is set to 3.0.

Figure 9. Wavelets for 3-15 sub-bands for (a) normal and (b) epileptic EEG signal sample

Note that the frequency responses are narrower for Q = 3 than for Q = 1.0. As Q increases from 1.0 to 3.0, more stages are needed to span the same frequency range because each frequency response is narrower. Here, 16 stages were used, divided into two parts for display. The figures show sample normal and seizure EEG signals decomposed using the TQWT, the resulting sub-bands, and the distribution of energy across the sub-bands. The seizure sample signal is more oscillatory than the normal signal used in the demonstration. It is interesting to note that, of the 16 stages specified, the high-pass sub-bands (sub-bands 1-8) have negligible energy compared with the low-pass sub-band (sub-band 16). Because the first eight sub-bands have essentially zero energy, it was decided not to compute entropy features for these sub-bands for normal signals.

Figure 10. Energy distribution for sub-bands (1-4 and 13-16) for (a) normal, (b) seizure EEG signal sample

The subscript in Subband-1 denotes the first level of sub-band. The sub-bands from Subband-1 to Subband-16 are in decreasing order of frequency. The first 15 sub-bands were reconstructed from the detail coefficients, and the 16th sub-band was reconstructed from the approximation coefficients.
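The energy comparison across sub-bands described above can be sketched as follows. This is a minimal illustration: the function names, the toy coefficients and the 0.1% threshold for "negligible" energy are all assumptions, since the paper does not state the criterion it used.

```python
def energy_fractions(subbands):
    """Return each sub-band's share of the total signal energy."""
    energies = [sum(c * c for c in band) for band in subbands]
    total = sum(energies)
    return [e / total for e in energies]


def negligible_subbands(subbands, threshold=1e-3):
    """1-based indices of sub-bands whose energy share is below `threshold`."""
    fractions = energy_fractions(subbands)
    return [i + 1 for i, f in enumerate(fractions) if f < threshold]


# Hypothetical coefficients for four sub-bands (not taken from the paper's data).
toy = [[0.01, -0.02], [0.0, 0.01], [1.0, -2.0, 1.5], [4.0, 3.0, -5.0]]
print(energy_fractions(toy))
print(negligible_subbands(toy))
```

Sub-bands flagged in this way could be excluded before the entropy features are computed, which mirrors the decision taken for the first eight sub-bands of the normal signals.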
In this way, the values of Q and J were selected for further experiments. The different features were computed from all sub-bands, and experiments were performed to find the best combination of features using various ranking methods and five different classifiers. With Q = 3 and J = 15, different nonlinear features were computed from each decomposed sub-band. All the computed features were combined to form a feature set of size 140×4. The typical ranges of the spectral and entropy features obtained from sub-bands 1-16 are shown in Tables 1 and 2, respectively. The p-value statistical test (T. Dahiru, 2008) was applied to examine the discrimination ability of the various features. Apart from the spectral features, all features were found to be significant, with low p-values (p < 0.05) indicating their suitability for discriminating normal from seizure EEG signals. Further, the features were ranked using Receiver Operating Characteristics (ROC) (Zweig Mark H., 1993). The classification stage followed the TQWT decomposition and the computation of spectral and entropy features for both normal and seizure EEG signals. After feature computation, the features were divided into training and test datasets for classifying the signals into two classes. To improve the classification performance and to provide a faster and more cost-effective classifier, the selection of an optimal subset from both feature sets is important. The p-test alone was not sufficient to describe the efficacy of the features employed in the classification. Hence, wrapper-based feature selection, suggested in (Kohavi Ron, 1997), was employed. The performance of five classifiers was tested.
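The idea behind wrapper-based feature selection is that candidate feature subsets are scored by the classifier itself rather than by a standalone statistic such as the p-value. The sketch below uses a greedy forward search, which is only one possible search strategy, and wraps a simple nearest-centroid classifier scored by leave-one-out accuracy. Both choices, the function names and the toy data are assumptions for illustration; the paper does not state which search or wrapped classifier was used.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    classes = sorted(set(labels))
    correct = 0
    for i, row in enumerate(rows):
        centroids = {}
        for c in classes:
            # Class centroid computed without the held-out sample i
            # (assumes every class keeps at least one other sample).
            members = [r for j, r in enumerate(rows) if labels[j] == c and j != i]
            centroids[c] = [sum(m[f] for m in members) / len(members) for f in features]
        point = [row[f] for f in features]
        predicted = min(
            classes,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == labels[i]
    return correct / len(rows)


def forward_select(rows, labels, n_features, score=loo_accuracy):
    """Greedily add the feature that most improves the wrapped classifier's score."""
    selected, best = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scores = {f: score(rows, labels, selected + [f]) for f in remaining}
        winner = max(scores, key=scores.get)  # ties resolve to the lowest index
        if scores[winner] <= best:
            break  # no remaining feature improves the score
        selected.append(winner)
        remaining.remove(winner)
        best = scores[winner]
    return selected, best


# Hypothetical data: only feature 1 separates class 0 from class 1.
rows = [[5.0, 1.0, 0.3], [1.0, 1.2, 0.9], [3.0, 0.9, 0.1],
        [4.0, 3.0, 0.8], [2.0, 3.2, 0.2], [6.0, 2.9, 0.5]]
labels = [0, 0, 0, 1, 1, 1]
print(forward_select(rows, labels, 3))
```

The search stops as soon as adding a feature no longer raises the score, so uninformative features are left out and the resulting classifier stays small.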
The performance parameters of the classification methods, namely specificity (Spec), sensitivity (Sens) and accuracy (Acc), were obtained by the 10-fold cross-validation process (Van Stralen Karlijn J., et al., 2009). The 10-fold cross-validation process was used in this work to obtain an unbiased estimate of classifier performance. The above classifiers and the wrapper-based feature selection algorithm were implemented using the MATLAB Statistics and Machine Learning Toolbox (MATLAB, 2016) on an Intel® Core™ i5-7200U CPU @ 2.5 GHz with 8 GB RAM. The parameters of the LSSVM with RBF kernel, namely cost and gamma, were set to 1 for the selected number of features. The performance of the proposed Shannon and Kraskov entropy features is evaluated for EEG classification. The discriminating ability of the five classifiers was tested using the two classes (normal and seizure). The Shannon and Kraskov entropies of the decomposed EEG signals were computed and used as a feature set separate from the spectral feature set.

Table 1. Spectral features of the extracted TQWT sub-bands (Q = 3, R = 3, J = 16).

| Scale No. | Mean (Normal) | Mean (Seizure) | Standard Deviation (Normal) | Standard Deviation (Seizure) |
|---|---|---|---|---|
| 1 | 9.834927 | 8329.427 | 11.56066 | 71.58755 |
| 2 | 58.77755 | 11110.04 | 23.84113 | 65.25688 |
| 3 | 47.77503 | 8932.082 | 20.76151 | 85.51468 |
| 4 | 10.30816 | 446.1276 | 20.08941 | 114.6693 |
| 5 | 130.0165 | 1539.636 | 428.3669 | 99.01905 |
| 6 | 6.069761 | 243.3806 | 24.44858 | 106.087 |
| 7 | 34.06279 | 5926.254 | 16.71747 | 95.69294 |
| 8 | 33.06874 | 1857.898 | 11.7173 | 76.64777 |
| 9 | 522.3913 | 2032.91 | 35.768 | 71.35347 |
| 10 | 72.81399 | 28592.28 | 11.83377 | 251.9867 |
| 11 | 16.94179 | 7297.517 | 6.223891 | 298.7179 |
| 12 | 13.56822 | 3044.189 | 23.99828 | 116.3269 |
| 13 | 7.512193 | 7857.243 | 6.081293 | 132.1923 |
| 14 | 7.613255 | 426.0846 | 13.88737 | 287.1622 |
| 15 | 6.310997 | 552.7713 | 113.0568 | 282.8694 |
| 16 | 48.03316 | 86.0241 | 13.83567 | 79.17186 |

In Table 1, the spectral features, namely the mean and standard deviation, are presented.
Table 1 indicates that the SD values computed by the presented technique for normal EEG signals are generally lower than those for seizure signals, because of the absence of abnormalities in normal signals.

Table 2. Entropies of the extracted TQWT sub-bands (Q = 3, R = 3, J = 16).

| Scale No. | Shannon Entropy (Normal) | Shannon Entropy (Seizure) | Kraskov Entropy (Normal) | Kraskov Entropy (Seizure) |
|---|---|---|---|---|
| 1 | -7832.583117 | -2120182.678 | 1.754068 | 3.25435 |
| 2 | -19938.80744 | -5836813.985 | 1.977948 | 4.173407 |
| 3 | -21701.92126 | -6969011.533 | 2.047737 | 3.642304 |
| 4 | -31714.94771 | -153011.0566 | 2.197772 | 2.397555 |
| 5 | -1111243.781 | -2075609.65 | 3.687368 | 3.24926 |
| 6 | -133957.6349 | -102607.3996 | 1.663588 | 1.940963 |
| 7 | -33054.87133 | -4459470.757 | 2.26557 | 2.829904 |
| 8 | -11657.37693 | -1497161.665 | 1.949023 | 2.905165 |
| 9 | -39067.18007 | -622563.9997 | 2.312583 | 3.393844 |
| 10 | -10312.37088 | -5192562.954 | 1.677248 | 3.961785 |
| 11 | -11278.44848 | -5773207.133 | 1.815736 | 3.598474 |
| 12 | -26697.77175 | -3530204.881 | 2.095272 | 3.058711 |
| 13 | -4380.662994 | -5064415.974 | 1.443809 | 3.797015 |
| 14 | -8794.364987 | -76324.63628 | 1.803797 | 2.19443 |
| 15 | -310926.8264 | -106834.1615 | 3.172345 | 2.015378 |
| 16 | -18999.05462 | -15945.06889 | 2.065524 | 1.944525 |

In Table 2, the Shannon and Kraskov entropy values are presented for normal and seizure signals. The performance of the MLPNN classifier depends on the number of iterations and the learning rate used for the specific transfer function employed in the design. In this work, the highest classification accuracy for normal and seizure signals using MLPNN was 92.5%, obtained with the Shannon entropy features. In the classification task, the entropy features outperformed the spectral features. It was inferred from the MLPNN study that the tan-sigmoid and pure linear transfer functions, with the backpropagation learning algorithm, were the most suitable for this application.
Two different datasets were tested using MLPNN with the learning rate varied between 0.1 and 0.4, and the classification tasks were assessed using sensitivity, specificity and accuracy. The simulation results showed that the classification accuracy varies indirectly with the mutual range of the entropy features, the p-value, the features selected using the wrapper test, and the Mean Square Error (MSE). Although reasonably good classification accuracies were achieved with the five classifiers, the accuracies for ictal versus non-ictal (S-N) EEG signals using the KNN and naïve Bayes classifiers were 86.1% and 84.6%, which are considerably lower.

Figure 11. Receiver Operating Characteristics (ROC) graphs for LS-SVM, naïve Bayes and random forest classifiers

Compared with the conventional KNN and naïve Bayes classifiers, the performance of the proposed random forest classifier with features selected using the wrapper test is very promising. In Table 3, the classifier performances obtained with the proposed features are compared. In this technique, the parameter Q is varied to obtain enhanced discrimination between the two classes. The number of TQWT decomposition levels was set to 8 and 16, respectively, to assess the significance of the number of sub-bands in the entropy-based classification. The ROC curves for the LS-SVM, naïve Bayes and KNN classifiers are shown in Figure 11. For all the classifiers, the accuracy increases with J = 16. For Q = 3, R = 3 and J = 16, a major improvement in classification accuracy was achieved for the random forest classifier. Several classification tasks, such as normal-seizure and seizure-interictal, have shown promising results for the entropy features. The maximum classification accuracy of 97.3% was obtained using the random forest classifier.
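For reference, the three metrics used to assess the classifiers can be computed from the counts of true and false positives and negatives. The sketch below is illustrative only: the function name, the choice of "seizure" as the positive class and the toy label lists are assumptions, not the paper's data or its MATLAB code.

```python
def classification_metrics(y_true, y_pred, positive="seizure"):
    """Sensitivity, specificity and accuracy (in %) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sens": 100.0 * tp / (tp + fn),   # seizures correctly detected
        "spec": 100.0 * tn / (tn + fp),   # normal segments correctly rejected
        "acc": 100.0 * (tp + tn) / len(pairs),
    }


# Hypothetical labels for ten test segments.
y_true = ["seizure"] * 4 + ["normal"] * 6
y_pred = ["seizure", "seizure", "seizure", "normal",
          "normal", "normal", "normal", "normal", "normal", "seizure"]
print(classification_metrics(y_true, y_pred))
```

On a single test set, accuracy is the prevalence-weighted average of sensitivity and specificity, so it always lies between the two. In a 10-fold cross-validation, these quantities would be computed per fold and then averaged.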
For the normal-seizure classification, the highest classification accuracy obtained was 98% with the Kraskov entropy, with TQWT decomposition using Q = 3, R = 3 and J = 16. The performance of the classifier with the combination of Shannon and Kraskov entropy features is also noteworthy.

Table 3. Classification performance of MLPNN, LS-SVM, naïve Bayes, KNN, and random forest classifiers

| Classifier | Sens (%) | Spec (%) | Acc (%) |
|---|---|---|---|
| MLPNN | 91.5 | 90.5 | 92.5 |
| LS-SVM | 94.5 | 93.2 | 95.6 |
| naïve Bayes | 85.2 | 85.3 | 84.4 |
| KNN | 87.3 | 84.3 | 86.1 |
| random forest | 94.5 | 95.6 | 97.3 |

Table 3 reveals the classification accuracies for the majority of the classifiers presented in this paper. It compares the proposed entropy-feature-based classifiers and their classification performance on the EEG database acquired from Dr. D. M. Mohire's Neurology Research Centre, Kolhapur, India. In most cases, the combined entropy features have shown performance equivalent to the best algorithms described in the literature. The proposed system may prove useful in the detection of seizures and aid neurologists in making accurate diagnostic decisions pertaining to epileptic seizure disorders.

4. Conclusion
Manual monitoring of EEG to diagnose epilepsy is a challenging task that involves the cumbersome work of observing long recordings and making decisions through experience. Automated seizure detection is a promising tool for neurologists in making epilepsy diagnoses. In this paper, an automated method based on varied values of Q is developed that decomposes the EEG signal, computes Shannon and Kraskov entropies, and detects seizure signals using random forest. The method achieved a classification accuracy of 97.3%, with sensitivity and specificity of 94.5% and 95.6%, respectively. A literature survey is presented on current studies related to single-channel seizure detection.
A comparison table of different seizure detection methods shows that most techniques use joint time-frequency domain signal decomposition and entropy features. Several research papers have explored multi-domain features to build a robust feature space. It is also clear that the TQWT is a promising trend for seizure detection and prediction that needs further investigation. The separation of normal and seizure events based on the extracted features is achieved by employing state-of-the-art machine learning classifiers. The majority of the studies focus on improving classification accuracy using remotely accessed resources. However, there is increasing demand to implement the algorithms on local embedded systems to reduce computational complexity. The main goal of this review is to explore the field of real-time EEG signal analysis and to apply it to the detection of epileptic disorders using EEG recordings.

References
Buck, D., Baker, G.A., Jacoby, A., Smith, D.F. & Chadwick, D.W. (1997). Patients' experiences of injury as a result of epilepsy. Epilepsia 38 (4), 439–444.
Englander, Jeffrey, et al. Seizures after traumatic brain injury. Archives of physical medicine and rehabilitation 95.6 (2014): 1223.
Acharya, U. Rajendra, et al. (2013). Automated EEG analysis of epilepsy: a review. Knowledge-Based Systems 45, 147-165.
Sanei, Saeid, & Jonathon A. Chambers. (2013). EEG signal processing. John Wiley & Sons.
Selesnick, Ivan W. (2011). Wavelet transform with tunable Q-factor. IEEE transactions on signal processing 59.8, 3560-3575.
Andrzejak, R. G., Lehnertz, K., Mormann, F., Rieke, C., David, P. & Elger, C. E. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state, Physical Review.
64, 8 pages.
Mennes, Maarten, et al. (2010). Validation of ICA as a tool to remove eye movement artifacts from EEG/ERP. Psychophysiology 47.6, 1142-1150.
Klados, Manousos A., et al. REG-ICA: a hybrid methodology combining blind source separation and regression techniques for the rejection of ocular artifacts. Biomedical Signal Processing and Control 6.3 (2011): 291-300.
Mammone, Nadia, Fabio La Foresta, and Francesco Carlo Morabito. (2012). Automatic artifact rejection from multichannel scalp EEG by wavelet ICA. IEEE Sensors Journal 12.3, 533-542.
Li, Mingai, Yan, C. & Jinfu, Y. (2013). Automatic removal of ocular artifact from EEG with DWT and ICA Method. Applied Mathematics & Information Sciences 7.2, 809.
Mammone, Nadia, & Francesco C. Morabito. (2014). Enhanced automatic wavelet independent component analysis for electroencephalographic artifact removal. Entropy 16.12, 6553-6572.
Peng, Hong, et al. Removal of ocular artifacts in EEG—An improved approach combining DWT and ANC for portable applications. IEEE journal of biomedical and health informatics 17.3 (2013): 600-607.
Zhao, Qinglin, et al. (2014). Automatic identification and removal of ocular artifacts in EEG—improved adaptive predictor filtering for portable applications. IEEE transactions on nanobioscience 13.2, 109-117.
Torse, D. A. & Veena V. D. (2016). Design of adaptive EEG preprocessing algorithm for neurofeedback system. Communication and Signal Processing (ICCSP) International Conference on IEEE. 392-395.
Wang, Gang, et al. (2016). The removal of EOG artifacts from EEG signals using independent component analysis and multivariate empirical mode decomposition. IEEE journal of biomedical and health informatics 20.5, 1301-1308.
Güler, Inan, & Elif Derya Übeyli. (2005). Adaptive neuro-fuzzy inference system for classification of EEG signals using wavelet coefficients. Journal of neuroscience methods 148.2, 113-121.
Jahankhani, Pari, Vassilis Kodogiannis, and Kenneth Revett. (2006).
EEG signal classification using wavelet feature extraction and neural networks. Modern Computing, 2006. JVA'06. IEEE John Vincent Atanasoff International Symposium on. IEEE.
Ghosh-Dastidar, Samanwoy, and Hojjat Adeli. (2007). Improved spiking neural networks for EEG classification and epilepsy and seizure detection. Integrated Computer-Aided Engineering 14.3, 187-212.
Subasi, Abdulhamit, and M. Ismail Gursoy. (2010). EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Systems with Applications 37.12, 8659-8666.
Ghosh-Dastidar, Samanwoy, Hojjat Adeli, and Nahid Dadmehr. (2008). Principal component analysis-enhanced cosine radial basis function neural network for robust epilepsy and seizure detection. IEEE Transactions on Biomedical Engineering 55.2, 512-518.
Faust, Oliver, et al. (2010). Automatic identification of epileptic and background EEG signals using frequency domain parameters. International journal of neural systems 20.02, 159-176.
Gandhi, Tapan, Bijay Ketan Panigrahi, and Sneh Anand. (2011). A comparative study of wavelet families for EEG signal classification. Neurocomputing 74.17, 3051-3057.
Oweis, R.J. & Abdulhey, E.W. (2010). Seizure classification in EEG signals utilizing Hilbert-Huang transform. Biomed. Eng. Online 10, 38.
Işik, Hakan, and Esma Sezer. (2012). Diagnosis of epilepsy from electroencephalography signals using multilayer perceptron and Elman artificial neural networks and wavelet transform. Journal of medical systems 36.1, 1-13.
Omerhodzic, Ibrahim, et al. (2013). Energy distribution of EEG signals: EEG signal wavelet-neural network classifier. arXiv preprint arXiv:1307.7897.
Kumar, Yatindra, M. L. Dewal, and R. S. Anand. Epileptic seizure detection using DWT based fuzzy approximate entropy and support vector machine. Neurocomputing 133 (2014): 271-279.
Faust, Oliver, et al. (2015).
Wavelet-based EEG processing for computer-aided seizure detection and epilepsy diagnosis. Seizure 26, 56-64.
Song, Jiang-Ling, Wenfeng Hu, and Rui Zhang. (2016). Automated detection of epileptic EEGs using a novel fusion feature and extreme learning machine. Neurocomputing 175, 383-391.
Zhang, Yong, et al. (2015). Comparison of classification methods on EEG signals based on wavelet packet decomposition. Neural Computing and Applications 26.5, 1217-1225.
Raghu, S., N. Sriraam, and G. Pradeep Kumar. (2015). Effect of wavelet packet log energy entropy on electroencephalogram (EEG) signals. International Journal of Biomedical and Clinical Engineering (IJBCE) 4.1, 32-43.
Hassan, Ahnaf Rashik, Siuly Siuly, and Yanchun Zhang. Epileptic seizure detection in EEG signals using tunable-Q factor wavelet transform and bootstrap aggregating. Computer methods and programs in biomedicine 137 (2016): 247-259.
Patidar, Shivnarayan, and Ram Bilas Pachori. Classification of cardiac sound signals using constrained tunable-Q wavelet transform. Expert Systems with Applications 41.16 (2014): 7161-7170.
Bhattacharyya, Abhijit, et al. Tunable-Q Wavelet Transform Based Multiscale Entropy Measure for Automated Classification of Epileptic EEG Signals. Applied Sciences 7.4 (2017): 385.
Patidar, Shivnarayan, and Trilochan Panigrahi. (2017). Detection of epileptic seizure using Kraskov entropy applied on tunable-Q wavelet transform of EEG signals. Biomedical Signal Processing and Control 34, 74-80.
Kumar, S. Pravin, et al. (2010). Entropies based detection of epileptic seizures with artificial neural network classifiers. Expert Systems with Applications 37.4, 3284-3291.
Phinyomark, Angkoon, et al. (2012). The usefulness of mean and median frequencies in electromyography analysis. Computational intelligence in electromyography analysis-A perspective on current applications and future challenges.
Du, Sijiang, & Marko Vuskovic. (2004). Temporal vs.
spectral approach to feature extraction from prehensile EMG signals. Information Reuse and Integration, 2004. IRI 2004. Proceedings of the 2004 IEEE International Conference.
Shannon, Claude E. (1948). A note on the concept of entropy. Bell System Tech. J. 27.3, 379-423.
K.A. Veselkov, et al. (2010). A metabolic entropy approach for measurements of systemic metabolic disruptions in patho-physiological states, J. Proteome Res. 9, 3537–3544.
Kraskov, A. et al., Estimating Mutual Information, 2008 [Online]. Available: arxiv.org/pdf/cond-mat/0305641.
Bishop, Christopher M. Pattern recognition and machine learning. Springer, 2006.
Haykin, Simon S. Neural networks: a comprehensive foundation. Tsinghua University Press, 2001.
Suykens, Johan AK, Tony Van Gestel, & Jos De Brabanter. (2002). Least squares support vector machines. World Scientific.
Hastie, T., R. Tibshirani, and J. Friedman. (2008). The Elements of Statistical Learning, Second Edition. NY: Springer.
D. T. Larose (2004). Discovering Knowledge in Data: An introduction to data mining, New Jersey, USA: Wiley Interscience.
Breiman, Leo. (1996). Bagging predictors. Machine learning 24.2, 123-140.
Dietterich, Thomas G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural computation 10.7, 1895-1923.
Breiman, Leo. (1999). Random forests. UC Berkeley TR567.
Ho, Tin Kam. (1998). The random subspace method for constructing decision forests. IEEE transactions on pattern analysis and machine intelligence 20.8, 832-844.
Dahiru, Tukur. (2008). P-value, a true test of statistical significance? A cautionary note. Annals of Ibadan postgraduate medicine 6.1, 21-26.
Zweig, Mark H., & Gregory Campbell. (1993). Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine.
Clinical chemistry 39.4, 561-577.
Kohavi, Ron, & George H. John. (1997). Wrappers for feature subset selection. Artificial intelligence 97.1-2, 273-324.
Van Stralen, Karlijn J., et al. Diagnostic methods I: sensitivity, specificity, and other measures of accuracy. Kidney international 75.12 (2009): 1257-1263.
MATLAB and Statistics and Machine Learning Toolbox Release 2016b, The MathWorks, Inc., Natick, Massachusetts, United States.

Dattaprasad Torse (b. April 12, 1979) received his B.E. in Electronics and Telecommunication Engineering (2001) and M.E. in Digital Electronics (2005), and is pursuing a PhD in Electronics and Communication at Visvesvaraya Technological University of Belagavi, Karnataka, India. He is now Assistant Professor in the Electronics and Communication Department of KLS Gogte Institute of Technology, Belagavi, India. His current research interests include different aspects of EEG signal processing, EEG analysis and machine learning. He has (co-)authored more than 10 papers and participated in more than 10 conferences. He is a member of IEEE and a life member of the Indian Society for Technical Education (ISTE).

Veena Desai (b. August 17, 1969) received her B.E. in Electronics and Communication Engineering (2001), M.Tech in Computer Networking (2005) and PhD (2012) in Electronics and Communication from Visvesvaraya Technological University of Belagavi, Karnataka, India. She is now Professor in the Electronics and Communication Department of KLS Gogte Institute of Technology, Belagavi, India. Her current research interests include different aspects of cryptography, network security and machine learning. She has authored more than 30 papers and participated in more than 10 conferences. She is a member of IEEE and a life member of the Indian Society for Technical Education (ISTE).

Rajashri Khanai (b. October 15, 1979) received her B.E.
in Electronics and Communication Engineering (2000), M.Tech in Digital Communication and Networking (2007) and PhD (2015) in Electronics and Communication from Visvesvaraya Technological University of Belagavi, Karnataka, India. She is now Professor in the Electronics and Communication Department of KLE's Dr. M. S. Sheshgiri College of Engineering and Technology, Belagavi, India. Her current research interests include different aspects of error correction coding for wireless communication, biomedical signal processing and machine learning. She has authored more than 20 papers and participated in more than 10 conferences. She is a member of IEEE.