BRAIN. Broad Research in Artificial Intelligence and Neuroscience, ISSN 2067-3957, Volume 1, July 2010, Special Issue on Complexity in Sciences and Artificial Intelligence, Eds. B. Iantovics, D. Rǎdoiu, M. Mǎruşteri and M. Dehmer

APPLICATION OF t-CHERRY JUNCTION TREES IN PATTERN RECOGNITION

Tamás Szántai and Edith Kovács

Abstract. Pattern recognition aims to classify data (patterns) based either on a priori knowledge or on statistical information extracted from the data. In this paper we concentrate on statistical pattern recognition, using a new probabilistic approach which makes it possible to select the so-called 'informative' features. We develop a pattern recognition algorithm based on the conditional independence structure underlying the statistical data. Our method was successfully applied to a real problem: recognizing Parkinson's disease on the basis of voice disorders.

Keywords: pattern recognition, probabilistic modeling, h-uniform hypertree, t-cherry junction tree

2000 Mathematics Subject Classification: 05C65, 62H30, 68T10, 62P10.

1. Introduction

In this paper we propose a probabilistic approach to statistical pattern recognition. The classifier is obtained by supervised learning, and the classification uses the Bayes decision rule.

In our approach we estimate the underlying multivariate probability distribution by exploiting the conditional independencies between the variables (features). This is achieved by fitting a t-cherry junction tree to the training data (see [4], [8]); this represents the novelty of the approach introduced here.

A problem that often occurs is that databases contain redundant data, that is, a large number of features, which may lead to overfitting of the model. This may cause poor generalization (the performance of the model on new test data). We use the t-cherry junction tree approach to discover the Markov Random Field (MRF) underlying the variables. In an MRF each random variable depends on all the others only through the variables of its neighbourhood (see for example [5]).

Throughout this paper we use the term 'pattern' to denote a d-dimensional data vector x = (x_1, ..., x_i, ..., x_d) of measurements, where x_i is the value of the feature denoted by X_i. We assume that there exist M groups or classes, one of which can be associated with each pattern. The categorical variable which takes values in {1, ..., M} is denoted by Y. The variables contained in the neighbourhood of Y are called informative variables for Y. Our method can be used for discovering those variables (features) which are informative for the categorical variable Y.

In the second section we describe how the t-cherry junction tree approach can be applied to the classification problem. In the third section we deal with a real-world problem: we use our approach to select features of voice that are able to predict whether a patient has Parkinson's disease or not.

2. The classification method based on the t-cherry junction tree

A classifier is constructed on the basis of a training set (x_1, y_1), ..., (x_n, y_n) and is denoted by g_n. For a given x ∈ R^d the value y ∈ {1, ..., M} of Y is guessed by g_n(x; (x_1, y_1), ..., (x_n, y_n)). The construction of g_n in this way is called supervised learning. We assume that (X_1, Y_1)^T, ..., (X_n, Y_n)^T is a sequence of independent, identically distributed random vectors having the same distribution as (X, Y)^T.
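To make the plug-in use of the Bayes decision rule concrete, the following Python sketch (our own minimal illustration; the function names and the assumption of discretized features are ours, not the paper's) estimates the joint distribution of (X, Y) by relative frequencies on the training set and classifies a new pattern by maximizing the estimated joint probability:

    from collections import Counter

    def fit_joint(patterns, labels):
        """Estimate P(X, Y) by relative frequencies on the training set.

        patterns: list of tuples of discretized feature values x = (x_1, ..., x_d)
        labels:   list of class labels y in {1, ..., M}
        """
        n = len(patterns)
        counts = Counter(zip(patterns, labels))
        return {xy: c / n for xy, c in counts.items()}

    def g_n(x, p_hat, classes):
        """Plug-in Bayes decision rule: choose the class y maximizing
        P_hat(x, y), which is equivalent to maximizing P_hat(y | x)
        since P_hat(x) does not depend on y."""
        return max(classes, key=lambda y: p_hat.get((x, y), 0.0))

Of course, the full d-dimensional joint table is hopelessly sparse for realistic d; the point of the method described below is precisely to replace it by a product of low-dimensional cluster marginals obtained from a t-cherry junction tree.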
The training data set may be the result of experimental observation (meteorological data, ECG data, etc.). The y_i's may be obtained through measurements, or from an expert who assigns the label y_i after observing x_i.

In order to find the best-fitting k-width junction tree, we search in the class of k-width t-cherry junction trees. We seek the approximation which minimizes the Kullback-Leibler divergence between the approximation and the true probability distribution. The method for finding a k-th order t-cherry junction tree using this criterion is presented in [8].

On the basis of the constructed t-cherry junction tree we select the informative features by the following algorithm.

Algorithm. Selection of the informative features.

1. Give as input the training data set in discretized form.

2. From the empirical probability distribution determine all the k-th order marginals (it is advantageous if k is not too large, say less than 7).

3. Find the junction tree with the greatest weight, in the sense of (see [8]):

   ∑_{(X_{i1},...,X_{ic}) ∈ C} I(X_{i1},...,X_{ic}) − ∑_{(X_{j1},...,X_{js}) ∈ S} (ν_{j1,...,js} − 1) I(X_{j1},...,X_{js}) → max,

   where I(X_{i1},...,X_{ic}) is the information content (see [1]) of P(X_{i1},...,X_{ic}) and ν_{j1,...,js} is the multiplicity of the separator (X_{j1},...,X_{js}) in the junction tree.

4. Output: the set of clusters C and the set of separators S.

5. Select those clusters which contain the variable Y.

6. Select, as informative variables, those variables which occur in the clusters selected in step 5.

For the set of informative features we introduce the notation X_inf = {X_{i1},...,X_{ir}}. The joint probability distribution of the random vector (X_inf, Y)^T can be expressed using only those marginals which correspond to the clusters containing Y. For the classifier g_n we give a formula which depends on the probability distribution of (X_inf, Y)^T. In this way, a new d-dimensional vector is classified using the informative features only.

3. Recognizing Parkinson's disease from voice disorders

Voice disorders can be premonitory signs of various diseases. This observation opens new ways of investigation and diagnosis. The idea of testing our method on discovering the connection between voice disorders and Parkinson's disease (PD) came from [6]: "Research has shown that approximately 90% of people with PD exhibit some form of vocal impairment [3], [7]. Vocal impairment may also be one of the earliest indicators for the onset of the illness [2]".

The dataset was created by Max Little of the University of Oxford, in collaboration with the National Center for Voice and Speech, Denver, Colorado, who recorded the speech signals, and is provided by the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/datasets/Parkinsons. It is composed of a range of biomedical voice measurements from 195 voice recordings of 31 people, 23 of whom have Parkinson's disease (PD). The main aim of the data is to discriminate healthy people from those with PD.

We randomly chose from the data a test set containing 10 recordings of healthy people and 20 of ill ones. The remaining 165 vectors formed the training data set.

Only the cluster (X1, X10, X17 = Y, X19, X20) was informative. The informative features are: Fo, shimmer, DFA and spread1. Using the probability distribution P(X1, X10, X19, X20, Y), we obtained a 93% correct classification performance for the classifier. The authors of paper [6] obtained 91.4% correct classification performance using ten features and a kernel support vector machine.
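For completeness, the weight criterion in step 3 of the algorithm of Section 2, which is what singled out this cluster, can be evaluated from the empirical marginals as in the following Python sketch (a minimal illustration of ours, not code from [8]; in particular, representing the separators as a mapping from index tuples to their multiplicities ν is our own assumption):

    import math
    from collections import Counter

    def information_content(data, idx):
        """Information content I(X_i1, ..., X_ic) of a marginal (see [1]):
        the Kullback-Leibler divergence of the joint marginal from the
        product of its one-dimensional marginals.

        data: list of complete sample tuples; idx: indices of the marginal."""
        n = len(data)
        joint = Counter(tuple(row[i] for i in idx) for row in data)
        singles = [Counter(row[i] for row in data) for i in idx]
        info = 0.0
        for values, count in joint.items():
            p = count / n
            q = 1.0
            for marginal, v in zip(singles, values):
                q *= marginal[v] / n
            info += p * math.log(p / q)
        return info

    def junction_tree_weight(data, clusters, separators):
        """Weight of a junction tree:  sum_C I(C) - sum_S (nu_S - 1) I(S).

        clusters:   iterable of index tuples, one per cluster
        separators: dict mapping an index tuple to its multiplicity nu_S"""
        weight = sum(information_content(data, c) for c in clusters)
        for sep, nu in separators.items():
            weight -= (nu - 1) * information_content(data, sep)
        return weight

The greedy search over k-width t-cherry junction trees that maximizes this weight is the subject of [8] and is not reproduced here.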
4. Conclusions

In this paper we approximate the multivariate probability distribution by exploiting the conditional independencies between the features. The construction is based on fitting a t-cherry junction tree to the training data set. The advantage of the presented method is that it makes possible the selection of informative features. We then define a classifier which uses the informative features only. In this way we can reduce the dimensionality of the problem and obtain good generalization. The numerical results presented confirm the efficiency of our approach.

References

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley Interscience, New York, 1991.

[2] J. R. Duffy, Motor Speech Disorders: Substrates, Differential Diagnosis, and Management, 2nd ed., Elsevier, St. Louis, MO, 2005.

[3] A. K. Ho, R. Iansek, C. Marigliani, J. L. Bradshaw and S. Gates, Speech impairment in a large sample of patients with Parkinson's disease, Behav. Neurol., 11, (1998), 131-137.

[4] E. Kovács and T. Szántai, On the Approximation of a Discrete Multivariate Probability Distribution Using the New Concept of t-Cherry Junction Tree, Lecture Notes in Economics and Mathematical Systems, 633, Proceedings of the IFIP/IIASA/GAMM Workshop on Coping with Uncertainty, Robust Solutions, 10-12 December 2007, IIASA, Laxenburg, Springer Verlag, Heidelberg, 2010, 39-56.

[5] E. Kovács and T. Szántai, Some Improvements of t-Cherry Junction Trees, in: 2008 First International Conference on Complexity and Intelligence of the Artificial and Natural Complex Systems. Medical Applications of the Complex Systems. Biomedical Computing, CANS 2008, Marosvásárhely, Romania, 8-10 November 2008, eds. B. Iantovics, C. Enachescu and F. G. Filip, IEEE Computer Society, Los Alamitos, 2009, 117-129.

[6] M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman and L. O. Ramig, Suitability of Dysphonia Measurements for Telemonitoring of Parkinson's Disease, IEEE Transactions on Biomedical Engineering, 56, No. 4, April (2009), 1015-1022.

[7] J. A. Logemann, H. B. Fisher, B. Boshes and E. R. Blonsky, Frequency and co-occurrence of vocal tract dysfunctions in the speech of a large sample of patients with Parkinson's disease, J. Speech Hear. Disord., 43, (1978), 47-57.

[8] T. Szántai and E. Kovács, Hypergraphs as Means of Discovering the Dependence Structure of a Discrete Multivariate Probability Distribution, Proc. Conference on Applied Mathematical Programming and Modelling (APMOD), Bratislava, 27-31 May 2008, Annals of Operations Research, to appear.

Tamás Szántai
Institute of Mathematics
Budapest University of Technology and Economics
Műegyetem rkp. 3, Budapest, 1111
HUNGARY
email: szantai@math.bme.hu

Edith Kovács
Department of Methodology
Budapest College of Management
Villányi út 11-13, Budapest, 1114
HUNGARY
email: kovacs.edith@avf.hu