Automated Classification of Analysis- and Reference Cells for Cancer Diagnostics in Microscopic Images of Epithelial Cells from the Oral Mucosa

T. E. Schneider

To have the best possible chance of cure, cancer has to be detected as early as possible. As cancer starts within a single cell, cytopathological methods offer the possibility of early detection. One such method is standardized DNA image cytometry. For this, the diagnostically relevant cells have to be found within each specimen, which is currently done manually. Since this is a time-consuming process, a preselection of diagnostically relevant cells should be performed automatically. For specimens of the oral mucosa this involves distinguishing between undoubtedly healthy epithelial cells and possibly cancerous epithelial cells. Based on cell images from a brightfield light microscope, a set of morphological and textural features was implemented. To identify highly distinctive feature subsets, the sequential forward floating search method is used. For these feature sets, k-nearest neighbor and fuzzy k-nearest neighbor classifiers as well as support vector machines were trained. On a validation set of cells it could be shown that normal and possibly cancerous cells can be distinguished at overall rates above 95.5 % for different classifiers, enabling us to choose the support vector machine with a set of only two features as the classifier with the lowest computational costs.

Keywords: cells, cytopathology, feature extraction, classification, fuzzy k-nearest neighbor, support vector machine.

1 Introduction

Cancer is the second leading cause of death in industrial countries. To have the best possible chance of cure, cancer has to be detected and treated as early as possible. As cancer starts within a single cell, many types of cancer can be detected very early from already marginal changes within single cells using cytopathological methods. The cell specimens can be obtained easily and painlessly, e.g. with tiny brushes, from the oral mucosa.

One cytopathological diagnostic method is DNA image cytometry (DNA-ICM). For this, the cells are stained stoichiometrically according to Feulgen to visualize the DNA content within the nuclei. Images of the nuclei are captured with a camera mounted on a brightfield light microscope.

Fig. 1: Two examples of Feulgen stained squamous epithelial cells. On the left a normal cell is shown, on the right a morphologically abnormal analysis cell. Since Feulgen stains the DNA stoichiometrically, the DNA within the nuclei is stained, whereas the surrounding cytoplasm is not.

Since DNA-ICM needs the DNA value of each cell, the integral optical densities of the nuclei are computed. To obtain the DNA values from these integral optical densities, a set of normal, non-proliferating cells, called reference cells, has to be selected. The DNA content of each nucleus is then computed as the ratio of its integral optical density to the mean integral optical density of the reference cells. Subsequently, the diagnosis is performed based on the histogram of the DNA values of diagnostically relevant, suspicious analysis cells. For the oral mucosa these are squamous epithelial cells with noticeably altered morphology and chromatin texture, i.e., amount and distribution of DNA, of their nuclei; cells that are therefore suspicious of cancer.
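As a concrete illustration of this normalization step, the following minimal sketch (Python; all array names are hypothetical) computes DNA values as the ratio of each nucleus's integral optical density to the mean integral optical density of the selected reference cells. Expressing the result in units of c, so that normal diploid reference cells lie around 2c, is the usual DNA-ICM convention and is included here as an assumption, not as a detail taken from this paper.

import numpy as np

def dna_values(iod_all, iod_reference, diploid_value=2.0):
    """Compute DNA values from integral optical densities (IODs).

    iod_all       : IODs of the nuclei to be measured (analysis cells)
    iod_reference : IODs of the selected normal, non-proliferating
                    reference cells (about 30 per specimen, see [1])
    diploid_value : scale so that the reference mean corresponds to 2c
                    (conventional DNA-ICM scaling; an assumption here)
    """
    iod_all = np.asarray(iod_all, dtype=float)
    reference_mean = np.mean(iod_reference)
    return diploid_value * iod_all / reference_mean

# Hypothetical usage: the diagnosis is then based on the histogram of
# these DNA values of the analysis cells.
# values = dna_values(iod_analysis_cells, iod_reference_cells)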
According to the European guidelines for DNA-ICM [1], a cytopathologist has to find and select manually about 30 reference cells and about 300 analysis cells. This is a time-consuming task, in which the expert should therefore be assisted by an automated preselection of cells. A machine for fully automated screening of cervical smears using DNA-ICM is already available. This machine searches for cells with an abnormally high DNA content by measuring the DNA content of all existing cells [2]. But as these cells are rare and cancer already starts with small changes of the DNA content, the same high sensitivity and specificity as with the standardized interactive DNA-ICM [1] cannot be achieved.

The approach in this paper therefore aims at an algorithmic implementation of such expert knowledge for specimens of the oral mucosa, i.e., automatic discrimination of normal epithelial nuclei as reference cells and non-healthy epithelial nuclei as analysis cells.

The paper is organized as follows: in section 2 the database of cells from oral smears and the imaging modality are described. The features characterizing the properties of epithelial reference and analysis cells, the process of reducing the whole feature set to a subset of features with optimal discriminating power, and the different classification algorithms used to classify the cells automatically are presented in section 3. The results of these algorithms computed on the dataset are presented in section 4, showing that epithelial reference and analysis cells from 15 specimens of the oral mucosa can be discriminated with different classification methods at total classification rates above 95.5 %. The paper ends by analyzing the classification results and suggesting topics for future study.

2 Material

Our dataset is based on 15 Feulgen stained smears from the oral mucosa. Seven of these specimens are without cancer cells, including three with inflammation. Five specimens contain cancer cells. From these specimens, images have been acquired with a brightfield light microscope using a 63× oil immersion objective (NA 1.32) and a three-chip CCD camera, with a resulting resolution of about 0.1 μm.

An experienced cytopathologist classified 950 reference cells within the specimens without cancer cells, and 748 analysis cells within the specimens with cancer cells. Fig. 1 shows two example cells. For these cells, the contours of the nuclei are given as chain codes.
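The chain-code representation of the nuclear contours is not specified further in the paper; as an illustration only, the sketch below assumes a standard 8-connected Freeman chain code and decodes it into boundary pixel coordinates, from which a binary nuclear mask for the feature computations of section 3 could then be filled. The direction convention and all values are assumptions.

# 8-connected Freeman chain code directions (an assumption; the paper only
# states that the nuclear contours are given as chain codes).
# Code 0 = east, then counter-clockwise in 45 degree steps; y grows downwards.
FREEMAN_8 = [(1, 0), (1, -1), (0, -1), (-1, -1),
             (-1, 0), (-1, 1), (0, 1), (1, 1)]  # (dx, dy)

def chaincode_to_contour(start, code):
    """Decode a Freeman chain code into a list of (x, y) boundary points."""
    x, y = start
    points = [(x, y)]
    for c in code:
        dx, dy = FREEMAN_8[c]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

# Hypothetical usage for one nucleus:
# boundary = chaincode_to_contour(start=(120, 85), code=[0, 7, 7, 6, 5, 4, 3, 2, 1])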
3 Methods

To be able to distinguish automatically between analysis and reference squamous epithelial cells, a set of potentially relevant features has been implemented. To minimize the risk of overfitting during training of a classifier, a feature selection method is applied that can be combined with different objective functions to select feature subsets with good discriminative power. To solve the classification task, different classification algorithms can be chosen. Each of these has to be trained on a training set of cells, whereas the classification rate is calculated on an independent cell set (validation set). The process of feature selection and classifier generation is sketched in Fig. 2, and the algorithms used within each step are described below. The distance measure used within each step is the Euclidean distance.

Fig. 2: Algorithmic workflow of feature subset selection and classifier generation. The different sized feature subsets resulting from the searching method are denoted as set_1 to set_p; the output of classifier training and training set classification is p classification rates.

3.1 Features

When performing a manual discrimination between analysis and reference cells, cytopathologists consider geometrical criteria such as the area and shape of the nuclei, as well as textural characteristics of the chromatin pattern and the overall amount of stain related to the size of a nucleus.

To cover these criteria, several morphological features are used, as described in [3] and [4]. These comprise area, perimeter, form factor, Fourier descriptor energies and, furthermore, translation, rotation and scale invariant features. The latter are computed as combinations of the central 2D polynomial moments of the nuclear mask.

As textural features, these moment-based features are calculated for a density estimate of the chromatin (extinction image) and for an edge image of the chromatin distribution. The extinction image is calculated from the green channel of the original RGB image, and the edge image is the difference between the extinction image and its median filtered version. These features are supplemented by histogram features of the topological gradient [3] and by the particle-oriented features homogeneity, granularity and distribution of the chromatin [5]. See Fig. 3 for examples of the image transformations. This sums up to 203 features describing the morphology of the nuclei and their chromatin texture.

Fig. 3: Image transformations needed to compute the features. From left to right: the upper row shows the original RGB image, its green channel, the extinction image computed from the green channel, and the edge image. Based on the extinction image, the lower row shows the watershed regions computed for local maxima and for local minima, each filled with the gray value of the local optimum, the topological gradient as the difference of the watershed regions, and a three-color image partitioning the chromatin into darker and brighter particles.
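For the two texture base images described above (and shown in Fig. 3), a minimal sketch could look as follows. The definition of extinction as the negative decadic logarithm of the transmission, the background value and the median filter size are assumptions made for this illustration; the paper does not state them.

import numpy as np
from scipy.ndimage import median_filter

def extinction_image(green_channel, background=255.0):
    """Extinction (optical density) image from the green channel.

    Assumes extinction = -log10(transmission), with the transmission
    estimated as pixel value / background value; the exact definition
    used in the paper is not stated, so this is only a plausible sketch.
    The integral optical density of a nucleus (section 1) would then be
    the sum of the extinction values over its nuclear mask.
    """
    g = np.clip(np.asarray(green_channel, dtype=float), 1.0, background)
    return -np.log10(g / background)

def edge_image(extinction, size=5):
    """Edge image as the difference between the extinction image and its
    median filtered version (the filter size is chosen arbitrarily here)."""
    return extinction - median_filter(extinction, size=size)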
3.2 Feature selection

Testing the separability performance of each possible feature subset is computationally too expensive for large feature sets. Therefore the parameter-free sequential forward floating selection search method [6] is implemented to identify feature subsets with a good separability performance for our classification task. To rate the separability performance, three different objective functions can be chosen. Based on the assumption of normally distributed data, these are Fisher's criterion [7] and the Bhattacharyya distance [7], which is adapted to the special case of only two classes. Using the likelihood of the feature vectors instead of a density model, mutual information [8] can be selected as the third objective function. (A sketch of this search procedure is given at the end of this section.)

3.3 Classifier

To carry out the classification task, three classification algorithms are considered. These have been selected due to the reproducibility of their classification results, as well as the possibly changing behavior of the distribution of the data during feature selection. For the different subsets of features during this selection, the distribution of the data may fit different distribution models, including non-Gaussian, multi-modal ones. Furthermore, for higher dimensions of the feature space the data is in general sparsely distributed and less compact, which suggests non-parametric classifiers as the more general choice. As non-parametric classifiers that provide comprehensible decisions, the kNN algorithm and the Fuzzy-kNN are chosen and, additionally, support vector machines (SVM), which are known to provide a good generalization capability.

A kNN version is trained that makes its decision as soon as k of the nearest neighbors belong to the same class. The Fuzzy-kNN is implemented according to [9]. For this, the membership to each class is computed, taking the influence of a nearest neighbor into account via its distance to the sample; this distance can moreover be weighted. Here we used the proposed version of assigning complete membership of the neighbors to their own class and non-membership to all other classes. The second proposed version, weighting the influence of the neighbors based on their distances to the class means, assumes a unimodal distribution of the classes with equal variances, which turned out to be too restrictive [10, 11]. Since the third proposed weighting increases the computational costs through a search for the nearest neighbors of each nearest neighbor of a sample, it is excluded.

For the SVM algorithm [12], the Spider toolbox for Matlab is applied, interfacing libSVM. The rbf kernel is chosen as the kernel function, and the classifiers are trained using grid search for the parameter search and cross-validation on the training data.
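For reference, the following is a plain Python sketch of the sequential forward floating selection of section 3.2 (after Pudil et al. [6]). The objective function (Fisher's criterion, Bhattacharyya distance or mutual information, estimated on the training set) is passed in as a callable; the stopping size and all names are illustrative rather than taken from the paper's implementation.

def sffs(features, objective, max_size):
    """Sequential forward floating selection (after Pudil et al. [6]).

    features  : list of all candidate feature indices
    objective : callable(subset) -> separability score (to be maximized)
    max_size  : largest subset size to search (50 in section 4)
    Returns a dict mapping subset size -> best subset found for that size.
    """
    current = []
    best = {}  # size -> (score, subset)

    def record(subset):
        score = objective(subset)
        size = len(subset)
        if size not in best or score > best[size][0]:
            best[size] = (score, list(subset))

    while len(current) < max_size and len(current) < len(features):
        # Inclusion: add the most significant feature not yet selected.
        remaining = [f for f in features if f not in current]
        add = max(remaining, key=lambda f: objective(current + [f]))
        current = current + [add]
        record(current)

        # Conditional exclusion ("floating"): drop the least significant
        # feature as long as this improves on the best set of that size.
        while len(current) > 2:
            drop = max(current,
                       key=lambda f: objective([g for g in current if g != f]))
            reduced = [g for g in current if g != drop]
            if objective(reduced) > best[len(reduced)][0]:
                current = reduced
                record(current)
            else:
                break

    return {size: subset for size, (_, subset) in best.items()}

In the experiments of section 4, each subset returned by such a search is then rated with kNN classifiers and leave-one-out cross-validation on the training set.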
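Similarly, the membership computation of the Fuzzy-kNN [9] with the crisp initialization used here (complete membership of each training sample in its own class, neighbors weighted by their distance to the sample) can be sketched as follows; k, the fuzzifier m and all names are illustrative assumptions.

import numpy as np

def fuzzy_knn_memberships(x, train_X, train_y, k=3, m=2.0):
    """Class memberships of a sample x after Keller et al. [9].

    Each training sample has complete membership in its own class and
    none in the others; the k nearest neighbors are weighted by their
    Euclidean distance to x. m is the usual fuzzifier (m = 2 assumed).
    """
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    d = np.linalg.norm(train_X - np.asarray(x, dtype=float), axis=1)
    nn = np.argsort(d)[:k]                                   # k nearest neighbors
    w = 1.0 / np.maximum(d[nn], 1e-12) ** (2.0 / (m - 1.0))  # distance weights
    return {c: np.sum(w[train_y[nn] == c]) / np.sum(w)
            for c in np.unique(train_y)}

# The sample is assigned to the class with the largest membership, e.g.:
# memberships = fuzzy_knn_memberships(x, X_train, y_train, k=4)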
4 Experiments and results

The cell set from section 2 is split into a training set and a validation set, each containing reference cells and analysis cells. On the basis of the training set, all features are normalized to the range [0, 1].

Firstly, feature selection was performed. To rate a wide range of feature combinations during the feature selection process, the features are grouped into five basic feature sets. These are morphology, and chromatin green and chromatin extinction as the chromatin features computed for the green channel and the extinction image, respectively. Additionally, the combination of these two chromatin sets is extended by the moment-based features computed for the extinction and the edge texture image, forming chromatin+moments. Finally, the last basic feature set, all features, comprises all 203 features.

Feature selection was performed on each of these basic feature sets up to feature set sizes of 50 features for each objective function (searching method in Fig. 2). To determine the best feature set size for each objective function, kNN classifiers were trained for k from 1 to 10 (classifier within the block feature selection in Fig. 2). These sets are chosen using the overall classification rates from leave-one-out cross-validation on the training set to rate the feature subsets. Table 1 shows the results of the best sets for each basic feature set.

Table 1: Results of the feature selection for each basic feature set, selected due to the best overall classification result. The classification results (a; r; o) in percentages are given for the (a)nalysis cells, the (r)eference cells, and the (o)verall rate for each of the basic sets. Additionally, the objective function used ((m)utual information, (F)isher's criterion, or (B)hattacharyya distance), the number of features, and the number of neighbors of the kNN are noted. The computed optimal feature sets of chromatin+moments and all features are identical.

basic feature set      classification (a; r; o)   objective function   number of features   number of neighbors
morphology             75.5; 92.3; 85.2           B                    14                   3
chromatin green        92.5; 95.5; 94.1           B                    14                   1
chromatin extinction   90.2; 96.5; 93.8           B                    19                   3
chromatin+moments      90.7; 98.3; 95.1           m                    2                    4
all features           90.7; 98.3; 95.1           m                    2                    4

For these five best classifiers, the classification rates on the validation set were computed with kNN and Fuzzy-kNN (Table 2). It turned out that the best distinction between the two classes needs only two features. These are a moment-based feature (IMTOTE in [3]) as an estimate of the integral optical density and, as an inhomogeneity measure of the chromatin distribution, the median of the topological gradient image (RG in [3]) for the green channel.

Table 2: Classification results on the validation set, computed for the feature sets and the numbers of neighbors of the classifiers in Table 1. The classification results (a; r; o) are computed for the classifiers kNN and F-kNN and are given as percentages for the (a)nalysis cells, the (r)eference cells, and the (o)verall rate.

basic feature set                kNN (a; r; o)       F-kNN (a; r; o)
morphology                       75.3; 92.5; 85.4    77.7; 88.5; 83.2
chromatin green                  91.9; 97.5; 94.7    91.9; 97.5; 94.7
chromatin extinction             93.9; 95.5; 94.7    91.9; 94.5; 93.2
chromatin+moments / all features 92.4; 98.5; 95.5    93.9; 97.5; 95.7

As SVMs are known to provide good generalization results, two SVMs were trained on the training set: one on the whole set of 203 features (SVM203) and one on the two features from the feature selection process that showed a good discriminative power with the kNN classifiers (SVM2). Both SVMs are trained with 13-fold cross-validation and an iteratively refined grid search for the SVM parameters, i.e., the width of the rbf kernel and the penalty term. The grid search started at a logarithmic scale for both parameters. The classification results on the validation set are comparable for both classifiers and also comparable with the results of the kNN and the Fuzzy-kNN, as shown in Table 3.

Table 3: Detection rates of analysis cells and reference cells, and the overall rates, on the validation set in percentages for the different classifiers. kNN, F-kNN and SVM2 use two features only; SVM203 classifies according to all features.

classifier   analysis cells   reference cells   overall rate
kNN          92.4             98.5              95.5
F-kNN        93.9             97.5              95.7
SVM2         92.9             98.5              95.7
SVM203       93.9             97.0              95.5

A comparison of the two classifiers with the best overall classification rate, F-kNN and SVM2, shows a higher detection rate of the diagnostically relevant analysis cells for F-kNN, but with a lower predictive value and higher computational costs compared to SVM2. Table 4 lists the numbers of correctly classified cells and the predictive values for these two classifiers. The predictive value of each class is calculated as the ratio of the number of correctly classified cells of the class to the sum of the number of correctly classified cells of this class and the number of cells falsely classified into this class.

Table 4: Absolute numbers of correctly classified cells within each class and the predictive values in percentages for F-kNN and SVM2. The total number of cells in each class is noted in brackets.

class             F-kNN               SVM2
analysis cells    186 (198), 97.4 %   184 (198), 98.4 %
reference cells   195 (200), 94.2 %   197 (200), 93.4 %
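Written out, with TP_c denoting the number of cells correctly classified into class c and FP_c the number of cells falsely classified into class c, the predictive value is

PV_c = TP_c / (TP_c + FP_c).

For example, for SVM2 and the analysis cell class in Table 4, PV = 184 / (184 + 3) ≈ 98.4 %, the three false positives being the reference cells classified as analysis cells.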
5 Discussion

For the different basic feature sets, the classification results of the kNN for the best feature sets vary only slightly. It can be seen that the chromatin features provide a good separability performance, whereas the geometrical features within the basic set morphology alone do not distinguish the two classes sufficiently.

To achieve the best overall classification rate, only two features are needed, which results in low computational costs. These features are also consistent with the visual criteria used by cytopathologists (section 3.1), which simplifies an understanding of the computed decisions. Validating these two features on the validation set using kNN and F-kNN, as well as SVM2, results in comparable classification rates between the classifiers. These rates on the validation set are slightly better than the classification results achieved with the kNN classifier after feature selection on the training set. This is due to the distribution of the validation set within the feature space.

Training an SVM on all features, to allow for the possibility of detecting a relation between the features other than those tested during feature selection, did not result in better classification results. However, these results are still comparable to those of the classifiers using only two features. The quantity of training data may still be too low for this. Consequently, to be sure that no overfitting has occurred and that the training set is representative, these classifiers have to be tested with cells from specimens that are different from the specimens of the training set.

Furthermore, there is a persistent difference between the classification results of the reference cells and the analysis cells, both for the kNN methods and for the SVMs. The major reason for this may be the different numbers of training cells in the two classes. Therefore the analysis cells should be supplemented with new prototypes.

In conclusion, it can be stated that the classification of the 398 validation cells yields overall classification rates between 95.5 % and 95.7 % for the different classifiers, thus providing good separability between the analysis cells and the reference cells. While the classification results of the different classification algorithms are comparable, the final classifier can be chosen according to the best classification results for either analysis cells or reference cells, or on the basis of computational costs.

Acknowledgments

I would like to thank Prof. Dr.-Ing. T. Aach and Prof. emer. Dr.-Ing. D. Meyer-Ebrecht, Institute of Imaging and Computer Vision, RWTH Aachen University, Germany, for many insightful discussions, and I would like to thank Prof. Dr. med. A. Böcking, Institute for Cytopathology, Heinrich-Heine-University, Düsseldorf, Germany, for providing the dataset and for his teaching about the medical background. The project is supported by the Viktor and Mirka Pollak Fund for Biomedical Engineering.

References

[1] Böcking, A., et al.: Consensus Report of the ESACP Task Force on Standardization of Diagnostic DNA Image Cytometry. Anal. Cell Pathol., Vol. 8 (1995), No. 1, p. 67–74.
[2] Sun, X. R., Wang, J., Garner, D., Palcic, B.: Detection of Cervical Cancer and High Grade Neoplastic Lesions by a Combination of Liquid-Based Sampling Preparation and DNA Measurements Using Automated Image Cytometry. Cellular Oncology, Vol. 27 (2005), No. 8, p. 33–41.

[3] Rodenacker, K., Bengtsson, E.: A Feature Set for Cytometry on Digitized Microscopic Images. Anal. Cell Pathol., Vol. 25 (2003), No. 1, p. 1–36.

[4] Suk, T., Flusser, J.: Graph Method for Generating Affine Moment Invariants. Proc. ICPR, Vol. 2 (2004), p. 192–195.

[5] Young, I. T., Verbeek, P. W., Mayall, B. H.: Characterization of Chromatin Distribution in Cell Nuclei. Cytometry, Vol. 7 (1986), No. 5, p. 467–474.

[6] Pudil, P., Novovičová, J., Kittler, J.: Floating Search Methods in Feature Selection. Pattern Recognition Letters, Vol. 15 (1994), No. 11, p. 1119–1125.

[7] Fukunaga, K.: Statistical Pattern Recognition. Academic Press, 1990.

[8] Pluim, J. P. W., Maintz, J. B. A., Viergever, M. A.: Mutual Information Based Registration of Medical Images: a Survey. IEEE Trans. Med. Imaging, Vol. 22 (2003), No. 8, p. 986–1004.

[9] Keller, J. M., Gray, M. R., Givens, J. A.: A Fuzzy K-Nearest Neighbor Algorithm. IEEE Trans. Systems, Man, and Cybernetics, Vol. SMC-15 (1985), No. 4, p. 580–585.

[10] Schneider, T. E., Bell, A. A., Meyer-Ebrecht, D., Böcking, A., Aach, T.: Computer Aided Cytological Cancer Diagnosis: Cell Type Classification as a Step Towards Fully Automatic Cancer Diagnostics on Cytopathological Specimens of Serous Effusions. SPIE Medical Imaging 2007: Computer-Aided Diagnosis, Vol. 6514 (2007), p. 6514OG-1–10.

[11] Schneider, T. E., Bell, A. A., Gerberich, G., Meyer-Ebrecht, D., Böcking, A., Aach, T.: Chromatinmuster-basierte Zellklassifizierung für die DNS-Bildzytometrie an Mundschleimhaut-Abstrichen. Bildverarbeitung für die Medizin 2007, p. 257–261.

[12] Vapnik, V. N.: The Nature of Statistical Learning Theory. Springer, 1995.

Timna Esther Schneider
e-mail: Timna.Schneider@lfb.rwth-aachen.de

Institute of Imaging and Computer Vision
RWTH Aachen University
Sommerfeldstrasse 24
52074 Aachen, Germany