Advances in Technology Innovation, vol. 2, no. 3, 2016, pp. 68-72

Spatial and Spectral Nonparametric Linear Feature Extraction Method for Hyperspectral Image Classification

Jinn-Min Yang*, Shih-Hsuan Wei
Department of Mathematics Education, National Taichung University of Education, Taichung, Taiwan.
Received 02 May 2016; received in revised form 21 June 2016; accepted 21 June 2016

Abstract
Feature extraction (FE) or dimensionality reduction (DR) plays an important role in pattern recognition. FE aims to reduce the dimensionality of a high-dimensional dataset in order to improve classification accuracy and speed up classification, particularly when the training sample size is small, i.e., the small sample size (SSS) problem. Remotely sensed hyperspectral images (HSIs) often have hundreds of measured features (bands), which can potentially provide more accurate and detailed information for classification, but more samples are generally needed to estimate the parameters and achieve a satisfactory result. Collecting ground truth for a remotely sensed hyperspectral scene can be difficult and expensive. Therefore, FE techniques have become an important part of hyperspectral image classification. Whereas many FE methods are based only on the spectral (band) information of the training samples, FE methods that integrate both the spatial and spectral information of the training samples have shown more effective results in recent years. Spatial contextual information has been shown to improve HSI data representation and to increase classification accuracy. In this paper, we propose a spatial and spectral nonparametric linear feature extraction method for hyperspectral image classification.
The spatial and spectral information is extracted for each training sample and used to design the within-class and between-class scatter matrices of the feature extraction model. Experimental results on a benchmark hyperspectral image demonstrate that the proposed method obtains more stable and satisfactory results than several existing spectral-based feature extraction methods.

Keywords: feature extraction, hyperspectral image, dimensionality reduction, classification, small sample size problem

1. Introduction
Remotely sensed hyperspectral images (HSIs) often have hundreds of measured features (spectral bands), which can potentially provide more accurate and detailed information for classification; in recent years they have been widely used in environmental mapping, geological research, and mineral identification. Collecting ground truth for a remotely sensed hyperspectral image can be difficult and expensive. Therefore, FE techniques have become an important part of hyperspectral image classification.

In general, feature extraction (FE) or dimensionality reduction (DR) plays an important role in pattern recognition. Linear discriminant analysis (LDA) [1] is one of the best-known linear feature extraction methods and has been successfully applied in many fields. The purpose of LDA is to find a linear transformation matrix that projects data from a high-dimensional space into a low-dimensional subspace, thereby mitigating the so-called curse of dimensionality [2], [3] or Hughes phenomenon [4], [5]. The Hughes phenomenon describes the fact that the ratio of the number of training samples to the number of features must be maintained at or above some minimum value to achieve statistical confidence [5]. Otherwise, beyond a certain point, classification accuracy declines as the dimensionality of the data increases.
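To make the LDA projection concrete, here is a minimal NumPy sketch of classical (parametric) LDA. It is an illustration under stated assumptions, not code from this paper: class priors are estimated from sample counts, the function name `lda_fit` is hypothetical, and the generalized eigenproblem is solved by Cholesky whitening, which requires a positive definite within-class scatter matrix.

```python
import numpy as np

def lda_fit(X, y, n_components):
    """Classical LDA: columns of A solve S_b v = lambda S_w v (largest lambda first)."""
    classes = np.unique(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        p_c = len(Xc) / len(X)            # assumption: prior = class frequency
        m_c = Xc.mean(axis=0)
        D = Xc - m_c
        Sw += p_c * (D.T @ D) / len(Xc)   # prior-weighted class covariance
        diff = m_c - mean_all
        Sb += p_c * np.outer(diff, diff)  # one rank-one term per class
    # Cholesky whitening: with Sw = L L^T, solve (L^-1 Sb L^-T) u = lambda u, then v = L^-T u.
    L = np.linalg.cholesky(Sw)            # requires Sw to be positive definite
    L_inv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    return L_inv.T @ U[:, order], evals[order]

# Toy usage: three Gaussian classes in 4 dimensions.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(30, 4)) for m in means])
y = np.repeat([0, 1, 2], 30)
A, evals = lda_fit(X, y, n_components=4)
Z = X @ A[:, :2]  # project onto the two discriminant directions
```

With three classes, only two of the returned eigenvalues are clearly nonzero, because the prior-weighted class-mean deviations sum to zero and the between-class scatter matrix therefore has rank at most L - 1. When the number of bands exceeds the number of training samples, the within-class scatter matrix becomes singular and the whitening step is no longer reliable.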
In high-dimensional classification tasks, however, it is often not possible to collect enough training samples to maintain this ratio. By reducing the number of features, feature extraction relatively enlarges the ratio and thus alleviates the curse of dimensionality, which in turn enhances classification accuracy. Meanwhile, the computational time is reduced as well.

* Corresponding author, Email: jinnminyang@mail.ntcu.edu.tw

Basically, LDA has three inherent deficiencies in dealing with classification problems. First, LDA is only well suited to normally distributed data [1]. If the distributions are significantly non-normal, LDA cannot be expected to accurately indicate which features should be extracted to preserve the complex structures needed for classification. Second, since the rank of the between-class scatter matrix is at most the number of classes minus one [1], at most that many features can be extracted. Third, a singularity problem arises when dealing with high-dimensional, small sample size (SSS) data [1, 3, 6-7]. Nonparametric linear discriminant analysis methods such as nonparametric discriminant analysis (NDA) [1], nonparametric weighted feature extraction (NWFE) [6], and cosine-based feature extraction (CNFE) [7] provide solutions that circumvent the problems mentioned above.

The aforementioned FE methods are spectral-based algorithms; in other words, they measure similarity in the spectral space. Using only spectral information for classification tasks is insufficient. Spatial contextual information has been shown to improve the classification of HSI data in recent years [8], [9]. In this paper, a nonparametric feature extraction method that integrates both spectral and spatial information is proposed. The rest of this paper is organized as follows.
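The proposed method and its experiments are described in Section 2. The experimental results and discussion are provided in Section 3. Finally, Section 4 gives the conclusions of the paper.

2. Method
The goal of FE is to find a transformation matrix $A$ that maximizes the class separability in the transformed space, where $S_w$ and $S_b$ denote the within-class and between-class scatter matrices, respectively. That is,

$$ A^{*} = \arg\max_{A} \operatorname{tr}\left( (A^{T} S_w A)^{-1} A^{T} S_b A \right). \qquad (1) $$

The maximization of (1) is equivalent to solving the generalized eigenvalue problem

$$ S_b v_i = \lambda_i S_w v_i, \quad i = 1, \ldots, P, $$

where $P$ denotes the dimensionality of the transformed space, $(\lambda_i, v_i)$ are the eigenpairs of $S_w^{-1} S_b$, and $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_P$. The transformation matrix is then $A = [v_1, \ldots, v_P]$.

The proposed spatial and spectral feature extraction method has two parts: the first incorporates spatial information into the design of the within-class and between-class scatter matrices, and the second introduces another scatter matrix to regularize the within-class scatter matrix.

2.1. The Nonparametric Linear Discriminant Analysis
The within-class scatter matrix $S_w$ and between-class scatter matrix $S_b$ of nonparametric linear discriminant feature extraction (NLDA) are defined, respectively, as

$$ S_w = \sum_{i=1}^{L} P_i \sum_{l=1}^{N_i} \left(x_l^{(i)} - M_i(x_l^{(i)})\right)\left(x_l^{(i)} - M_i(x_l^{(i)})\right)^{T}, \qquad (2) $$

$$ S_b = \sum_{i=1}^{L} P_i \sum_{\substack{j=1 \\ j \neq i}}^{L} \sum_{l=1}^{N_i} \left(x_l^{(i)} - M_j(x_l^{(i)})\right)\left(x_l^{(i)} - M_j(x_l^{(i)})\right)^{T}, \qquad (3) $$

where $L$ is the number of classes, $N_i$ and $P_i$ are the number of training samples and the prior probability of class $i$, $x_l^{(i)}$ is the $l$-th training sample of class $i$, and $M_i(x_l^{(i)})$ and $M_j(x_l^{(i)})$ denote the local means of $x_l^{(i)}$ in the $i$-th class and the $j$-th class, respectively. The local mean of $x_l^{(i)}$ is computed from its $k$ nearest neighbors ($k$NNs) in the same class or in a different class, as shown in (4):

$$ M_j(x_l^{(i)}) = \frac{1}{k} \sum_{t=1}^{k} x_{t\mathrm{NN}}^{(j)}, \qquad (4) $$

where $x_{t\mathrm{NN}}^{(j)}$ is the $t$-th nearest neighbor of $x_l^{(i)}$ in class $j$.

2.2. The Spatial and Spectral Nonparametric Linear Discriminant Analysis
The within-class scatter matrix $S_w^{S}$ and between-class scatter matrix $S_b^{S}$ of spatial and spectral nonparametric linear discriminant feature extraction (SSNLDA) are defined in (5) and (6), respectively.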
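Because SSNLDA keeps the form of (2) and (3) and changes only how the local means are computed, the NLDA pipeline of (1) to (4) is a useful reference point. The NumPy sketch below illustrates it under stated assumptions and is not the authors' code: the prior $P_i$ is estimated as $N_i/N$, a sample is excluded from its own within-class kNN set, and all function names are hypothetical. Since $S_w$ is singular in SSS settings, `solve_transformation` blends $S_w$ with its diagonal part using a fixed weight of 0.5; this is a generic stand-in for the diagonal-based regularization the paper applies in (13), not its exact form.

```python
import numpy as np

def knn_local_mean(x, Xj, k, exclude_self=False):
    """Mean of the k nearest neighbours of x among the rows of Xj, cf. (4)."""
    dist2 = np.sum((Xj - x) ** 2, axis=1)
    order = np.argsort(dist2)
    if exclude_self:
        # Assumption: x is a row of Xj, so its nearest neighbour is itself; skip it.
        order = order[1:]
    return Xj[order[:k]].mean(axis=0)

def nlda_scatter(X, y, k=3):
    """Nonparametric within-class (2) and between-class (3) scatter matrices."""
    classes = np.unique(y)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for i in classes:
        Xi = X[y == i]
        P_i = len(Xi) / len(X)  # assumption: prior estimated as N_i / N
        for x in Xi:
            r = x - knn_local_mean(x, Xi, k, exclude_self=True)
            Sw += P_i * np.outer(r, r)
            for j in classes:
                if j == i:
                    continue
                r = x - knn_local_mean(x, X[y == j], k)
                Sb += P_i * np.outer(r, r)
    return Sw, Sb

def solve_transformation(Sw, Sb, n_components, shrink=0.5):
    """A = [v_1, ..., v_P] from S_b v = lambda S_w_reg v, largest lambda first."""
    # Hypothetical regularization: blend Sw with its diagonal part, which makes it
    # positive definite whenever the diagonal of Sw is positive.
    Sw_reg = (1.0 - shrink) * Sw + shrink * np.diag(np.diag(Sw))
    L = np.linalg.cholesky(Sw_reg)  # Sw_reg = L L^T
    L_inv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    return L_inv.T @ U[:, order], evals[order]

# Toy SSS setting: 3 classes, 5 samples each, 50 features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0 * c, 1.0, size=(5, 50)) for c in range(3)])
y = np.repeat([0, 1, 2], 5)
Sw, Sb = nlda_scatter(X, y, k=2)
A, evals = solve_transformation(Sw, Sb, n_components=10)
Z = X @ A  # 10 extracted features, more than L - 1 = 2
```

In this toy problem $S_w$ is a sum of 15 rank-one terms in 50 dimensions and is therefore singular, whereas $S_b$ has rank well above $L - 1 = 2$; this is why nonparametric scatter matrices allow more than $L - 1$ features to be extracted, while some form of regularization is still needed before solving (1).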
$$ S_w^{S} = \sum_{i=1}^{L} P_i \sum_{l=1}^{N_i} \left(x_l^{(i)} - M_i^{S}(x_l^{(i)})\right)\left(x_l^{(i)} - M_i^{S}(x_l^{(i)})\right)^{T}, \qquad (5) $$

$$ S_b^{S} = \sum_{i=1}^{L} P_i \sum_{\substack{j=1 \\ j \neq i}}^{L} \sum_{l=1}^{N_i} \left(x_l^{(i)} - M_j^{S}(x_l^{(i)})\right)\left(x_l^{(i)} - M_j^{S}(x_l^{(i)})\right)^{T}, \qquad (6) $$

where $M_i^{S}(x_l^{(i)})$ and $M_j^{S}(x_l^{(i)})$ denote the local means of training sample $x_l^{(i)}$ corresponding to the $i$-th class and the $j$-th class, respectively. The local mean $M_j^{S}(x_l^{(i)})$ is computed by combining a spectral weighted local mean and a spatial weighted local mean, as shown in (7). The spectral weighted local mean is defined in (8) with its weights given in (9), and the spatial weighted local mean is defined in (10) with its weights given in (11); the spatial terms make use of the image coordinate of each training sample.

The other component of SSNLDA is the distance scatter matrix introduced in [8]. Based on a $z \times z$ window, a training sample and its $s = z^2 - 1$ neighboring pixels form a local patch, where the odd number $z$ is the width of the neighborhood window. The resulting scatter matrix is given in (12), in which a parameter reflects the degree of filtering.

Regularization is employed to alleviate the singularity problem in SSNLDA. The within-class scatter matrix is replaced by the regularized matrix in (13), where $\operatorname{diag}(\cdot)$ denotes the diagonal part of a matrix.

2.3. Dataset
The Indian Pines image was acquired by a sensor mounted on an aircraft flown at an altitude of 65,000 ft and operated by the NASA Jet Propulsion Laboratory. The image has 145 × 145 pixels and 220 spectral bands, and each pixel measures approximately 20 m across on the ground. We reduced the number of bands to 200 by removing the bands covering the water absorption regions: 104-108, 150-163, and 220. There are 16 classes in the dataset. The total number of labeled samples is 10,249, ranging from 20 to 2,455 per class (Table 1). In Fig. 1, the left and right images depict a false color composite of bands 50, 27, and 17 and the ground truth of the Indian Pines dataset, respectively.

Fig. 1 Left: false color composite of the Indian Pines image (bands 50, 27, and 17); right: its ground truth.

2.4. Experiment Design
Three cases, with 5 (case I), 10 (case II), and 20 (case III) training samples per class, are investigated to examine the effect of the training sample size. The remaining samples are used as test samples. Cases I and II are the so-called ill-posed and poorly posed classification problems [7], respectively; both are challenging in pattern recognition. In each case, the training and test sets are randomly selected. Each case is repeated 10 times, and the averaged overall accuracy (OA) and its standard deviation are reported. Two other linear feature extraction methods, CNFE and NWFE, are used for comparison with the proposed SSNLDA. The 1-nearest neighbor (1NN) classifier is employed. In SSNLDA, a square window is adopted to form the local patch, and the three model parameters are all set to 0.5.

Table 1 Classes and numbers of labeled pixels in the Indian Pines dataset
Class   # pixels   Class name
1       46         Alfalfa
2       1428       Corn-notill
3       830        Corn-mintill
4       237        Corn
5       483        Grass-pasture
6       730        Grass-trees
7       28         Grass-pasture-mowed
8       478        Hay-windrowed
9       20         Oats
10      972        Soybean-notill
11      2455       Soybean-mintill
12      593        Soybean-clean
13      205        Wheat
14      1265       Woods
15      386        Buildings-Grass-Trees-Drives
16      93         Stone-Steel-Towers

3. Results and Discussion
Table 2 lists the best classification accuracies for the three cases on the Indian Pines dataset. As the table shows, the 1NN classifier with SSNLDA features achieves better results than with CNFE and NWFE features. SSNLDA improves OA by about 6 percentage points over CNFE and by about 8 to 9 percentage points over NWFE. Its standard deviations are also smaller in cases I and III, and comparable to that of CNFE in case II. Fig. 2 shows the variation of OA with the reduced dimensionality when 5, 10, and 20 training samples per class are used. The proposed SSNLDA clearly outperforms the other two methods. Fig. 3 shows the classification maps of the Indian Pines scene obtained with the 1NN classifier in the 20-training-sample case, with a reduced dimensionality of 30. As shown in Fig. 3, the 1NN classifier with SSNLDA features yields better results.

Table 2 Classification accuracies (%) on the Indian Pines scene
Case   FE       OA ± std (# features)
I      NWFE     63.03 ± 4.68 (28)
I      CNFE     65.10 ± 4.91 (15)
I      SSNLDA   71.05 ± 3.28 (28)
II     NWFE     72.82 ± 2.28 (29)
II     CNFE     75.67 ± 1.77 (29)
II     SSNLDA   81.79 ± 1.81 (30)
III    NWFE     80.15 ± 1.76 (28)
III    CNFE     81.93 ± 2.17 (28)
III    SSNLDA   88.08 ± 1.21 (30)

Fig. 2 Classification results on the Indian Pines dataset for the three feature extraction methods.

Fig. 3 Classification maps of the Indian Pines scene using the 1NN classifier for the 20-training-sample case: (a) NWFE, (b) CNFE, (c) SSNLDA, (d) ground truth.

4. Conclusions
In this paper, a spectral and spatial information-based nonparametric feature extraction method, SSNLDA, is proposed. The above results show that SSNLDA achieves more stable and effective results. In most cases, the 1NN classifier with SSNLDA features obtains better results than with the spectral-based FE methods CNFE and NWFE, particularly when the training sample size is quite small.

Acknowledgement
This study is supported by the Ministry of Science and Technology, R.O.C., under contract number MOST 104-2221-E-142-005.

References
[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., New York: Academic Press, 1990.
[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., New York: John Wiley & Sons, 2001.
[3] S. J. Raudys and A. K. Jain, "Small sample size effects in statistical pattern recognition: recommendations for practitioners," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 3, pp. 252-264, 1991.
[4] D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing, New Jersey: John Wiley and Sons, 2003.
[5] P. K. Varshney and M. K. Arora, Advanced Image Processing Techniques for Remotely Sensed Hyperspectral Data, New York: Springer, 2004.
[6] B. C. Kuo and D. A. Landgrebe, "Nonparametric weighted feature extraction for classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 5, pp. 1096-1105, 2004.
[7] J. M. Yang, P. T. Yu, and B. C. Kuo, "A nonparametric feature extraction and its application to nearest neighbor classification for hyperspectral image data," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 3, pp. 1279-1293, 2010.
[8] Y. Zhou, J. Peng, and C. L. Philip Chen, "Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 2, pp. 1082-1095, 2015.
[9] H. Pu, Z. Chen, B. Wang, and G. M. Jiang, "A novel spatial-spectral similarity measure for dimensionality reduction and classification of hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 11, pp. 7008-7022, 2014.
[10] Hyperspectral remote sensing scenes, [Online]. Available: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes. [Accessed 24 May 2016]