


Advances in Technology Innovation, vol. 2, no. 3, 2016, pp. 68-72


Spatial and Spectral Nonparametric Linear Feature Extraction Method for Hyperspectral Image Classification

Jinn-Min Yang*, Shih-Hsuan Wei

Department of Mathematics Education, National Taichung University of Education, Taichung, Taiwan.

Received 02 May 2016; received in revised form 21 June 2016; accepted 21 June 2016

 
Abstract 

Feature extraction (FE), or dimensionality reduction (DR), plays an important role in the field of pattern recognition. Feature extraction aims to reduce the dimensionality of a high-dimensional dataset in order to improve classification accuracy and speed, particularly when the training sample size is small, namely the small sample size (SSS) problem. Remotely sensed hyperspectral images (HSIs) often have hundreds of measured features (bands), which potentially provide more accurate and detailed information for classification, but more samples are generally needed to estimate the parameters and achieve a satisfactory result. Collecting ground truth for a remotely sensed hyperspectral scene can be considerably difficult and expensive. Therefore, FE techniques have become an important part of hyperspectral image classification. Whereas many feature extraction methods are based only on the spectral (band) information of the training samples, methods that integrate both the spatial and spectral information of the training samples have shown more effective results in recent years. Spatial contextual information has been proven useful for improving the HSI data representation and increasing classification accuracy. In this paper, we propose a spatial and spectral nonparametric linear feature extraction method for hyperspectral image classification. The spatial and spectral information is extracted for each training sample and used to design the within-class and between-class scatter matrices for constructing the feature extraction model. The experimental results on one benchmark hyperspectral image demonstrate that the proposed method obtains more stable and satisfactory results than some existing spectral-based feature extraction methods.

Keywords: feature extraction, hyperspectral image, dimensionality reduction, classification, small sample size problem

1. Introduction 

Remotely sensed hyperspectral images (HSIs) often have hundreds of measured features (spectral bands), which potentially provide more accurate and detailed information for classification, and they have been widely used in environmental mapping, geological research, and mineral identification in recent years. Collecting ground truth for a remotely sensed hyperspectral image can be considerably difficult and expensive. Therefore, FE techniques have become an important part of hyperspectral image classification.

In general, feature extraction (FE) or dimensionality reduction (DR) plays an important role in the field of pattern recognition. Linear discriminant analysis (LDA) [1] is one of the most well-known linear feature extraction methods and has been successfully applied in many fields. The purpose of LDA is to find a linear transformation matrix that projects data from a high-dimensional space into a low-dimensional subspace to mitigate the so-called curse of dimensionality [2], [3] or the Hughes phenomenon [4], [5]. The Hughes phenomenon states that the ratio of the number of training samples to the number of features must be maintained at or above some minimum value to achieve statistical confidence [5]. Otherwise, the classification accuracy will decline as the dimensionality of the data increases beyond a certain point. However, a high-dimensional classification task does not necessarily have enough training samples to maintain this ratio. By feature extraction, the ratio can be enlarged and the curse of dimensionality can be mitigated, which results in an enhancement of classification accuracy. Meanwhile, the computational time can be reduced as well.
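As a rough illustration of how feature extraction enlarges this ratio, the short calculation below uses figures from the experiment reported later in this paper (16 classes, 200 bands, 5 to 20 training samples per class, and up to 30 extracted features). The function name is hypothetical and the snippet is only a sketch of the arithmetic.

```python
def sample_to_feature_ratio(n_classes, n_per_class, n_features):
    """Ratio of the total number of training samples to the number of features."""
    return n_classes * n_per_class / n_features


ratios = {}
for n_per_class in (5, 10, 20):
    before = sample_to_feature_ratio(16, n_per_class, 200)  # original bands
    after = sample_to_feature_ratio(16, n_per_class, 30)    # after FE
    ratios[n_per_class] = (before, after)
    print(f"{n_per_class:2d} samples/class: ratio {before:.2f} -> {after:.2f}")
```

With 5 samples per class, the 80 training samples are fewer than the 200 original features, whereas after reduction to 30 features they outnumber the features.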

* Corresponding author, Email: jinnminyang@mail.ntcu.edu.tw



Copyright © TAETI

Basically, LDA has three inherent deficiencies in dealing with classification problems. First, LDA is only well-suited for normally distributed data [1]. If the distributions are significantly non-normal, LDA cannot be expected to accurately indicate which features should be extracted to preserve the complex structures needed for classification. Second, since the rank of the between-class scatter matrix is the number of classes minus one [1], the number of features that can be extracted is at most the number of classes minus one. Third, the singularity problem arises when dealing with high-dimensional and small sample size (SSS) data [1, 3, 6-7]. Nonparametric linear discriminant analysis methods such as nonparametric discriminant analysis (NDA) [1], nonparametric weighted feature extraction (NWFE) [6] and cosine-based nonparametric feature extraction (CNFE) [7] provide solutions for circumventing the previously mentioned problems.
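The second deficiency can be checked numerically. The sketch below builds the standard parametric between-class scatter matrix from class means on synthetic data (the data, sizes, and variable names are illustrative assumptions) and confirms that its rank is the number of classes minus one, regardless of the number of features.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features, n_per_class = 4, 10, 50

# Synthetic data: one Gaussian cloud per class around a random centre.
centres = 3.0 * rng.normal(size=(n_classes, n_features))
X = [c + rng.normal(size=(n_per_class, n_features)) for c in centres]

class_means = np.array([x.mean(axis=0) for x in X])
priors = np.full(n_classes, 1.0 / n_classes)
overall_mean = priors @ class_means

# Parametric between-class scatter: sum_i P_i (m_i - m)(m_i - m)^T
diffs = class_means - overall_mean
S_b = (priors[:, None] * diffs).T @ diffs

rank_Sb = np.linalg.matrix_rank(S_b, tol=1e-8)
print(rank_Sb, "of", n_features)
```

Because the weighted deviations of the class means from the overall mean sum to zero, only `n_classes - 1` of them are linearly independent, which bounds the number of LDA features.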

The aforementioned FE methods are spectral-based algorithms; in other words, they measure similarity in the spectral space. Using only spectral information for classification tasks is insufficient. Spatial contextual information has been proven useful for improving the classification of HSI data in recent years [8]-[9]. In this paper, a nonparametric feature extraction method integrating both spectral and spatial information is proposed.

The rest of this paper is organized as follows. The proposed method and the experiment design are described in Section 2. The experimental results and discussion are provided in Section 3. Finally, Section 4 gives the conclusions of the paper.

2. Method 

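The goal of FE is to find a transformation matrix $A$ which maximizes the class separability $J(A) = \mathrm{tr}\left[(A^T S_w A)^{-1}(A^T S_b A)\right]$ in the transformed space, where $S_w$ and $S_b$ denote the within-class and between-class scatter matrices, respectively. That is,

$$A = \arg\max_{A} \, \mathrm{tr}\left[(A^T S_w A)^{-1}(A^T S_b A)\right] \quad (1)$$

The maximization of (1) is equivalent to solving the generalized eigenvalue decomposition problem

$$S_b v_i = \lambda_i S_w v_i, \quad i = 1, \ldots, p,$$

where $p$ denotes the dimensionality of the transformed space, $(\lambda_i, v_i)$ represent the eigen-pairs of $S_w^{-1} S_b$, and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$. Thus, the transformation matrix $A = [v_1, \ldots, v_p]$ can be obtained.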
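A minimal NumPy sketch of this step follows, assuming the within-class scatter matrix is symmetric positive definite (for example after regularization). The function name `fisher_transform` and the toy matrices are hypothetical; the generalized problem is reduced to an ordinary symmetric eigenproblem through a Cholesky factor of the within-class scatter matrix.

```python
import numpy as np

def fisher_transform(S_w, S_b, p):
    """Top-p generalized eigenvectors of S_b v = lambda S_w v.

    Assumes S_w is symmetric positive definite and S_b is symmetric.
    Returns (A, eigenvalues) with the columns of A ordered by
    decreasing eigenvalue.
    """
    C = np.linalg.cholesky(S_w)            # S_w = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ S_b @ C_inv.T              # symmetric ordinary eigenproblem
    eigvals, U = np.linalg.eigh((M + M.T) / 2)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:p]  # indices of the p largest
    A = C_inv.T @ U[:, order]              # map back: v = C^{-T} u
    return A, eigvals[order]

# Toy scatter matrices (illustrative only).
rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
S_w = B @ B.T + 6 * np.eye(6)
G = rng.normal(size=(6, 3))
S_b = G @ G.T

A, lam = fisher_transform(S_w, S_b, p=2)
samples = rng.normal(size=(5, 6))
projected = samples @ A                    # each row is A^T x
```

Each sample $x$ is then mapped to $A^T x$, so a matrix of row-wise samples is projected with a single matrix product.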
The proposed spatial and spectral feature extraction method consists of two parts: one incorporates the spatial information into the design of the within-class and between-class scatter matrices, and the other incorporates another scatter matrix to regularize the within-class scatter matrix.

 
2.1. The Nonparametric Linear Discriminant Analysis

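The within-class scatter matrix and between-class scatter matrix of the nonparametric linear discriminant feature extraction (NLDA) are defined as follows, respectively.

$$S_w = \sum_{i=1}^{L} P_i \sum_{\ell=1}^{N_i} \left(x_\ell^{(i)} - M_i(x_\ell^{(i)})\right)\left(x_\ell^{(i)} - M_i(x_\ell^{(i)})\right)^T \quad (2)$$

$$S_b = \sum_{i=1}^{L} P_i \sum_{\substack{j=1 \\ j \ne i}}^{L} \sum_{\ell=1}^{N_i} \left(x_\ell^{(i)} - M_j(x_\ell^{(i)})\right)\left(x_\ell^{(i)} - M_j(x_\ell^{(i)})\right)^T \quad (3)$$

where $L$ is the number of classes, $N_i$ is the number of training samples in class $i$, $P_i$ is the prior probability of class $i$, and $M_i(x_\ell^{(i)})$ and $M_j(x_\ell^{(i)})$ denote the local means of training sample $x_\ell^{(i)}$ corresponding to the $i$th class and $j$th class, respectively. The local mean of $x_\ell^{(i)}$ is computed from its $k$-nearest neighbors ($k$NNs) in the same class or in the different classes as shown in (4).

$$M_j(x_\ell^{(i)}) = \frac{1}{k} \sum_{t=1}^{k} x_{t\mathrm{NN}}^{(j)} \quad (4)$$

where $x_{t\mathrm{NN}}^{(j)}$ denotes the $t$th nearest neighbor of $x_\ell^{(i)}$ in class $j$.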
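The construction described above can be sketched in NumPy as follows. This is an illustrative reading rather than the authors' implementation. It assumes that each scatter matrix accumulates prior-weighted outer products of the difference between a sample and its $k$NN local mean, that class priors are estimated from sample counts, and that a sample is excluded from its own neighbor set in the within-class case. The function names are hypothetical.

```python
import numpy as np

def knn_local_mean(x, candidates, k):
    """Mean of the k nearest neighbours of x among the rows of candidates."""
    dist = np.linalg.norm(candidates - x, axis=1)
    nearest = np.argsort(dist)[:k]
    return candidates[nearest].mean(axis=0)

def nlda_scatter(X, y, k):
    """Nonparametric within- and between-class scatter matrices.

    X: (n, d) training samples; y: (n,) integer class labels.
    """
    classes = np.unique(y)
    n, d = X.shape
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for ci in classes:
        Xi = X[y == ci]
        P_i = len(Xi) / n                      # prior from sample counts (assumption)
        for idx, x in enumerate(Xi):
            # Within-class: neighbours from the same class, excluding x itself.
            others = np.delete(Xi, idx, axis=0)
            diff = x - knn_local_mean(x, others, k)
            S_w += P_i * np.outer(diff, diff)
            # Between-class: neighbours from every other class.
            for cj in classes:
                if cj == ci:
                    continue
                diff = x - knn_local_mean(x, X[y == cj], k)
                S_b += P_i * np.outer(diff, diff)
    return S_w, S_b

# Toy data: three classes of eight 4-dimensional samples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(8, 4)) for m in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 8)
S_w, S_b = nlda_scatter(X, y, k=3)
```

Because the local means depend on each individual sample rather than on a single class mean, the between-class scatter matrix built this way is not restricted to rank $L - 1$.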

2.2. The Spatial and Spectral Nonparametric Linear Discriminant Analysis

The within-class scatter matrix and between-class scatter matrix of the spatial and spectral nonparametric linear discriminant feature extraction (SSNLDA) are defined as follows, respectively.

            
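$$S_w^{SS} = \sum_{i=1}^{L} P_i \sum_{\ell=1}^{N_i} \left(x_\ell^{(i)} - M_i(x_\ell^{(i)})\right)\left(x_\ell^{(i)} - M_i(x_\ell^{(i)})\right)^T \quad (5)$$

$$S_b^{SS} = \sum_{i=1}^{L} P_i \sum_{\substack{j=1 \\ j \ne i}}^{L} \sum_{\ell=1}^{N_i} \left(x_\ell^{(i)} - M_j(x_\ell^{(i)})\right)\left(x_\ell^{(i)} - M_j(x_\ell^{(i)})\right)^T \quad (6)$$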

where $M_i(x_\ell^{(i)})$ and $M_j(x_\ell^{(i)})$ denote the local means of training sample $x_\ell^{(i)}$ corresponding to the $i$th class and $j$th class, respectively. The local mean of $x_\ell^{(i)}$ is computed by utilizing the spectral weighted local mean and the spatial weighted local mean, as shown in (7).

 
(7) 

where 

 (8) 

with 

 
(9) 

and  

 (10) 

with 

 
(11) 

and  denotes the coordinate of training 

sample , the parameter  . 

Another part of the method is the distance scatter matrix introduced in [8]. Based on a $z \times z$ window, a training sample and its pixel neighbors form a local patch, where the odd number $z$ is the width of the neighborhood window. The

scatter matrix  and  

 
(12) 

where $s = z^2 - 1$,  and . Parameter  reflects the degree of filtering.
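The local patch described above can be extracted as in the following sketch. The function name `local_patch` is hypothetical, and clipping the window at the image border is an assumption of this sketch, since the border handling is not stated in the text.

```python
import numpy as np

def local_patch(cube, row, col, z):
    """Pixels of the z x z window centred at (row, col).

    cube: (H, W, B) hyperspectral image; z: odd window width.
    Returns (centre, neighbours); an interior pixel has s = z**2 - 1
    neighbours, and windows are clipped at the image border.
    """
    if z % 2 == 0:
        raise ValueError("z must be odd")
    half = z // 2
    H, W, _ = cube.shape
    rows = range(max(row - half, 0), min(row + half + 1, H))
    cols = range(max(col - half, 0), min(col + half + 1, W))
    neighbours = np.array([cube[r, c] for r in rows for c in cols
                           if (r, c) != (row, col)])
    return cube[row, col], neighbours

# Toy 5 x 5 image with 3 bands.
cube = np.arange(5 * 5 * 3, dtype=float).reshape(5, 5, 3)
centre, nb = local_patch(cube, 2, 2, z=3)                 # interior pixel
corner_centre, corner_nb = local_patch(cube, 0, 0, z=3)   # clipped at the corner
```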

Regularization is employed to alleviate the singularity problem in SSNLDA. The within-class scatter matrix is replaced by

 
(13) 

where  denotes the diagonal parts of a  

matrix  and  . 
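To illustrate why mixing a scatter matrix with its diagonal part removes the singularity, the sketch below uses a simple shrinkage of the form $(1-\alpha) S_w + \alpha\,\mathrm{diag}(S_w)$. This form, the parameter name `alpha`, and the function name are assumptions chosen for illustration and are not necessarily the expression in (13).

```python
import numpy as np

def regularize_diag(S_w, alpha=0.5):
    """Hypothetical shrinkage: (1 - alpha) * S_w + alpha * diag(S_w)."""
    return (1.0 - alpha) * S_w + alpha * np.diag(np.diag(S_w))

# Five samples in 20 dimensions: the sample scatter has rank at most 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))
Xc = X - X.mean(axis=0)
S_w = Xc.T @ Xc

S_reg = regularize_diag(S_w, alpha=0.5)
rank_before = np.linalg.matrix_rank(S_w, tol=1e-8)
rank_after = np.linalg.matrix_rank(S_reg, tol=1e-8)
print(rank_before, "->", rank_after)
```

The diagonal part is positive definite whenever every feature has nonzero variance, so the mixture is invertible even when the number of samples is far smaller than the number of features.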

2.3. Dataset 

The Indian Pines image was acquired from an aircraft flown at 65000 ft altitude and operated by the NASA/Jet Propulsion Laboratory. It has a size of 145 × 145 pixels and 220 spectral bands, with each pixel measuring approximately 20 m across the ground. We reduced the number of bands to 200 by removing the bands covering the region of water absorption: 104-108, 150-163, and 220. There are 16 classes in the dataset. The total number of labeled samples is 10249, ranging from 20 to 2455 in each class, as listed in Table 1. In Fig. 1, the left and right images depict the false color composition of three sample bands (50, 27 and 17) and the ground truth of the Indian Pines dataset, respectively.

 
Fig. 1 The left figure depicts the Indian Pines image of bands 50, 27 and 17; the right one shows its ground truth.
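The band removal described in this section can be sketched as follows, assuming the listed band numbers are 1-based and the image is stored as a (rows, columns, bands) array. The zero-filled array is only a placeholder for the real data cube.

```python
import numpy as np

# Water-absorption bands listed in the text (assumed to be 1-based numbers).
removed = set(range(104, 109)) | set(range(150, 164)) | {220}
keep = [b for b in range(1, 221) if b not in removed]

# Placeholder cube with the stated size: 145 x 145 pixels, 220 bands.
cube = np.zeros((145, 145, 220), dtype=np.float32)
cube_200 = cube[:, :, np.array(keep) - 1]   # convert to 0-based indices

print(len(removed), cube_200.shape)
```

The three ranges contain 5, 14, and 1 bands, so 20 bands are removed and 200 remain.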

2.4. Experiment Design 

Three different cases, with 5 (case I), 10 (case II), and 20 (case III) training samples per class, are investigated to examine the effect of the training sample size in the experiments. The remaining samples are employed as the test samples. Cases I and II are the so-called ill-posed and poorly posed classification problems [7], respectively. They are challenging cases in the field of pattern recognition. In each case, the training and testing datasets are randomly selected. We repeat each case 10 times and report the average overall accuracy (OA) and its standard deviation.
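The random selection of a fixed number of training samples per class can be sketched as below. The function name is hypothetical, the labels are toy data, and capping the request at the class size is an assumption of this sketch for classes with very few labeled pixels.

```python
import numpy as np

def split_per_class(y, n_train, rng):
    """Randomly pick n_train indices per class for training; the rest are test."""
    train_idx = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        size = min(n_train, len(idx))        # cap at class size (assumption)
        train_idx.extend(rng.choice(idx, size=size, replace=False))
    train_idx = np.array(sorted(train_idx))
    test_idx = np.setdiff1d(np.arange(len(y)), train_idx)
    return train_idx, test_idx

rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), [30, 40, 50, 25])    # toy labels, 4 classes
splits = [split_per_class(y, n_train=5, rng=rng) for _ in range(10)]
```

Each of the 10 entries in `splits` corresponds to one repetition of a case, with a fresh random draw of training samples.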

Two other linear feature extraction methods, CNFE and NWFE, are used for comparison with the proposed SSNLDA. The 1-nearest neighbor (1NN) classifier is employed. In SSNLDA, we adopt a  window to form a local patch, and the values of ,  and  are set to 0.5.



Table 1 Class names and number of pixels in the Indian Pines dataset

Class  # pixels  Class name
1      46        Alfalfa
2      1428      Corn-notill
3      830       Corn-mintill
4      237       Corn
5      483       Grass-pasture
6      730       Grass-trees
7      28        Grass-pasture-mowed
8      478       Hay-windrowed
9      20        Oats
10     972       Soybean-notill
11     2455      Soybean-mintill
12     593       Soybean-clean
13     205       Wheat
14     1265      Woods
15     386       Buildings-Grass-Trees-Drives
16     93        Stone-Steel-Towers

3. Results and Discussion 

Table 2 lists the best classification accuracies for the three cases of the Indian Pines dataset. As the table shows, the 1NN classifier with SSNLDA features achieves better results than with CNFE and NWFE features. SSNLDA improves the OA by about 6 percentage points over CNFE and by about 8 to 9 percentage points over NWFE. Its standard deviation is also the smallest of the three methods in two of the three cases and close to the smallest in the other. Fig. 2 shows the variation of OA with the reduced dimensionality when 5, 10 and 20 training samples per class are used. The proposed SSNLDA significantly outperforms the other two methods. Fig. 3 shows the classification maps of the Indian Pines scene using the 1NN classifier for the case of 20 training samples per class. The dimensionality of the reduced space is 30. As shown in Fig. 3, the 1NN classifier with SSNLDA features yields better results.

Table 2 Classification accuracies (in percent) in the Indian Pines scene

Case  FE      OA±std (#features)
I     NWFE    63.03±4.68 (28)
I     CNFE    65.10±4.91 (15)
I     SSNLDA  71.05±3.28 (28)
II    NWFE    72.82±2.28 (29)
II    CNFE    75.67±1.77 (29)
II    SSNLDA  81.79±1.81 (30)
III   NWFE    80.15±1.76 (28)
III   CNFE    81.93±2.17 (28)
III   SSNLDA  88.08±1.21 (30)
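The size of the improvement can be read directly from the accuracies in Table 2. The short calculation below lists the OA values in the order the three groups appear in the table and computes the gain of SSNLDA over each of the other two methods in percentage points.

```python
# OA (%) from Table 2, one value per group in table order.
oa = {
    "NWFE":   [63.03, 72.82, 80.15],
    "CNFE":   [65.10, 75.67, 81.93],
    "SSNLDA": [71.05, 81.79, 88.08],
}

# Gain of SSNLDA over each baseline, in percentage points.
gain_over = {
    name: [round(s - o, 2) for s, o in zip(oa["SSNLDA"], oa[name])]
    for name in ("NWFE", "CNFE")
}
for name, gains in gain_over.items():
    print(name, gains)
```

The gains over CNFE are close to 6 percentage points in every group, while the gains over NWFE are between about 8 and 9 percentage points.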

 
Fig. 2 Classification results on the Indian Pines dataset for the three feature extraction methods.

 
(a) NWFE  (b) CNFE  (c) SSNLDA  (d) Ground-truth

Fig. 3 Classification maps of the Indian Pines scene using the 1NN classifier for the case of 20 training samples per class.

4. Conclusions 

In this paper, a nonparametric feature extraction method based on spectral and spatial information, SSNLDA, is proposed. From the above results, we find that SSNLDA achieves more stable and effective results. In most cases, the 1NN classifier with SSNLDA features obtains better results than with the other spectral-based FE methods, CNFE and NWFE, particularly when the training sample size is quite small.

Acknowledgement 

This study is supported by the Ministry of Science and Technology, R.O.C., under contract number MOST 104-2221-E-142-005.

References 

[1] K. Fukunaga, Introduction to statistical pattern recognition, 2nd ed., New York: Academic Press, 1990.



[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, 2nd ed., New York: John Wiley & Sons, 2001.

[3] S. J. Raudys and A. K. Jain, “Small sample size effects in statistical pattern recognition: recommendations for practitioners,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 3, pp. 252-264, 1991.

[4] D. A. Landgrebe, Signal theory methods in multispectral remote sensing, New Jersey: John Wiley and Sons, 2003.

[5] P. K. Varshney and M. K. Arora, Advanced image processing techniques for remotely sensed hyperspectral data, New York: Springer, 2004.

[6] B. C. Kuo and D. A. Landgrebe, “Nonparametric weighted feature extraction for classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 5, pp. 1096-1105, 2004.

[7] J. M. Yang, P. T. Yu, and B. C. Kuo, “A nonparametric feature extraction and its application to nearest neighbor classification for hyperspectral image data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 3, pp. 1279-1293, 2010.

[8] Y. Zhou, J. Peng, and C. L. Philip Chen, “Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 2, pp. 1082-1095, 2015.

[9] H. Pu, Z. Chen, B. Wang, and G. M. Jiang, “A novel spatial–spectral similarity measure for dimensionality reduction and classification of hyperspectral imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 11, pp. 7008-7022, 2014.

[10] Hyperspectral remote sensing scenes, [Online]. Available: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes. [Accessed 24 May 2016]