*Corresponding Author

P-ISSN: 2087-1244
E-ISSN: 2476-907X

ComTech: Computer, Mathematics and Engineering Applications, 13(2), December 2022, 111−121
DOI: 10.21512/comtech.v13i2.7821

An Improved Weighted Median Algorithm for Spatial 
Outliers Detection

Zerlita Fahdha Pusdiktasari1*, Rahma Fitriani2, and Eni Sumarminingsih3 

1-3Department of Statistics, Faculty of Mathematics and Natural Sciences, University of Brawijaya
Jln. Veteran, Jawa Timur 65145, Indonesia

1zerlitafahdha@gmail.com; 2rahmafitriani@ub.ac.id; 3eni_stat@ub.ac.id

Received: 19th October 2021/ Revised: 4th January 2022/ Accepted: 5th January 2022 

How to Cite: Pusdiktasari, Z. F., Fitriani, R., & Sumarminingsih, E. (2022). An Improved Weighted Median Algorithm for 
Spatial Outliers Detection. ComTech: Computer, Mathematics and Engineering Applications, 13(2), 111−121. 

https://doi.org/10.21512/comtech.v13i2.7821

Abstract - A spatial outlier is an object that 
significantly deviates from its surrounding neighbors. 
The median algorithm is a robust spatial outlier 
detection method. However, it assumes that 
all spatial objects have the same characteristics. 
Meanwhile, the Average Difference Algorithm 
(AvgDiff) has accommodated the differences in spatial 
characteristics, but it does not use statistical tests to 
determine the status of an object, whether it is an outlier 
or not. The research developed an improved version of 
the median algorithm and AvgDiff, called the Weighted 
Median Algorithm (WMA). WMA combined the 
advantages of the two methods. From the median 
algorithm, WMA adopted median and statistical test 
concepts. Meanwhile, from AvgDiff, WMA adopted 
the concept of using differences in objects’ spatial 
characteristics as weights. The two advantages were 
combined by calculating WMA’s neighborhood score 
using a weighted median. Then, a 
simulation was conducted to analyze the accuracy of 
the method. The result confirms that when objects have 
heterogeneous spatial characteristics, WMA performs 
better than the median algorithm. The accuracy of 
WMA is not higher than that of AvgDiff, but the use of 
WMA can prevent a serious false detection problem. 
The methods can be applied to Covid-19 incidence 
rate data in East Java.

Keywords: Weighted Median Algorithm (WMA), 
spatial outliers, average difference algorithm  

I. INTRODUCTION

Outliers are objects in a data set with extreme 
values which deviate from other objects. Such 
deviations raise suspicions that the objects are 
generated by a different mechanism than the rest of the data. 
In a spatial context, outliers are referred to as spatial 
outliers. Spatial outliers are objects whose nonspatial 
attribute values differ significantly from those of their 
surrounding objects (nearest neighbors) (Shekhar, Lu, 
& Zhang, 2001). The spatial outlier detection method 
pays attention to the spatial correlation and spatial 
arrangement or position between spatial objects.  
Spatial outlier detection can lead to the discovery of 
special patterns, meaningful insights, and important 
implicit information. The relevant methods have been 
applied to many cases, such as identifying air pollutant 
networks (Araki, Shimadera, Yamamoto, & Kondo, 
2017; Van Zoest, Stein, & Hoek, 2018; Wu, Tang, & 
Wang, 2018), leaking water from supply pipelines 
(Helwig, Guggenberger, Elmore, & Uetrecht, 2019), 
water pollutant (Shukla & Lalitha, 2021), potential 
mineralization (Nguyen, Vu, Trinh, & Nguyen, 2016), 
hot spots of soil pollution (Fu, Zhao, Zhang, Wu, & 
Tunney, 2016; Tepanosyan, Sahakyan, Zhang, & 
Saghatelyan, 2019; Xiao, Wang, Hou, & Erten, 2020), 
traffic density caused by unexpected or temporary 
incidents like traffic accidents and celebration 
(Djenouri & Zimek, 2018; Pu, Wang, Liu, & Zhang, 
2019; Tang & Ngan, 2016), anomalous system 
behavior of Wireless Sensor Networks (WSN) (Ayadi, 
Ghorbel, Obeid, & Abid, 2017; Bosman, Iacca, Tejada, 
Wörtche, & Liotta, 2017), anomalous teen birth rate 
(Khan et al., 2017),  and others.

In addition, the detection of spatial outliers can 
also be applied to Covid-19, which has 
recently become a problem in all countries of the world (Baba, 
Midi, & Abd Rahman, 2021; Xia, An, Li, & Zhang, 
2022). The outlier, in this case, is an area that behaves 
unusually based on the incidence rate of Covid-19. An 
area is considered normal (behaves naturally) if 
its incidence rate is almost the same as that of the 
surrounding areas. Areas that become outliers indicate 
factors that influence the number of cases, apart from 
the factor of transmission or spread from surrounding 
areas. For the rest of the research, the term ‘outliers’ 
refers to ‘spatial outliers’.


In spatial data, an object which is the center of 
attention is called a central object. The other objects 
surrounding a central object are defined as the nearest 
neighbors. The condition of the neighboring objects is 
summarized into a neighborhood score. Then, spatial 
outlier detection is performed by comparing the 
nonspatial attribute value of the central object with its 
neighborhood score. A comparison score quantifies the 
difference. If a central object differs extremely from 
its neighbors, the comparison score will be extremely 
large. In that case, the central object is considered an 
outlier (Lu, Chen, & Kou, 2003; Shekhar et al., 2001). 
Therefore, the neighborhood score must be carefully 
formed so that it describes the neighborhood’s 
condition accurately.

The problem that often arises in the formation 
of spatial outlier detection algorithms is how to 
define the neighborhood score that well represents 
the condition of the neighbors. Previous studies, such 
as spatial statistics by Shekhar et al. (2001), iterative 
z and iterative r by Lu et al. (2003), and improved z 
score by Aggarwal, Gupta, Singh, Sharma, and Sharma 
(2019), use the mean to calculate the neighborhood score. Others 
utilize the median, such as the median algorithm by 
Chen, Lu, Kou, and Chen (2008). Each method has 
its advantages and disadvantages. The use of the mean 
is the simplest method, but it can mistakenly detect a 
normal central object as a spatial outlier due to a true 
outlier in its neighborhood. This condition is known 
as the swamping effect (Baba et al., 2021; Kolbaşi & 
Ünsal, 2019; Wang & Serfling, 2018). The neighborhood 
score is pulled towards the outlier value, which causes 
the comparison score to be higher than it should be. 
The swamping effect is less likely to happen when 
the median is used. It is due to the robust nature of 
the median (Sajana & Sajesh, 2018). However, both 
mean and median-based methods still assume that all 
spatial objects have the same spatial characteristics 
(homogeneous). In many fields of application, the 
area of the objects, the distance between objects, or the 
length of the shared border between the objects tend 
to be different (Taha, Onsi, El Din, & Hegazy, 2019).  
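The swamping effect described in this paragraph can be illustrated with a small numerical sketch. The attribute values below are hypothetical and chosen only for illustration; they do not come from the research data. The sketch compares a mean-based and a median-based neighborhood score for a normal central object that has one outlying neighbor.

```python
from statistics import mean, median

# Hypothetical attribute values (illustration only): a normal central
# object and four neighbors, one of which (95.0) is a true outlier.
central_value = 10.0
neighbor_values = [9.0, 10.0, 11.0, 95.0]

# The mean-based neighborhood score is pulled towards the outlying neighbor.
mean_score = mean(neighbor_values)
# The median-based neighborhood score stays close to the normal neighbors.
median_score = median(neighbor_values)

mean_difference = abs(central_value - mean_score)
median_difference = abs(central_value - median_score)
print(mean_difference, median_difference)
```

With these values, the mean-based difference is much larger than the median-based one, which is the mechanism by which a normal central object can be swamped.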

The spatial characteristics affect the relationship 
of an object with its neighbors. Figure 1 shows the 
importance of accommodating spatial characteristics 
in calculating neighborhood scores. If A is the central 
object, B, C, and D are the nearest neighbors of 
object A. Object D is an outlier whose nonspatial 
attribute value is extremely different from those of A, B, and 
C. The mean and median methods will interpret these 
conditions as illustrated in Figure 2.

Figure 1 The Illustration of Spatial Configuration with 
Different Spatial Characteristics

Figure 2 The Illustration of Spatial Configuration with 
Homogeneous Spatial Characteristics

In Figure 2, all objects have the same characteristics. 
Based on the concept of mean and median, B or C 
will be chosen as a representation of the neighboring 
conditions of A. It causes A to be detected as a normal 
object because it is not significantly different from its 
neighbors (B and C). Meanwhile, the real condition in Figure 1 
shows that D has a larger area surrounding A and 
shares a longer common borderline with A than B and 
C do. It indicates that D dominates the area around 
A and makes a greater contribution to A. Therefore, D 
must have a stronger relationship with A and should be 
chosen to represent the condition of A’s neighbors. It is 
based on the first law of geography that everything is 
related to everything else, but nearby things are more 
related than distant things (Tobler, 1970). In this way, 
A is detected as an outlier because it is significantly 
different from its dominant neighbor (D). From this illustration, 
it is necessary to give different weights to the objects 
with different spatial characteristics in the calculation 
of the neighborhood score, in line with Colak, 
Memisoglu, Erbas, and Bediroglu (2018), Taha et al. 
(2019), and Ulak, Ozguven, Vanli, and Horner (2019).

Kou, Lu, and Chen (2006) proposed 
outlier detection methods that accommodate different 
spatial characteristics. These methods are the weighted 
Z value approach (weighted Z) and the Average 
Difference Algorithm (AvgDiff). Both methods assign 
different weights for different neighbors in computing 
the Degree of Outlierness (DO) of the central object. 
The closer/stronger the spatial relationship between 
objects is, the greater the weight is given to the 
objects. For example, Fitriani, Pusdiktasari, and 
Diartho (2021) applied AvgDiff to identify spatially 
outlying East Java regencies/municipalities in terms of 
economic growth. In their study, the use of AvgDiff 
was in accordance with the conditions of East Java 
regencies/municipalities with different characteristics. 
However, the methods still have several drawbacks. 
Weighted Z and AvgDiff have the potential for false 
detection when neighbors with extreme values have 
small weights. The non-robustness of the mean 
(average) used in those two methods is the cause. 
Furthermore, AvgDiff does not rely on a statistical 
test to determine the outliers. It is based on the rank 
of the DO, namely a concept of the top m outliers. 
Statistical tests cannot be used because the 
differences between the value of the central object 
and its neighbors are taken in absolute value, so they do not 
follow the normal distribution. In the concept of top m 
outliers, a researcher is asked to determine the number 
of outliers (m). It is a hard-to-answer question because 

the researchers never know how many outliers exist 
in the data set (Su, 2011). Therefore, it is necessary 
to develop the outlier detection method further to 
address the drawbacks of the median algorithm 
and AvgDiff. The improvement is expected to yield a 
more accurate outlier detection method that can 
accommodate the real conditions in applied cases. It 
is important, given that outlier detection can be 
applied in a wide range of cases. It is applied not only in cases 
related to geospatial data on the earth’s surface but 
also to objects that can be defined spatially. For 
example, outlier detection methods can be applied in 
the medical field to detect the position of cells that 
behave unusually, such as tumor/cancer cells, from 
other cells in the vicinity (Goovaerts, 2005; Ijaz, 
Attique, & Son, 2020; Prastawa, Bullitt, Ho, & Gerig, 
2004). If the method results in false detection, it will 
certainly be a serious problem.

Based on the explanation, the research develops 
an improved version of the median algorithm and 
AvgDiff, called Weighted Median Algorithm (WMA). 
WMA combines the advantages of the two methods. 
From Median Algorithm, WMA adopts the median and 
statistical test concepts. From AvgDiff, WMA adopts 
the concept of using differences in objects' spatial 
characteristics as weights. The advantages of the 
two methods are combined by utilizing the weighted 
median in the calculation of the neighborhood score 
and using a robust statistical test to determine 
the status of objects. Combining both advantages 
addresses the drawback of the median algorithm, 
which does not accommodate differences in spatial 
characteristics. It also addresses the drawbacks of 
AvgDiff, which is not robust and determines 
outliers based only on ranks. Two scenarios 
are used in the research. Scenario 1 is used to analyze 
the performance of WMA in detecting outliers in 
data with different spatial characteristics of objects. 
Then, scenario 2 analyzes the performance of WMA 
in detecting outliers without asking the researchers to 
determine the right m. The simulation is conducted 
10,000 times to measure the accuracy of the methods.

II. METHODS

This section outlines the proposed algorithm, 
WMA. An algorithm is a systematic, practical method 
for carrying out a computation, in this case, detecting an outlier. 
The algorithm consists of inputs, effective step-by-step 
methods, and output. WMA is an improved 
version of the median algorithm by Chen et al. (2008) 
and AvgDiff by Kou et al. (2006). The improvement 
is made by adding the objects’ different spatial 
characteristics as weights to the median algorithm. 
It is done by changing the median into the weighted 
median in calculating the neighborhood score. 
It also uses statistical tests in determining outliers as a 
substitute for top m outliers in AvgDiff. The inputs are 
as follows: S is the set of spatial objects 
{s1, s2, ... , sn}; k is the number of nearest neighbors 
of a central object i, which are determined based 
on a spatial configuration and notated by NNk(i); and X is 
the set of attribute values {x1, x2, ... , xn}, in which xi is 
the attribute value of spatial object i.

Meanwhile, the weight wij is the value that measures 
the spatial relationship of neighbor j to the central 
object i based on spatial characteristics information. 
Several characteristics can be used to define this 
relationship, such as the inverse distance, the length 
of the shared border, and the area. The concept gives 
larger weights to objects whose spatial relationship is 
stronger (Taha et al., 2019). The weights must satisfy 
the following condition.

$$\sum_{j \in NN_k(i)} w_{ij} = 1, \qquad w_{ij} \ge 0 \tag{1}$$

Those notations are used to define the following 
steps for WMA. First, the k nearest neighbors of each 
object i are defined using spatial relationships, such 
as Queen Contiguity, Rook Contiguity, and others. 
Then, the spatial weights wij are calculated based on spatial 
information (the length of the shared border, borderj, 
between the central object i and each of its neighbors j) 
using Equation (2).

$$w_{ij} = \frac{border_j}{\sum_{l \in NN_k(i)} border_l} \tag{2}$$
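The weight calculation can be sketched as follows. The sketch assumes that Equation (2) divides each shared-border length by the total shared-border length of the central object, so that the weights sum to one. The function name `border_weights` and the neighbor labels are hypothetical and used only for illustration.

```python
def border_weights(border_lengths):
    """Normalize shared-border lengths so that the weights sum to one.

    Assumption: each weight is the neighbor's shared-border length divided
    by the total shared-border length of the central object.
    """
    total = sum(border_lengths.values())
    if total <= 0:
        raise ValueError("total shared-border length must be positive")
    return {neighbor: length / total
            for neighbor, length in border_lengths.items()}

# Hypothetical central object with three neighbors "B", "C", and "D"
# sharing 1, 2, and 5 units of border length with it.
weights = border_weights({"B": 1.0, "C": 2.0, "D": 5.0})
print(weights)
```

A neighbor with a longer shared border receives a larger weight, which is in line with the concept of giving larger weights to stronger spatial relationships.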

The weights are used to calculate the weighted 
median of the nearest neighbors. A weighted median 
is the 50% weighted percentile. Let x1, x2, ... , xk be the 
ordered attribute values of the k nearest neighbors, with 
corresponding weights w1, w2, ... , wk. The value xi is the 
weighted median if it satisfies the following condition 
(Edgeworth, 1887): the total weight of the neighbors ordered 
before xi is less than or equal to ½, and the total weight of the 
neighbors ordered after xi is also less than or equal to ½.

$$\sum_{j=1}^{i-1} w_j \le \frac{1}{2} \quad \text{and} \quad \sum_{j=i+1}^{k} w_j \le \frac{1}{2} \tag{3}$$

The weighted median is called the neighborhood 
score g(xi). This neighborhood score represents the 
condition of the neighbors of a central object. It is 
selected from the attribute values of the ordered 
neighbors as the value that satisfies the condition in Equation (3).

$$g(x_i) = \operatorname{weighted\ median}\,\{x_j : j \in NN_k(i)\} \tag{4}$$
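The weighted median condition can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the function name `weighted_median` is hypothetical, the neighbors are ordered by attribute value, and when two values both satisfy the condition, the sketch returns the smaller one.

```python
def weighted_median(values, weights):
    """Return the ordered value whose preceding and following total
    weights are both at most half of the total weight."""
    total = sum(weights)
    pairs = sorted(zip(values, weights))  # order neighbors by attribute value
    before = 0.0
    for value, weight in pairs:
        after = total - before - weight
        if before <= total / 2 and after <= total / 2:
            return value  # first (smallest) value meeting the condition
        before += weight
    raise ValueError("no weighted median found; check the weights")

# Hypothetical neighbor values with unequal weights: the heavily weighted
# extreme neighbor becomes the neighborhood score.
print(weighted_median([10.0, 11.0, 95.0], [0.125, 0.25, 0.625]))
# With equal weights, the result is the ordinary median.
print(weighted_median([10.0, 11.0, 95.0], [1, 1, 1]))
```

The second call shows that equal weights reduce the weighted median to the ordinary median, which matches the behavior expected for objects with homogeneous spatial characteristics.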

Then, the comparison score is calculated. The 
score measures the difference between the attribute value of a central 
object, xi, and its neighborhood score, g(xi). The 
comparison score indicates how large the difference between 
a central object and its neighbors is. If the comparison 
score is extremely high, it shows that the central object 
has a very different characteristic from its surroundings, 
and vice versa. The calculation uses Equation (5).

$$h_i = x_i - g(x_i) \tag{5}$$

Finally, the status of objects (normal/outlier) is 
defined using the statistical test in Equation (6). The μ* 
and σ* denote the robust mean and standard deviation, 
respectively. According to Chen et al. (2008) and 
Kolbaşi and Ünsal (2019), the median is used as μ*, 
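and the Median Absolute Deviation (MAD) as σ*, of the set H = 
{h1, h2, ... , hn}. Object i is declared an outlier when its 
standardized comparison score, the DO, exceeds 3.

$$DO_i = \left| \frac{h_i - \mu^*}{\sigma^*} \right| > 3 \tag{6}$$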
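The robust statistical test can be sketched as follows. The sketch assumes that the DO is the absolute comparison score standardized by the median and the unscaled MAD, and that an object is flagged when its DO exceeds 3, which is the cut-off used later in the Results. The function names and the comparison scores are hypothetical.

```python
from statistics import median

def degree_of_outlierness(comparison_scores):
    """Standardize comparison scores with the median and the MAD."""
    center = median(comparison_scores)
    mad = median([abs(h - center) for h in comparison_scores])
    if mad == 0:
        raise ValueError("MAD is zero; the scores cannot be standardized")
    return [abs(h - center) / mad for h in comparison_scores]

def flag_outliers(comparison_scores, threshold=3.0):
    """Return the indices of objects whose DO exceeds the threshold."""
    do_values = degree_of_outlierness(comparison_scores)
    return [i for i, do in enumerate(do_values) if do > threshold]

# Hypothetical comparison scores; only the last one is extreme.
h = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0]
print(flag_outliers(h))
```

Because the median and the MAD are barely affected by the single extreme score, the remaining objects keep small DO values and only the extreme object is flagged.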

In the research, simulations are conducted. 
Generated data with predetermined outliers are used 
so that the outliers’ existence can be traced. This situation is 
rarely met when real data are used. Generated data make it 
easier to evaluate the 
performance of the outlier detection methods (Ernst & 
Haesbroeck, 2017).

Figure 3 shows the spatial configuration of the 
objects. Numbers in Figure 3 show the ordered code 
for the objects. The values (attributes) for these spatial 
objects are generated based on the spatial lag model 
with a high positive autocorrelation value. It ensures 
that all spatial objects are normal objects. A normal 
object in spatial data has an attribute value that tends to 
be similar to, or does not differ extremely from, the attribute 
values of its neighboring objects. The spatial lag 
model is used because it is assumed that there is a 
relationship between the attribute value of an object 
and the attribute values of its surrounding objects. High 
positive spatial autocorrelation is used because it is in 
line with the definition of a normal object. A parameter (ρ) 
measures the degree of spatial autocorrelation. 
A high value of ρ (close to 1) indicates strong 
positive spatial autocorrelation, so clusters of nearby 
spatial objects with similar attribute values will be 
formed.

The model used to generate the attribute 
values of the 36 spatial objects is shown in Equation 
(7). It has ρ = 0.9, indicating strong positive spatial 
autocorrelation. Strong autocorrelation shows that 
all the objects have strong relationships and affect 
one another. That is why they tend to have the same 
characteristics. In this condition, it is easy to ensure 
that there are no outliers besides the defined 
or generated ones.

      (7)

Then, the spatial outlier can be created by 
selecting a particular normal object and replacing its 
attribute value with a more extreme value. The spatial 
outlier attribute value is generated as follows. The value of an 
extremely high spatial outlier is generated using Equation 
(8). Meanwhile, that of an extremely low spatial outlier is generated using 
Equation (9).

       (8)

        (9)

Extremely high spatial outliers are the type 
of spatial objects with very high attribute values 
compared to their nearest neighbors. Meanwhile, the 
spatial outlier of the extremely low type is a spatial 
object with a very low attribute value compared to its 
nearest neighbors.
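The generation of normal objects from a spatial lag model can be sketched as follows. The sketch rests on several assumptions that are not taken from the research: a regular 6 × 6 grid with Rook Contiguity instead of the configuration in Figure 3, a row-standardized weight matrix W, standard normal errors, and the model form y = ρWy + ε without covariates. The outlier values of Equations (8) and (9) are not covered. All function names are hypothetical.

```python
import random

def rook_neighbors(rows, cols):
    """Rook-contiguity neighbor lists for a regular rows x cols grid."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[r * cols + c] = [
                rr * cols + cc for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return neighbors

def draw_errors(n, seed=0):
    """Draw n standard normal errors (assumed error distribution)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def solve_spatial_lag(errors, neighbors, rho=0.9, iterations=500):
    """Solve y = rho * W y + e by fixed-point iteration.

    W is row-standardized, so each object receives rho times the average
    of its neighbors. The iteration converges because rho < 1.
    """
    y = list(errors)
    for _ in range(iterations):
        y = [
            errors[i] + rho * sum(y[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(errors))
        ]
    return y

neighbors = rook_neighbors(6, 6)
errors = draw_errors(36)
y = solve_spatial_lag(errors, neighbors)
print(len(y))
```

The fixed-point iteration is used here only to keep the sketch free of external libraries; solving the linear system directly would give the same values.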

Figure 3 Spatial Configuration of Spatial Objects for Data Generation


The simulation is done with two scenarios. 
The first scenario (scenario 1) is applied to the 
median algorithm and WMA. The scenario is set so 
there are outliers in a data set with different spatial 
characteristics among the objects. Figure 4 illustrates 
a situation where the objects have various areas and 
lengths of shared borders.

In the research, the length of the shared border 
is used as the weight. The unit of length is explained 
through some illustrations in Figure 4. It shows the 
subsets of the spatial configuration depicted in Figure 
3. Figure 4(a) shows objects 1 and 2 sharing 1 unit 
border length. Then, Figure 4(b) shows objects 10 and 
11 sharing 2 units of border length. Meanwhile, Figure 
4(c) shows objects 9 and 16 sharing 2 units of border 
length. The method used to define a neighboring object 
in the research is Rook Contiguity.

In this scenario, objects 18 and 7 in Figure 3 are 
defined as outliers. In outlier detection methods that 
do not accommodate spatial characteristics, object 18 
has the same contribution as objects 17 and 24 to the 
central object (object 20). However, object 18 shares a 
much longer border with object 20 than 
objects 17 and 24 do. It indicates that 
object 18 dominates the neighborhood of object 
20, which makes it a good choice to represent the 
neighbors. These characteristics make object 20 
an outlier because it has an extremely different value 
from its dominant neighbor. The simulation gives a ‘TRUE’ 
result if the method correctly detects objects 18, 7, 
and 20 as outliers. The simulation is conducted 10,000 
times to calculate the accuracy of the methods. Each 
simulation first generates normal object values using 
Equation (7) and then outlier values for objects 18 and 7 
using Equations (8) and (9).

Scenario 2 is applied to AvgDiff and WMA. The 
scenario is formed with the same spatial configuration 
(Figure 3) and weights as scenario 1 but is focused 
on analyzing the performance of WMA in detecting 
outliers without determining the exact m. Each 
simulation first generates normal object values using 
Equation (7), then defines two objects randomly as outliers, 
and generates outlier values for the two selected objects 
using Equations (8) and (9). The simulation gives a 
‘TRUE’ result if the method correctly detects the two 
objects defined before as outliers. The simulation is 
conducted 10,000 times to calculate the accuracy of the 
method. According to its algorithm, the determination 
of outliers in AvgDiff uses top m outliers. Based on 
this concept, with 36 objects, AvgDiff detects 2 
(m = 5% × 36 = 1.8 ≈ 2) objects with the highest DO 
as outliers. Meanwhile, in WMA, the determination of 
outliers is based on statistical tests using Equation (6). 

The accuracy is calculated using Equation 
(10). The higher the accuracy of the method is, the more 
precise the method is in detecting the spatial outliers that 
the researchers have predetermined.

$$\text{Accuracy} = \frac{\text{number of simulations with a TRUE result}}{\text{total number of simulations}} \times 100\% \tag{10}$$

The improvement 
of spatial outlier detection methods is shown in 
Figure 5. The base of the spatial outlier detection 
method improvements in this research is Spatial 
Statistics. The method was then improved into a robust 
outlier detection method, the Median Algorithm, but it 
does not account for different spatial characteristics. Spatial 
Statistics was also improved by accounting for different spatial 
characteristics in the Average Difference Algorithm, but it 
is not robust. Therefore, in the research, the advantages 
of those two methods are combined into the 
Weighted Median Algorithm. The method is robust 
and accounts for different spatial characteristics.

Figure 4 The Illustration of How to Measure the Length of Shared Border, panels (a), (b), and (c)

Figure 5 The Improved Spatial Outlier Detection Methods Flowchart

III. RESULTS AND DISCUSSIONS

The data are generated for 36 normal objects 
before being adjusted to the settings in scenarios 1 and 
2. The data generation is done according to Equation 
(7). Each generation produces 36 attribute values for 
36 objects. Moran’s I test is conducted to ensure that 
the 36 objects are normal objects. Table 1 shows the 
results of Moran’s I test for a one-time generation.

Table 1 The Result of Moran’s I Test in the Generated Data

Moran’s I Statistic    0.44202157
p-value                1.587e-06

Based on the results of Moran’s I test 
in Table 1, the p-value is smaller than α = 5%, 
leading to the rejection of the null hypothesis. This 
rejection indicates that the generated attribute values 
have significant positive spatial autocorrelation. Objects with 
strong spatial autocorrelation have 
attribute values that tend to be the same as the attribute 
values of their nearest neighbors. This statement is 
also the definition of a normal object. Thus, 
all objects with generated attribute 
values can be regarded as normal objects.
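The Moran’s I statistic reported in Table 1 can be sketched with the standard formula. The sketch assumes binary contiguity weights and computes only the statistic, not the p-value; the weights and software used in the research are not stated, so the function name `morans_i` and the small example configuration are hypothetical.

```python
def morans_i(values, neighbors):
    """Moran's I with binary contiguity weights (w_ij = 1 for neighbors)."""
    n = len(values)
    mean_value = sum(values) / n
    z = [v - mean_value for v in values]
    # Sum of cross-products over all ordered neighbor pairs (i, j).
    cross = sum(z[i] * z[j] for i in neighbors for j in neighbors[i])
    # S0 is the sum of all weights, here the number of ordered pairs.
    s0 = sum(len(neighbors[i]) for i in neighbors)
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Hypothetical configuration: four objects arranged in a line.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1.0, 2.0, 3.0, 4.0], line))    # smooth trend, positive I
print(morans_i([1.0, -1.0, 1.0, -1.0], line))  # alternating values, negative I
```

Values that change smoothly across neighbors give a positive statistic, while alternating values give a negative one, which is why a clearly positive Moran’s I supports the claim that the generated objects resemble their neighbors.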

From 36 normal objects, objects 18 and 7 are 
selected as spatial outliers. This selection is based 
on the characteristics of objects 18 and 7, which 
emphasize the importance of weights. Object 18 has 
13 nearest neighbors, and object 20 is one of them. 
Object 20 shares a common border of 5 units 
with object 18. Meanwhile, it only shares 1 unit with 
object 17 and 2 units with object 24. Object 20 has a 
value that is similar to those of objects 17 and 24. However, 
the area around object 20 is dominated by object 18 
with its extreme value. So, object 18 represents the 
neighbors of object 20 well. This condition makes 
object 20 an outlier because object 20 differs from its 
dominant neighbor (object 18). Although the generated outliers 
are objects 18 and 7, object 20 is also an outlier.

Object 7 is selected as an outlier to show that 
the neighbors with the greatest weights do not always 
dominate. Object 3 is one of the neighbors of object 
7. Both share a common border of 2 units. The other 
neighbors of object 3, namely objects 2, 4, 8, and 12, 
each share only 1 unit with it. However, object 
7, with its extreme value, is not considered dominating. 
The reason is that objects 2, 4, 8, and 12, with similar 
values, account for 4 of the 6 border units of object 3. So, those 
objects are a better representation of the neighbors of 
object 3. Because object 3 has a value similar to the 
values of these neighbors, it will not be considered an 
outlier.
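The reasoning for objects 20 and 3 can be checked against the weighted median condition. The following worked sketch assumes that the neighbors listed in the text are the only neighbors of each object and that the weights are the shared-border lengths normalized to sum to one; the notation w with two indices (central object, neighbor) is introduced here only for illustration.

For object 20, the shared-border lengths 1, 2, and 5 give

$$w_{20,17} = \tfrac{1}{8}, \qquad w_{20,24} = \tfrac{2}{8}, \qquad w_{20,18} = \tfrac{5}{8} > \tfrac{1}{2}.$$

Because object 18 alone carries more than half of the total weight, the weight on either side of it in the ordering is at most 3/8, so object 18 is the weighted median wherever its value falls in the ordering.

For object 3, the shared-border lengths 2, 1, 1, 1, and 1 give

$$w_{3,7} = \tfrac{2}{6}, \qquad w_{3,2} = w_{3,4} = w_{3,8} = w_{3,12} = \tfrac{1}{6}.$$

Object 7 has an extreme value, so it sits at one end of the ordering, and the remaining weight on its other side is 4/6, which exceeds 1/2. Object 7 therefore cannot be the weighted median, and the neighborhood score of object 3 is taken from one of its normal neighbors.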

This scenario ensures that the true outliers 
are objects 18, 7, and 20.  Therefore, a good outlier 
detection method can detect these three objects as 
outliers. The results of running the median algorithm 
and WMA for a one-time simulation are presented in 
Table 2. The simulation is conducted one time so that 
the DO of each object can be examined. Although there are 
36 objects, only the 10 objects with the highest DO are 
presented in Table 2 for simplification.

Table 2 Comparison of the Results of Median Algorithm and WMA Based on Scenario 1

Rank   Median Algorithm            WMA
       Object Index   DO           Object Index   DO
1      18             6.65746525   18             6.90930272
2      7              5.46865086   20             6.84920951
3      12             2.14256891   7              4.97271731
4      29             1.58279921   12             2.03841279
5      9              1.56876925   11             1.68248516
6      24             1.52384865   10             1.66903583
7      2              1.51474089   9              1.65483052
8      10             1.44017602   3              1.54885561
9      3              1.30381040   29             1.48769293
10     17             1.13747811   2              1.42073491


The results of outlier detection in Table 2 show 
that the objects with DO values above 3 in the median 
algorithm are objects 18 and 7. These two objects are 
indeed the two predetermined outliers. However, the median 
algorithm is unable to detect object 20 as an outlier. The object 
is not even included in the top 10 objects with the 
highest potential as outliers. On the contrary, WMA 
gives objects 18, 20, and 7 DO values above 
3, which indicates that those objects are outliers. 
This result indicates that WMA performs better than 
the median algorithm in detecting outliers in data with 
heterogeneous characteristics of objects. Simulations 
are conducted 10,000 times to measure the accuracy 
of the methods. 

In AvgDiff, the determination of object status (normal/outlier) 
is only based on its rank (top m outliers) and 
does not use statistical tests. It is due to the use of 
absolute differences in its algorithm, resulting in 
values that do not follow a normal distribution. This 
concept is computationally advantageous because 
the calculations become faster. However, this method 
requires the researchers’ knowledge regarding the 
number of outliers (m) in the data. This information is 
rarely known.

WMA is developed to overcome this weakness 
of AvgDiff. The improvement is made while keeping 
the advantages of AvgDiff in accommodating the 
spatial characteristics in the calculation of DO. Table 3 
presents the results of running the two methods based 
on scenario 2 for a one-time simulation. In this one-time 
simulation, objects 18 and 32 are the randomly 
selected outliers. However, in line with the explanation 
in scenario 1, object 20 is also an outlier because it is 
the neighbor of object 18.

Based on Table 3, AvgDiff successfully detects 
the 3 predetermined outliers as the objects with the 
highest DO. However, based on the concept of top 
m outliers in AvgDiff, m is calculated by the formula 
m = 5% × n. With n = 36, it is m = 5% × 36 = 1.8 ≈ 2. If 
the researchers do not have a priori information about 
the number of outliers in the data, AvgDiff will only 
identify 2 objects with the highest DO as outliers, 
while, in fact, there are 3 outliers. The method fails to 
detect object 20 as an outlier.
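The difference between the two decision rules can be sketched with the DO values reported in Table 3. The sketch assumes that the top m rule rounds 5% of n to the nearest integer and that the statistical test flags objects whose DO exceeds 3; the function names are hypothetical, and only the ten highest-ranked objects of each method are included.

```python
def top_m_outliers(do_by_object, n_objects=36, share=0.05):
    """Flag the m objects with the highest DO, with m = round(share * n)."""
    m = round(share * n_objects)  # 5% of 36 is 1.8, rounded to 2
    ranked = sorted(do_by_object, key=do_by_object.get, reverse=True)
    return ranked[:m]

def threshold_outliers(do_by_object, threshold=3.0):
    """Flag every object whose DO exceeds the threshold."""
    return sorted(obj for obj, do in do_by_object.items() if do > threshold)

# DO values of the ten highest-ranked objects, copied from Table 3.
avgdiff_do = {18: 8.7768072, 32: 6.8198899, 20: 5.8187378, 14: 4.0480307,
              31: 3.5675264, 15: 3.0593516, 25: 3.0189807, 36: 2.9466585,
              17: 2.8080744, 12: 2.7121336}
wma_do = {20: 9.65291968, 18: 9.59576428, 32: 5.43849870, 16: 2.67597554,
          14: 2.43632067, 15: 2.40146974, 33: 2.36647726, 6: 1.77197493,
          17: 1.49408098, 31: 1.26290926}

print(top_m_outliers(avgdiff_do))   # objects flagged by the top m rule
print(threshold_outliers(wma_do))   # objects flagged by the DO > 3 rule
```

With these values, the top m rule returns only objects 18 and 32, while the threshold rule applied to the WMA values returns objects 18, 20, and 32 without requiring m in advance.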

WMA can identify both the number of outliers and 
which objects are the outliers. WMA detects objects 
18, 32, and 20 as the 3 objects with the highest DO 
without prior determination of the number of 
outliers in the data. From the results in Table 3, the 
DO values of objects 18, 32, and 20, which exceed 
3, indicate that WMA successfully detects those 
objects as outliers. It confirms that 
WMA performs better than AvgDiff in this case. Both rank the 
same three objects highest, but WMA determines the 
outliers automatically. Meanwhile, AvgDiff requires a 
predetermined correct number of outliers (m). Based 
on Table 4, which shows the simulation results for scenario 1, 
WMA outperforms the median algorithm with an 
accuracy of 80.45%. Meanwhile, the accuracy of the 
median algorithm is 0.81%.

Table 3 Comparison of AvgDiff and WMA Based on Scenario 2

Rank   AvgDiff                     WMA
       Object Index   DO           Object Index   DO
1      18             8.7768072    20             9.65291968
2      32             6.8198899    18             9.59576428
3      20             5.8187378    32             5.43849870
4      14             4.0480307    16             2.67597554
5      31             3.5675264    14             2.43632067
6      15             3.0593516    15             2.40146974
7      25             3.0189807    33             2.36647726
8      36             2.9466585    6              1.77197493
9      17             2.8080744    17             1.49408098
10     12             2.7121336    31             1.26290926

Table 4 The Accuracy of the Methods Based on Two Simulation Scenarios

Scenario   Method             Accuracy
1          Median Algorithm   0.81%
1          WMA                80.45%
2          AvgDiff            87.25%
2          WMA                84.36%


Simulations are conducted 10,000 times to 
measure the accuracy of both methods in scenario 2. Based on the 
results in Table 4, the accuracy of AvgDiff is 87.25%, while 
that of WMA is 84.36%. Although the accuracy of WMA 
is not higher than that of AvgDiff, WMA can avoid a more 
serious problem when researchers cannot determine 
the exact m.

WMA is the improved version of the 
median algorithm, weighted based on the 
spatial characteristics of the objects. When WMA is 
applied to objects with homogeneous characteristics, 
all objects receive the same weight, which produces a DO that 
is exactly the same as the DO of the median algorithm. 
The median algorithm and WMA are applied to data 
with objects of homogeneous spatial characteristics to 
show this condition. The detection results can be 
seen in Table 5.

The robustness of the method is also confirmed 
by Chen et al. (2008), who developed the median algorithm 
to overcome the masking and swamping effects that are 
a drawback of spatial statistics. Their results 
confirm that the median algorithm is a robust method 
which can overcome the effects of masking and 
swamping. The result is in line with the results of 
research by Su (2011), and the validity of the method 
is also confirmed by Wang, Wang, Hong, and Wan 
(2004). These three previous studies show the good 
performance of the median algorithm for objects with homogeneous 
spatial characteristics. Thus, when the objects 
have homogeneous characteristics, WMA is as good 
and robust as the median algorithm. However, when 
the objects have heterogeneous spatial characteristics, 
WMA performs better than the median algorithm.

On March 11th, 2020, the World Health Organization (WHO) 
declared Covid-19 a global pandemic. In Indonesia, as of December 
20th, 2021, the virus had caused more than 4,6 million confirmed 
infections and around 144.000 confirmed deaths (Mathieu et al., 
2020). East Java is one of the provinces with the highest number of 
Covid-19 cases. Although cases are still found in several areas, the 
case rate is now low, and the number of recoveries has risen to more 
than 4,1 million (Satuan Tugas Penanganan COVID-19, 2021). 
However, the government should not be caught off guard, because a 
new wave of Covid-19 can occur at any time. In addition, new 
variants produced by mutation continue to appear, from the Alpha, 
Beta, Delta, and Gamma variants (Duong, 2021) to the newest one, 
Omicron (WHO, 2021). Therefore, studies related to this pandemic 
need to continue. They can be used as a consideration in determining 
prevention and control efforts against the spread of Covid-19 in the 
future. If the detected outliers are extremely high (areas with high 
values surrounded by areas that tend to have low values), the 
government is advised to devise a specific strategy to reduce the 
transmission rate or the number of Covid-19 cases in those areas. If 
the detected outliers are extremely low (areas with low values 
surrounded by areas with high values), they can serve as pilot areas 
whose approach to suppressing the number of Covid-19 cases can be 
applied in other areas.

The model is applied to identify unusual behavior in the number of 
confirmed Covid-19 cases. Areas whose behavior differs 
significantly from that of their neighbors are considered outliers. 
The data are incidence (transmission) rates, that is, the ratio of the 
cumulative number of confirmed positive Covid-19 cases to the 
population of each city/regency in East Java from March 2020 until 
December 2021. The data are taken from covid19.go.id. The 
neighborhood is defined by Queen Contiguity, considering the 
irregular shape of the objects (areas). The spatial information used 
as the weight is Euclidean distance, based on the CDC (2020) 
statement that Covid-19 can spread through air contaminated with 
droplets and small airborne particles containing the virus. Table 6 
shows the ten objects (areas) with the highest risk of being spatial 
outliers.
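
The two spatial ingredients named above, Queen Contiguity and Euclidean distance, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: areas are treated as Queen neighbors when their polygons share at least one vertex (which holds for topologically clean boundary data), distances are measured between centroids, and the weights are taken as normalized inverse distances. The inverse-distance form, the function names, and the toy map are all hypothetical.

```python
import math
from itertools import combinations


def queen_neighbors(polygons):
    """Map each area to the set of areas sharing at least one vertex with it."""
    neighbors = {name: set() for name in polygons}
    for a, b in combinations(polygons, 2):
        if set(polygons[a]) & set(polygons[b]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def inverse_distance_weights(area, neighbors, centroids):
    """Normalized inverse Euclidean distance weights (an assumed form)."""
    raw = {
        n: 1.0 / math.dist(centroids[area], centroids[n])
        for n in neighbors[area]
    }
    total = sum(raw.values())
    return {n: w / total for n, w in raw.items()}


# Hypothetical toy map, not the East Java data.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],  # shares an edge with A
    "C": [(1, 1), (4, 1), (4, 4), (1, 4)],  # touches A only at corner (1, 1)
    "D": [(5, 5), (6, 5), (6, 6), (5, 6)],  # isolated area
}
centroids = {"A": (0.5, 0.5), "B": (1.5, 0.5), "C": (2.5, 2.5), "D": (5.5, 5.5)}

neighbors = queen_neighbors(polygons)
weights_a = inverse_distance_weights("A", neighbors, centroids)
print(sorted(neighbors["A"]), weights_a)
```

In this toy map, B and C are both Queen neighbors of A, but B receives the larger weight because its centroid is closer. Contiguity alone would treat the two neighbors identically.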

Table 5 The Results of Median Algorithm and WMA in Detecting Outliers in Data 
with Homogeneous Characteristics Objects

Rank Object Index DO of Median Algorithm Object Index DO of WMA
1 13 4,78593827 13 4,78593827
2 23 4,13793846 23 4,13793846
3 27 1,73384047 27 1,73384047
4 25 1,64925227 25 1,64925227
5 17 1,55829115 17 1,55829115
6 9 1,35445116 9 1,35445116
7 20 1,09954773 20 1,09954773
8 24 1,00967065 24 1,00967065
9 29 0,98937845 29 0,98937845
10 26 0,94638824 26 0,94638824


The difference between the detection results of the median 
algorithm and WMA arises because the median algorithm assumes 
that all objects (areas) have the same characteristics, as outlined in 
the concept of contiguity (neighborhood). On the East Java map in 
Figure 6, Mojokerto Regency and Pasuruan Regency (which tend to 
have low values) dominate the neighborhood of Sidoarjo Regency. 
As a result, the median algorithm chooses areas with few Covid-19 
cases to represent the neighbors of Sidoarjo Regency, so Sidoarjo 
Regency, with its high number of Covid-19 cases, is detected as an 
outlier. However, the cities/regencies in East Java have different 
spatial characteristics, and for Covid-19 cases distance matters, not 
only contiguity. The median algorithm therefore cannot 
accommodate this case properly. 

WMA detects only Surabaya City as an outlier. According to WMA, 
Sidoarjo Regency is not an outlier once differences in 
characteristics are considered, especially the distance between 
regions as spatial information. Sidoarjo Regency is very close to 
Surabaya City, which is reflected in a fairly large weight. This 
weight makes Surabaya City dominate the neighborhood of Sidoarjo 
Regency. Hence, Sidoarjo Regency is not detected as an outlier 
because its value is as high as that of its dominant neighbor 
(Surabaya City). In other words, the high Covid-19 incidence rate of 
Sidoarjo Regency is normal because it is influenced by a neighbor 
that also has a high Covid-19 incidence rate. In terms of the 
goodness of the method, WMA is therefore more suitable for 
detecting outliers in the Covid-19 incidence rate.
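
The mechanism described above can be illustrated with a small Python sketch. All numbers are hypothetical and are not the East Java data: an area with a high rate has two distant low-rate neighbors and one very close high-rate neighbor. The weighted median below is an assumed, simplified definition (first sorted value whose cumulative weight exceeds half of the total), and the weights are simply chosen to favor the close neighbor.

```python
import statistics


def weighted_median(values, weights):
    """First sorted value whose cumulative weight exceeds half the total.

    Assumed, simplified definition for illustration; exact ties at half of
    the total weight are not treated specially.
    """
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative > half:
            return value
    raise ValueError("values must be non-empty and weights positive")


# Hypothetical incidence rates of three neighbors and their weights.
# The third neighbor is very close, so it carries most of the weight.
rates = [0.8, 1.0, 6.0]
weights = [0.2, 0.2, 0.6]
target_rate = 5.5  # hypothetical rate of the area being assessed

plain = statistics.median(rates)            # low-rate neighbors dominate
weighted = weighted_median(rates, weights)  # the close neighbor dominates

# Gap between the area and its neighborhood representative.
print(abs(target_rate - plain), abs(target_rate - weighted))
```

With the unweighted median, the area looks very different from its neighborhood. With the weighted median, the close high-rate neighbor becomes the representative and the gap shrinks, which mirrors why Sidoarjo Regency is flagged by the median algorithm but not by WMA.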

Apart from the differences among the three methods described, 
there is one thing in common: all three methods detect Surabaya 
City as an outlier. Several factors support this result. Its 
geographical position as a coastal settlement gives Surabaya high 
potential as a stopover and settlement for immigrants. In addition, 
its dense population and large port give Surabaya a major role in 
receiving and distributing industrial goods. As a trade center, 
Surabaya City is the second largest metropolitan city after Jakarta. 
Its malls and cafes are the largest among the cities in East Java, and 
these places are the main entertainment for its citizens. 
Psychologically, it is not easy for the citizens to refrain from 
gathering and spending time outside their homes, so their mobility 
remains high, which in turn causes a high incidence rate. These 
complex characteristics of Surabaya City are not found in the cities 
around it. Although the incidence rate of Covid-19 in Indonesia has 
currently decreased, this finding can be used as a consideration for 
the government if a similar case occurs in the future. The 
government is advised to focus its prevention and control measures 
on Surabaya, as it is a center of mobility and the entrance and exit 
of East Java.

Table 6 The Results of Outlier Detection on Covid-19 Incidence Rate in 38 Cities/Regencies in East Java

Rank  Object Index  DO of Median Algorithm  Object Index  DO of AvgDiff  Object Index  DO of WMA
1     37            12,505                  37            0,0439         37            8,124
2     15            3,842                   15            0,0231         25            2,355
3     30            2,461                   25            0,0209         15            2,241
4     9             2,279                   30            0,0094         30            2,032
5     10            1,846                   9             0,0069         9             1,704
6     7             1,6227                  14            0,0067         24            1,222
7     8             1,3171                  6             0,0058         10            1,174
8     34            1,2410                  16            0,0054         8             1,167
9     38            1,1638                  38            0,0051         38            1,109
10    2             1,0450                  10            0,0051         6             1,068

Figure 6 The Map of Cities/Regencies in East Java


IV. CONCLUSIONS

In the research, a method for detecting spatial outliers, namely 
WMA, is developed. It is an improved version of the median 
algorithm and AvgDiff. WMA is confirmed to be as good and robust 
as the median algorithm when applied to objects with homogeneous 
spatial characteristics. When the objects have heterogeneous spatial 
characteristics, WMA performs better than the median algorithm. 
Even though the accuracy of WMA is not higher than that of 
AvgDiff, the use of WMA can prevent a serious false detection 
problem when there is no prior information about the true number 
of outliers. Applied to data on Covid-19 cases in the cities/regencies 
of East Java, the approach yields usable detection results. Surabaya 
City and Sidoarjo Regency are a group of outliers with extreme 
values compared to the areas around them.

Currently, WMA handles only a univariate nonspatial attribute with 
a single type of spatial information for the weights. Future research 
can extend it to data with multivariate nonspatial attributes and to 
weights that combine several types of spatial information. WMA 
can also be improved to detect outliers in groups.

REFERENCES

Aggarwal, V., Gupta, V., Singh, P., Sharma, K., & Sharma, 
N. (2019). Detection of spatial outlier by using
improved Z-score test. In 2019 3rd International
Conference on Trends in Electronics and Informatics
(ICOEI) (pp. 788–790). IEEE.
https://doi.org/10.1109/ICOEI.2019.8862582

Araki, S., Shimadera, H., Yamamoto, K., & Kondo, A. 
(2017). Effect of spatial outliers on the regression 
modelling of air pollutant concentrations: A 
case study in Japan. Atmospheric Environment, 
153(March), 83–93.
https://doi.org/10.1016/j.atmosenv.2016.12.057

Ayadi, A., Ghorbel, O., Obeid, A. M., & Abid, M. (2017). 
Outlier detection approaches for wireless sensor 
networks: A survey. Computer Networks, 129, 319–333.
https://doi.org/10.1016/j.comnet.2017.10.007

Baba, A. M., Midi, H., & Abd Rahman, N. H. (2021). A 
spatial outlier detection method for big data based 
on adjacency weighted residuals and its application 
to COVID-19 data. Economic Computation and 
Economic Cybernetics Studies and Research, 55(3), 
87–102. https://doi.org/10.24818/18423264/55.3.21.06

Bosman, H. H., Iacca, G., Tejada, A., Wörtche, H. J., & 
Liotta, A. (2017). Spatial anomaly detection in 
sensor networks using neighborhood information. 
Information Fusion, 33(January), 41–56.
https://doi.org/10.1016/j.inffus.2016.04.007

Chen, D., Lu, C. T., Kou, Y., & Chen, F. (2008). On 
detecting spatial outliers. GeoInformatica, 12, 455–475.
https://doi.org/10.1007/s10707-007-0038-8

Colak, H. E., Memisoglu, T., Erbas, Y. S., & Bediroglu, 
S. (2018). Hot spot analysis based on network
spatial weights to determine spatial statistics of
traffic accidents in Rize, Turkey. Arabian Journal
of Geosciences, 11, 1–11.
https://doi.org/10.1007/s12517-018-3492-8

Djenouri, Y., & Zimek, A. (2018). Outlier detection in 
urban traffic data. In WIMS '18: Proceedings of the 
8th International Conference on Web Intelligence, 
Mining and Semantics (pp. 1–12).
https://doi.org/10.1145/3227609.3227692

Duong, D. (2021). Alpha, Beta, Delta, Gamma: What’s 
important to know about SARS-CoV-2 variants of 
concern? CMAJ: Canadian Medical Association 
Journal, 193(27), E1059–E1060.
https://doi.org/10.1503/cmaj.1095949

Edgeworth, F. Y. (1887). On observations relating to several 
quantities. Hermathena, 6(13), 279–285.

Ernst, M., & Haesbroeck, G. (2017). Comparison of local 
outlier detection techniques in spatial multivariate 
data. Data Mining and Knowledge Discovery, 31, 
371–399. https://doi.org/10.1007/s10618-016-0471-0

Fitriani, R., Pusdiktasari, Z. F., & Diartho, H. C. (2021). 
Growth interdependence in the presence of spatial 
outliers: Implementation of an average difference 
algorithm on East Java regional economic growth, 
2011-2016. Regional Statistics, 11(3), 119–132. 
https://doi.org/10.15196/RS110306

Fu, W., Zhao, K., Zhang, C., Wu, J., & Tunney, H. (2016). 
Outlier identification of soil phosphorus and its 
implication for spatial structure modeling. Precision 
Agriculture, 17, 121–135.
https://doi.org/10.1007/s11119-015-9411-z

Goovaerts, P. (2005). Detection of spatial clusters and 
outliers in cancer rates using geostatistical filters 
and spatial neutral models. In Proceedings of the 
Fifth European Conference on Geostatistics for 
Environmental Applications (pp. 149–160).
https://doi.org/10.1007/3-540-26535-X_13

Helwig, Z. D., Guggenberger, J., Elmore, A. C., & Uetrecht, 
R. (2019). Development of a variogram procedure to
identify spatial outliers using a supplemental digital
elevation model. Journal of Hydrology X, 3(April),
1–11. https://doi.org/10.1016/j.hydroa.2019.100029

Ijaz, M. F., Attique, M., & Son, Y. (2020). Data-driven 
cervical cancer prediction model with outlier 
detection and over-sampling methods. Sensors, 
20(10), 1–22. https://doi.org/10.3390/s20102809

Khan, D., Rossen, L. M., Hamilton, B. E., He, Y., Wei, R., 
& Dienes, E. (2017). Hot spots, cluster detection 
and spatial outlier analysis of teen birth rates in 
the U.S., 2003–2012. Spatial and Spatio-Temporal 
Epidemiology, 21(June), 67–75.
https://doi.org/10.1016/j.sste.2017.03.002

Kolbaşi, A., & Ünsal, A. (2019). A comparison of the 
outlier detecting methods: An application on Turkish 
foreign trade data. Journal of Mathematics and 
Statistical Science, 5, 213–234.

Kou, Y., Lu, C. T., & Chen, D. (2006). Spatial weighted 
outlier detection. In Proceedings of the 2006 
SIAM International Conference on Data Mining 
(pp. 614–618). Society for Industrial and Applied 
Mathematics.

Lu, C. T., Chen, D., & Kou, Y. (2003). Algorithms for 
spatial outlier detection. In Third IEEE International 
Conference on Data Mining (pp. 597–600). IEEE. 
https://doi.org/10.1109/ICDM.2003.1250986

Mathieu, E., Ritchie, H., Rodés-Guirao, L., Appel, C., 
Gavrilov, D., Giattino, C., Hasell, J., Macdonald, 
B., Dattani, S., Beltekian, D., Ortiz-Ospina, E., & 
Roser, M. (2020). Coronavirus (COVID-19) cases. 
Retrieved from https://ourworldindata.org/covid-cases

Nguyen, T. T., Vu, D. T., Trinh, L. H., & Nguyen, T. L. H. 
(2016). Spatial cluster and outlier identification of 
geochemical association of elements: A case study 
in Juirui copper mining area. Bulletin of the Mineral 
Research and Exploration, 153(153), 159–167. 

Prastawa, M., Bullitt, E., Ho, S., & Gerig, G. (2004). A brain 
tumor segmentation framework based on outlier 
detection. Medical Image Analysis, 8(3), 275–283. 
https://doi.org/10.1016/j.media.2004.06.007

Pu, J., Wang, Y., Liu, X., & Zhang, X. (2019). STLP-OD: 
Spatial and temporal label propagation for traffic 
outlier detection. IEEE Access, 7, 63036–63044. 
https://doi.org/10.1109/ACCESS.2019.2916853

Sajana, O. K., & Sajesh, T. A. (2018). Detection of 
multidimensional outlier using multivariate spatial 
median. Journal of Computer and Mathematical 
Sciences, 9(12), 1875–1881.

Satuan Tugas Penanganan COVID-19. (2021). Penanganan 
COVID-19 2021: Kesembuhan melebihi 4,1 juta, 
kasus aktif tersisa 4 ribu dan vaksinasi melampaui 
161 juta orang. Retrieved from
https://covid19.go.id/p/berita/penanganan-covid-19-2021-kesembuhan-melebihi-41-juta-kasus-aktif-tersisa-4-ribu-dan-vaksinasi-melampaui-161-juta-orang

Shekhar, S., Lu, C. T., & Zhang, P. (2001). A unified 
approach to spatial outlier detection. Retrieved 
from https://hdl.handle.net/11299/215495

Shukla, S., & Lalitha, S. (2021). Spatial analysis of water 
quality data using multivariate spatial outlier 
detection algorithms. GANITA, 70(2), 87–96.

Su, P. C. (2011). Statistical geocomputing: Spatial outlier 
detection in precision agriculture (Master's thesis). 
University of Waterloo.

Taha, A., Onsi, H. M., El Din, M. N., & Hegazy, O. M. 
(2019). A model for spatial outlier detection based on 
weighted neighborhood relationship. arXiv Preprint, 
1–12. https://doi.org/10.48550/arXiv.1911.01867

Tang, J., & Ngan, H. Y. T. (2016). Traffic outlier detection 
by density-based bounded local outlier factors. 
Information Technology in Industry, 4(1), 6–18.

Tepanosyan, G., Sahakyan, L., Zhang, C., & Saghatelyan, 
A. (2019). The application of Local Moran's I to 
identify spatial clusters and hot spots of Pb, Mo and 
Ti in urban soils of Yerevan. Applied Geochemistry, 
104(May), 116–123.
https://doi.org/10.1016/j.apgeochem.2019.03.022

Tobler, W. R. (1970). A computer movie simulating urban 
growth in the Detroit region. Economic Geography, 
46, 234–240. https://doi.org/10.2307/143141

Ulak, M. B., Ozguven, E. E., Vanli, O. A., & Horner, M. 
W. (2019). Exploring alternative spatial weights to 
detect crash hotspots. Computers, Environment and 
Urban Systems, 78(November), 1–9.
https://doi.org/10.1016/j.compenvurbsys.2019.101398

Van Zoest, V. M., Stein, A., & Hoek, G. (2018). Outlier 
detection in urban air quality sensor networks. 
Water, Air, & Soil Pollution, 229, 1–13.
https://doi.org/10.1007/s11270-018-3756-7

Wang, S., & Serfling, R. (2018). On masking and 
swamping robustness of leading nonparametric 
outlier identifiers for multivariate data. Journal of 
Multivariate Analysis, 166(July), 32–49.
https://doi.org/10.1016/j.jmva.2018.02.003

Wang, Z. Q., Wang, S. K., Hong, T., & Wan, X. H. (2004). 
A spatial outlier detection algorithm based multi-
attributive correlation. In Proceedings of 2004 
International Conference on Machine Learning 
and Cybernetics (IEEE Cat. No.04EX826) (pp. 1727–1732). 
IEEE. https://doi.org/10.1109/ICMLC.2004.1382054

WHO. (2021). Informasi terbaru tentang Omicron. 
Retrieved from
https://www.who.int/indonesia/news/detail/30-11-2021-informasi-terbaru-tentang-omicron

Wu, H., Tang, X., & Wang, Z. (2018). Probabilistic automatic 
outlier detection for surface air quality measurements 
from the China national environmental monitoring 
network. Advances in Atmospheric Sciences, 35, 
1522–1532. https://doi.org/10.1007/s00376-018-8067-9

Xia, H., An, W., Li, J., & Zhang, Z. (2022). Outlier 
knowledge management for extreme public 
health events: Understanding public opinions 
about COVID-19 based on microblog data. Socio-
Economic Planning Sciences, 80(March), 1–12. 
https://doi.org/10.1016/j.seps.2020.100941

Xiao, F., Wang, K., Hou, W., & Erten, O. (2020). Identifying 
geochemical anomaly through spatially anisotropic 
singularity mapping: A case study from silver-gold 
deposit in Pangxidong district, SE China. Journal 
of Geochemical Exploration, 210(March), 1–20. 
https://doi.org/10.1016/j.gexplo.2019.106453
