Microsoft Word - 15-3374_s1_ETASR_V10_N2_pp5423-5427


Engineering, Technology & Applied Science Research Vol. 10, No. 2, 2020, 5423-5427 5423 
 

www.etasr.com Qamar et al.: EDAMS: Efficient Data Anonymization Model Selector for Privacy-Preserving Data … 

 
EDAMS: Efficient Data Anonymization Model 

Selector for Privacy-Preserving Data Publishing 
 

Tehreem Qamar 

Department of Computer Science and 

Software Engineering 
Jinnah University for Women 

Karachi, Pakistan 
tehreem.qamar@juw.edu.pk 

Narmeen Zakaria Bawany 

Department of Computer Science and 

Software Engineering 
Jinnah University for Women 

Karachi, Pakistan 
nsb@juw.edu.pk 

Najeed Ahmed Khan 

Department of Computer Science & 

Information Technology 
NED University of Engineering & Technology 

Karachi, Pakistan 
najeed@neduet.edu.pk 

 
Abstract—The evolution of internet to the Internet of Things 

(IoT) gives an exponential rise to the data collection process. This 

drastic increase in the collection of person’s private information 
represents a serious threat to his/her privacy. Privacy Preserving 

Data Publishing (PPDP) is an area that provides a way of sharing 

data in their anonymized version, i.e. keeping the identity of a 

person undisclosed. Various anonymization models are available 

in the area of PPDP that guard privacy against numerous 

attacks. However, selecting the optimum model which balances 

utility and privacy is a challenging process. This study proposes 
an Efficient Data Anonymization Model Selector (EDAMS) for 

PPDP which generates an optimized anonymized dataset in terms 

of privacy and utility. EDAMS inputs the dataset with required 

parameters and produces its anonymized version by 

incorporating PPDP techniques while balancing utility and 

privacy. EDAMS is currently incorporating three PPDP 

techniques, namely k-anonymity, l-diversity, and t-closeness. It is 
tested against different variations of three datasets. The results 

are validated by testing each variation explicitly with the stated 

techniques. The results show the effectiveness of EDAMS by 
selecting the optimum model with minimal effort. 

Keywords-data anonymization; privacy-preserving data 

publishing; k-anonymity; l-diversity; t-closeness 

I. INTRODUCTION  

The advent of IoT, high processing speed hardware, and 
cloud storage with high bandwidth communication produces 
vast amounts of data which would be unthinkable a couple of 
decades ago. Due to these advancements, around 2.5 quintillion 
bytes of data are created each day [1]. Such huge production of 
information not only advances users’ quality of life, but also 
enhances various vital administrations. The data collection 
process is not governed by a single entity [2]. The applications 
used in order to perform daily routine activities efficiently are 
constantly saving, collecting, and tracking user data. Moreover, 
companies are encouraged to release their micro-data in order 
to facilitate data analysis that eventually supports providing 
new business opportunities [3, 4]. However, the release of 
micro-data results in tracking the public and private lives of 
concerned individuals, thus putting their privacy at risk [3, 5, 
6]. A typical data collecting and publishing scenario is depicted 
in Figure 1. In the data collection phase, data holders gather 

data from individuals, i.e. record owners (e.g. Ahmed, Haris, 
Laraib, Sana). In the publishing phase the data are provided to 
data recipients who can be data miners or other third parties 
that can make use of that data for their own purposes.  

 
Fig. 1.  Data collection and data publishing 

The published records may contain sensitive information 
[7-11]. To secure data owners’ privacy and to avoid data 
exploitation, eradicating identifiable attributes like name, 
address, telephone number, and social security numbers is a 
common practice prior to data release. However, this simplistic 
technique is not sufficient to guarantee the protection of record 
owners. Data publishing in a way that they contain no sensitive 
information and the privacy of record owners remains intact is 
termed as PPDP [7]. Typically, PPDP deals with publishing of 
data in an anonymized way, i.e. the data contain sensitive 
information but that information cannot be linked with its 
owner, while being still useful for the interested parties. 
Various methods have been proposed [12-15] for transforming 
data into their anonymized version. These methods differ in 
their capabilities of preventing linking of data owners which 
can eventually harm their privacy. There is no standard method 
for selecting a particular anonymization technique. Technique 
selection is highly dependent on the type of dataset and its 
sensitive attributes. The publisher has to anonymize data by 
using multiple techniques in order to select the most suitable. 
This is not only expensive in terms of time and resources but 
also requires sufficient knowledge in order to choose the 
appropriate method to convert the actual data into their 

Corresponding author: Tehreem Qamar


Engineering, Technology & Applied Science Research Vol. 10, No. 2, 2020, 5423-5427 5424 
 

www.etasr.com Qamar et al.: EDAMS: Efficient Data Anonymization Model Selector for Privacy-Preserving Data … 

 
anonymized version. Selection of an inappropriate method may 
cause data loss therefore it is necessary to select a method 
which could provide results at the optimum level of its utility 
with the least possible loss of data. Keeping in view the 
aforementioned problems, this study aims to propose a model 
that can identify the most suitable technique for anonymizing a 
certain dataset with minimum information loss. The main 
contributions of the current study are: 

• The development of a model that helps the data holder who 
has no particular knowledge of data anonymization 
techniques to release data anonymously. 

• The selection of the most appropriate method according to 
the nature of the respective dataset. 

• The generation of an anonymized dataset with least 
information loss and maximized utility. 

II. LITERATURE REVIEW 

Various real world attacks indicate the significance of 
preserving individual privacy when distributing personal 
information. Many times data released by companies for 
research purposes ended up with hurting individual privacy. 
The re-identification of individuals happens when they get 
linked with some other available external information is termed 
as linking attack [12]. Some reported incidents regarding 
released data that got linked with external information are 
summarized in Table I.  

TABLE I.  UNCIDENTS REPORTED AGAINST LINKING ATTACKS 

Privacy 

breach 
Dataset used Results 

[16] 

Health dataset 

from Washington 

State 

43% identification by linking the dataset 

with newspaper stories containing the 

word “hospitalized”. 

[17] 

Prescription data 

of South Korean 

residents 

100% individuals in the dataset were re-

identified. Data were encrypted prior 

release. 

[12] 

Medical records 

of state employees 

of Massachusetts 

Governor of Massachusetts was identified 

when the dataset was linked with the 

publicly available voter enrollment list. 

[18] 
Three month 

credit card records 

90% identification by analyzing buying 

patterns 

[9] AOL dataset 

One of the users was identified and 

interviewed by New York Times within 

three days of data release 

[19] Netflix dataset 
99% of records were identified with 8 

movie ratings 

 
Authors in [16] collected a health dataset from Washington 
State, which did not contain patient names. However, 43% of 
the individuals were successfully identified by linking the 
dataset with the newspaper stories containing the word 
“hospitalized”. Authors in [17] conducted experiments on the 
encrypted prescription data of 23,163 South Korean Resident 
Registration Numbers (RRNs). They claimed that they were 
able to re-identify 100% of the data and concluded that 
encrypted data are also vulnerable. Author in [12] described the 
re-identification of the dataset released by Group Insurance 
Commission (GIC) that included medical records of the state 
employees of Massachusetts and was intended to facilitate 
medical research. The dataset contained demographic data, for 

example, birth date, gender, and zip code. It was explained how 
easily William Weld (the then governor of Massachusetts) was 
identified by linking the Massachusetts voter enrollment list 
with the information given by GIC. Authors in [18] studied a 
credit card report of 3 months consisting of 1.1 million 
individuals and uniquely identified 90% of them via analyzing 
only four spatiotemporal points. They reported that the buying 
patterns with a use of a credit card make an individual’s 
privacy vulnerable. A similar incident has been reported in 
2006 when AOL released 20 million search queries of its users 
and within three days of its release one of its users was 
identified and interviewed by New York Times [9]. A few 
months later, Netflix also faced re-identification of its users in 
the dataset it released for the development of an accurate movie 
recommendation algorithm. The data were attacked by authors 
in [19], and they showed that external information can be 
linked to identify or to link the data with the respective 
individual. 

PPDP is a way of releasing anonymized data while 
preserving individual privacy [6]. In PPDP, the data are 
generally represented as a Table of Explicit Identifiers, Quasi 
Identifiers, Sensitive Attributes, and Non-Sensitive Attributes, 
where Explicit Identifiers is a set of attributes that explicitly 
identifies the individual, and Quasi Identifiers are those that 
could potentially identify the individual. Sensitive person-
specific information such as salary, real time location and 
disability status are considered as Sensitive Attributes while the 
term Non-Sensitive Attributes contains all attributes that do not 
fall into the previous three categories. Numerous techniques 
and models have been proposed in PPDP for producing 
anonymized data such as k-anonymity [12], l-diversity [13], 
and t-closeness [20], which have become the foundation of 
many other models [22-26] and are therefore used in EDAMS. 

III. THE EDAMS MODEL 

A. Preliminaries 

Let T be an original data table of the following form: 

T = {DI1, DI2,……, DIj, QI1, QI2,……, QIk, SA1, 
SA2,… SAn}  

where, DIs are Direct Identifiers, the attributes which should be 
removed prior data publishing, QIs are Quasi Identifiers, the 
non-sensitive attributes which when linked with external data 
can reveal the identity of a record owner, and SAs are Sensitive 
Attributes, the private information related to a record owner.  

B. Methodology 

The proposed data anonymization model initially makes use 
of k-anonymity, l-diversity and t-closeness as privacy models, 
and generalization and suppression as PPDP operations. The 
utility that guarantees the optimum information loss is 
Information Loss (ILoss) metric [27], which measures the loss 
of information by calculating the uncertainty that occurred in 
generalizing a value which relies upon how many other values 
cannot be distinguished from it. The overall anonymization 
process is depicted in Figure 2. 


Engineering, Technology & Applied Science Research Vol. 10, No. 2, 2020, 5423-5427 5425 
 

www.etasr.com Qamar et al.: EDAMS: Efficient Data Anonymization Model Selector for Privacy-Preserving Data … 

 
Fig. 2.  EDAMS’s data anonymization process 

The process comprises of 5 steps. In the first step, the 
original data are taken as input that clearly marks the DIs, QIs 
and SAs. After realizing the attribute’s nature, the sensitivity of 
the overall dataset is calculated. As the sensitivity is computed, 
the generalization hierarchy of the QIs is generated. And, on 
the basis of sensitivity of the dataset, the optimum privacy 
model is selected for its anonymized version. The sensitivity of 
the dataset is calculated by: 

Sensitivity	percentage �
	������	��	��	

������	��	��
	x	100    (1) 

If the sensitivity is 0 that means no sensitive attribute is 
present in the dataset and k-anonymity privacy model will be 
used. Applying k-anonymity requires the value of k to be used 
optimally because it is responsible for the utility ratio of the 
dataset [28]. EDAMS makes use of two PPDP operations, i.e. 
generalization and suppression. The generalization lattice is 
created for each QI. DI and the attributes that cannot be 
generalized will get suppressed in the resulting anonymized 
table. When the above two steps are completed then the 
optimum model is chosen on the basis of sensitivity. The 
information loss is calculated via ILoss metric [7] and the data 
holder will get the anonymized version of the data with least 
cost. Figure 3 depicts the applied algorithm. 

 
Fig. 3.  Data anonymization algorithm for EDAMS 

IV. EXPERIMENTS 

EDAMS is developed using Java that run on a 2.4GHz Intel 
Core i5 Processor with 6GB RAM. Three datasets with their 
customized versions were examined for the assessment of the 
model, namely UCI Adult dataset [29], Employee’s Salary 
dataset [30], and Crime Incident dataset [31] along with their 
different variations. Each dataset has been evaluated twice. 
Firstly with EDAMS and secondly with each method 
separately applied to it in order to get the optimal result in a 
process termed as Hit and Trial. Its results are shown in Table 
II. 

A. Case 1: Adult Dataset 

The dataset in [29] contains 30,162 records. It consists of 9 
attributes in total: sex, age, race, marital-status, education, 
native-country, work class, occupation, and salary-class. Three 
variants of this dataset were considered of having no DI. The 
first variation took all attributes as QIs. The second variation 
considered occupation as an SA and the rest of them as QIs, 
and the third variation included six of them as QIs and two of 
them, i.e. marital-status and occupation, as sensitive. 

1) Selection via EDAMS 

When dataset is taken as input to EDAMS, its sensitivity is 
calculated, i.e. the ratio of SA over Quasi Attributes. 
Considering the first variation, when there is no SA the 
sensitivity between SA and QI becomes 0% which means 
although the dataset has no direct sensitive information, it can 
serve as a tool for linking attacks. In this case, EDAMS 
suggested k-anonymity for the respective dataset with 
maximum information loss of 60%. Table II represents the 
chosen models with maximum information loss when the same 
procedure was applied to all of its variants.  

TABLE II.  RESULT OF ADULT DATASET VIA EDAMS 

DI QI SA 
Privacy 

model 

Sensitivity 

ratio 

Max. 

information loss 

0 9 0 k-anonymity 0% 60.37% 

0 8 1 l-diversity 11% 62.50% 

0 7 2 l-diversity 22% 57.14% 

 
2) Selection via Hit and Trial Model 

For the verification of the results obtained by EDAMS each 
variant of the dataset is tested with each privacy model in order 
to find the best suitable model for the respective dataset. The 
threshold values are selected from the lowest possible values to 
the values where change in threshold values does not affect the 
result. Considering the same variant, the data holder has to try 
each and every possible combination of different methods 
which demands a substantial amount of time. Table III depicts 
the results obtained from different combination of methods 
employed through hit and try model. 

TABLE III.  HIT AND TRIAL MODEL RESULTS 

Privacy model Threshold value Max. information loss 

k-anonymity 

3 17.08% 

5 57.31% 

7 57.31% 

l-diversity 

3 54% 

5 61% 

7 61% 

t-closeness 

0.02 100% 

0.2 91% 

0.8 54% 

k-anonymity and 

l-diversity 

k=3, l=2 54% 

k=5, l=3 61% 

k=5, l=7 61% 

k-anonymity with 

t-closeness 

k=3, t=0.002 100% 

k=5, t-0.2 91% 

k=5, t=0.8 61% 

k-anonymity with 

l-diversity and t-

closeness 

k=3, l=2, t=0.002 100% 

k=5, l=3, t=0.2 91% 

k=7, l=5 t=0.9 61% 


Engineering, Technology & Applied Science Research Vol. 10, No. 2, 2020, 5423-5427 5426 
 

www.etasr.com Qamar et al.: EDAMS: Efficient Data Anonymization Model Selector for Privacy-Preserving Data … 

 
It can be seen that minimum information loss occurs when 
k-anonymity model is applied. But the identification of this 
least information loss method became possible after trying each 
model and their combinations with different threshold values. 
However, the same model is recommended by the EDAMS 
without requiring any extra effort.  

B. Case 2: Employee’s Salary Dataset 

This dataset [30] contains 1,999 records and comprises on 
five attributes (name, gender, telephone number, zip code, 
salary). Two variations were created, in which two attributes 
(name and telephone number) were considered as DIs. The first 
variant considers the rest of the three attributes as QIs while the 
second variation considers salary as SA and rest of the two, i.e. 
gender and zip code as QI. 

1) Selection via EDAMS 

The process of selection of privacy model through EDAMS 
will remain the same for every dataset. Considering its second 
variation, there are two DIs and one SA. The DIs were 
removed out rightly from its anonymized version while l-
diversity was selected as the privacy model. Table IV shows its 
results. 

TABLE IV.  SELECTION THROUGH EDAMS 

DI QI SA 
Privacy 

model 

Sensitivity 

ratio 

Max. information 

loss 

2 3 0 k-anonymity 0% 8.12% 

2 2 1 l-diversity 20% 0.00% 
 

2) Selection via Hit and Trial Model 

Analyzing the same dataset yields the results shown in 
Table V. The two models are providing the same results, 
however one of them has already been suggested by EDAMS 
(l-diversity). 

TABLE V.  HIT AND TRIAL MODEL RESULTS 

Privacy model Threshold value 
Max. information 

loss 

k-anonymity 

3 8.11705% 

5 8.11705% 

7 8.11705% 

l-diversity 

3 0.00007% 

5 0.00007% 

7 0.00007% 

t-closeness 

0.02 100% 

0.2 100% 

0.8 0.0023% 

k-anonymity and l-diversity 

k=3, l=2 0.00007% 

k=5, l=3 0.00007% 

k=5, l=7 0.00007% 

k-anonymity with t-

closeness 

k=3, t=0.002 100% 

k=5, t-0.2 100% 

k=5, t=0.8 0.0023% 

k-anonymity with l-diversity 

and t-closeness 

k=3, l=2, t=0.002 100% 

k=5, l=3, t=0.2 100% 

k=7, l=5 t=0.02 100% 
 

C. Case 3: Crime Incident Dataset 

This dataset [31] contains a total of eight attributes, namely 
last name, first name, block, gender, race, date of birth, case 

number, and crime_code with 1,058 records. Four versions of 
this dataset were formed. Last name, first name, and date of 
birth served as DIs in the first two versions and the remaining 
five attributes were taken as QIs in the first variant while 
crime_code was taken as sensitive attribute in the second 
version and the rest as QIs. The third and fourth variant took 
only the first name and date of birth as DIs and the rest of the 
structure remained the same. 

1) Selection via EDAMS 

Analyzing its second variant there is one SA along with 
four QI and three DI. The sensitivity is calculated to 13% and l-
diversity is suggested by EDAMS. Table VII summarizes the 
results of all variants. 

TABLE VI.  SELECTION MODEL IN CRIME DATASET FROM EDAMS 

DI QI SA 
Privacy 

model 

Sensitivity 

ratio 

Max. 

information loss 

3 5 0 k-anonymity 0% 60.54% 

3 4 1 l-diversity 13% 55.68% 

2 6 0 k-anonymity 0% 15.26% 

2 5 1 l-diversity 13% 64.54% 

 
2) Selection via Hit and Trial Model 

Table VII shows the results in finding the appropriate 
method for the second variant of this dataset. It is evident from 
this example that EDAMS chose the most appropriate model 
required for the respective dataset. 

TABLE VII.  SELECTION FROM HIT AND TRIAL MODEL 

Privacy model Threshold value Max. information loss 

l-diversity 

3 55% 

5 75% 

7 75% 

t-closeness 

0.002 100% 

0.2 75% 

0.9 50% 

k-anonymity and l-

diversity 

k=3, l=2 100% 

k=5, l=3 100% 

k=5, l=7 100% 

k-anonymity and t-

closeness 

k=3, t=0.002 100% 

k=5, t=0.2 100% 

k=5, t=0.8 100% 

k-anonymity with 

l-diversity and t-

closeness 

k=3, l=2, t=0.002 100% 

k=5, l=3, t=0.2 100% 

k=7, l=5, t=0.9 100% 

 
V. DISCUSSION 

The cost of producing anonymized data via hit and trial 
model is high as there is no standard method for anonymizing 
data. Data holder has to keep checking different models over 
different thresholds to achieve data anonymity with greater 
utility. Moreover, absence of knowledge regarding privacy 
models makes it more difficult for the data holder to modify the 
data into their unidentified version. However, EDAMS is 
capable of selecting the appropriate model for the respective 
dataset by applying some initial effort thereby minimizing the 
overall cost with good efficiency. 


Engineering, Technology & Applied Science Research Vol. 10, No. 2, 2020, 5423-5427 5427 
 

www.etasr.com Qamar et al.: EDAMS: Efficient Data Anonymization Model Selector for Privacy-Preserving Data … 

 
VI. CONCLUSION AND FUTURE WORK 

Available vast data can provide immense benefits when 
analyzed carefully. Many companies are sharing their data for 
research or other purposes. However, the data are becoming 
highly personalized as everything becomes automated, thus the 
companies need to make the necessary arrangements to protect 
their clients’ privacy. PPDP is a promising approach that can 
be used to publish data while preserving individual privacy to a 
great extent. Many techniques are available in this domain for 
the generation of anonymized data but choosing one is a 
challenging decision. This study presented the data 
anonymization model selection tool EDAMS that is capable of 
generating anonymized data with minimal effort. EDAMS 
requires the dataset and the nature of attributes to proceed with 
the selection of the optimal method among k-anonymity, l-
diversity and t-closeness. The results were validated by 
applying the techniques separately one by one on the same 
datasets and the conclusion was that EDAMS efficiently selects 
the most appropriate method. 

PPDP is still in its development stage as the researchers are 
coming up with more efficient algorithms. EDAMS is currently 
providing limited anonymization algorithm selection, however 
it has the capability to work as a classifier when trained 
rigorously. As a result, it will be capable of anonymizing any 
type of data by selecting the most efficient algorithm. EDAMS 
is dealing with linking attacks using generalization and 
suppression as PPDP techniques and k-anonymity, l-diversity, 
and t-closeness as anonymization algorithms. However in the 
future it is planned to accommodate more anonymization 
techniques to protect individual privacy against probabilistic 
attacks. 

REFERENCES 

[1] B. Marr, “How much data do we create every day? The mind-blowing 
stats everyone should read”, available at: www.forbes.com/ 

sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-
the-mind-blowing-stats-everyone-should-read 

[2] R. Madge, “Five loopholes in the GDPR”, available at: 

medium.com/mydata/five-loopholes-in-the-gdpr-367443c4248b 

[3] J. Li, Y. Tao, X. Xiao, Preservation of proximity privacy in publishing 
numerical sensitive data, Chinese University of Hong Kong, 2008 

[4] L. Gomes, “Data analysis is creating new business opportunities”, 
available at: www.technologyreview.com/s/423897/data-analysis-is-

creating-new-business-opportunities 

[5] J. Liu, “Privacy preserving data publishing: Current status and new 
directions”, Information Technology Journal, Vol. 11, No. 1, pp. 1–8, 

2012 

[6] S. Chawla, C. Dwork, F. Mcsherry, A. Smith, H. Wee, “Toward privacy 
in public databases”, available at: www.microsoft.com/en-us/ 

research/wp-content/uploads/2016/02/tcc05-cdmsw.pdf, 1948 

[7] B. C. M. Fung, K. Wang, R. Chen, P. S. Yu, “Privacy-preserving data 
publishing: A survey of recent developments”, ACM Computing 

Surveys, Vol. 42, No. 4, Article ID 14, 2010 

[8] A. Anjum, N. Ahmad, S. U. R. Malik, S. Zubair, B. Shahzad, “An 
efficient approach for publishing microdata for multiple sensitive 

attributes”, The Journal of Supercomputing, Vol. 74, pp. 5127–5155, 
2018 

[9] M, Barbaro, T. Zeller Jr., “A face is exposed for AOL searcher no. 

4417749”, available at: www.nytimes.com/2006/08/09/technology/ 
09aol.html 

[10] D. Vatsalan, P. Christen, C. M. O ’Keefe, V. S. Verykios, “An 
evaluation framework for privacy-preserving record linkage”, Journal of 

Privacy and Confidentiality, Vol. 6, No. 1, pp. 35-75, 2014 

[11] K. E. Emam, S. Rodgers, B. Malin, “Anonymising and sharing 
individual patient data”, BMJ, Vol. 350, Article ID h1139, 2015 

[12] L. Sweeny, “k-anonymity: A model for protecting privacy”, 

International Journal of Uncertainty, Fuzziness and Knowledge-Based 
Systems, Vol. 10, No. 5, pp. 557–570, 2002 

[13] A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venkitasubramaniam, “l-
diversity: Privacy beyond k-anonymity”, ACM Transactions on 

Knowledge Discovery from Data, Vol. 1, No. 1, pp. 1-12, 2007 

[14] N. Li, T. Li, S. Venkatasubramanian, t-closeness: Privacy beyond k-
anonymity and l-diversity, CERIAS Tech Report 2007-78, 2007 

[15] J. Liu, K. Wang, “On optimal anonymization for l+-diversity”, IEEE 

26th International Conference on Data Engineering, Long Beach, USA, 
March 1-6, 2010 

[16] L. Sweeney, Matching known patients to health records in Washington 

state data, Harvard University, 2013 

[17] L. Sweeney, J. S. Yoo, “De-anonymizing South Korean resident 
registration numbers shared in prescription data”, Technology Science, 

Article ID 2015092901, 2015 

[18] Y. A. D. Montjoye, L. Radaelli, V. K. Singh, A. S. Pentland, “Unique in 
the shopping mall: On the reidentifiability of credit card metadata”, 

Science, Vol. 347, No. 6221, pp. 536–539, 2015 

[19] A. Narayanan, V. Shmatikov, “Robust de-anonymization of large sparse 

datasets”, IEEE Symposium on Security and Privacy, Oakland, USA, 
May 18-22, 2008 

[20] N. Li, T. Li, S. Venkatasubramanian, “Closeness: A new privacy 

measure for data publishing”, IEEE Transactions on Knowledge and 
Data Engineering, Vol. 22, No. 7, pp. 943–956, 2010 

[21] P. Samarati, L. Sweeney, “Generalizing data to provide anonymity when 

disclosing information”, 17th ACM SIGACT-SIGMOD-SIGART 
symposium on principles of database systems, Washington, USA, June, 

1998 

[22] Z. E. Ouazzani, H. E. Bakkali, “A new technique ensuring privacy in big 
data: Variable t-closeness for sensitive numerical attributes”, 3rd 

International Conference of Cloud Computing Technologies and 
Applications, Rabat, Morocco, October 24-26, 2017 

[23] S. S. Devi, R. Indhumathi, “A study on privacy-preserving approaches in 

online social network for data publishing”, in: Data Management, 
Analytics and Innovation, pp. 99–115, Springer, 2011 

[24] H. Zhu, H. B. Liang, L. Zhao, D. Y. Peng, L. Xiong, “τ-Safe (l,k)-

diversity privacy model for sequential publication with high utility”, 
IEEE Access, Vol. 7, pp. 687–701, 2018 

[25] A. S. M. T. Hasan, Q. Jiang, “A general framework for privacy 
preserving sequential data publishing”, 31st International Conference on 

Advanced Information Networking and Applications Workshop, Taipei, 
Taiwan, March 27-29, 2017 

[26] S. Hamid, N. Z. Bawany, S. Khan, “AcSIS: Authentication system based 

on image splicing”, Engineering, Technology & Applied Science 
Research, Vol. 9, No. 5, pp. 4808-4812, 2019 

[27] M. O. A. Dwairi, A. Y. Hendi, Z. A. AlQadi, “An efficient and highly 

secure technique to encrypt and decrypt color images”, Engineering, 
Technology & Applied Science Research, Vol. 9, No. 3, pp. 4165-4168, 

2019 

[28] A. H. A. Omari, “Lightweight dynamic crypto algorithm for next 
internet generation”, Engineering, Technology & Applied Science 

Research, Vol. 9, No. 3, pp. 4203-4208, 2019 

[29] UCI, Adult data set, available at: archive.ics.uci.edu/ml/datasets/adult 

[30] Employee Salary dataset, available at: www.kaggle.com/varungitboi/ 
employee-salary-dataset 

[31] Open Data Philly, Crime incidents, available at: www.opendataphilly. 
org/dataset/crime-incidents