Al-Khwarizmi Engineering Journal, Vol. 13, No. 1, P.P. 129-137 (2017)

Big-data Management using Map Reduce on Cloud: Case study, EEG Images' Data

Sahar Mahdie Klim
Department of Computer Engineering / College of Engineering / Misan University
Email: sahar_mahdi@uomisan.edu.iq

(Received 29 March 2016; accepted 7 November 2016)
https://doi.org/10.22153/kej.2017.11.004


Abstract

A database is a collection of data that is organized and distributed in a way that allows the user to access the stored data in a simple and convenient way. However, in the era of big data, the traditional methods of data analytics may not be able to manage and process such large amounts of data. In order to develop an efficient way of handling big data, this work studies the use of the Map-Reduce technique to handle big data distributed on the cloud. The approach was evaluated using a Hadoop server and applied to EEG big data as a case study. The proposed approach showed a clear enhancement in managing and processing the EEG big data, with an average reduction of 50% in response time. The obtained results provide EEG researchers and specialists with an easy and fast method of handling EEG big data.

Keywords: Big-data, Cloud Computing, Electroencephalogram, MapReduce, Hadoop.


1. Introduction

A database is a collection of data that is organized and distributed in a way that allows the user to access the stored data in a simple and convenient way. Using a database, the user can carry out any kind of modification over a specific set of data. Different types of databases are currently in use, depending on the kind of data that has to be stored. Among the most widely used databases for accessing dynamic data is the transaction-based database [1]. The use of this kind of database to retrieve dynamic data, such as stock data, has been studied extensively in recent years.
The other common kind of database is the descriptive database [2], which is used for collecting and organizing static data, for instance legal records or land records that do not change frequently.

The use of a database also allows a huge body of data retrieved from different sources to be organized in a larger data framework known as data warehousing [3]. This framework is implemented for easy and productive query retrieval and analysis, rather than for transaction processing. The data stored in the warehousing framework consists mostly of historical data, retrieved from archival records as well as from other sources. The framework works by separating the analysis workload from the transaction workload, so that data selected from different sources can be organized effectively. In addition, it registers data through an Extraction, Transportation, Transforming and Loading (ETL) arrangement [4]. Moreover, other application procedures, such as online analytical processing and client analysis tools, allow the collected data to be gathered and delivered to business users easily and efficiently. Consequently, the use of data warehousing has raised the level of computation within the business sector, and it has been developed further by networking systems, which has contributed to processing a large amount of data in a timely and productive way. The primary purpose of a data warehousing framework is to create a long-term storage system to be used by those who need archived data for future reference [5].
Although databases provide a proficient and simple mechanism for retrieving data effectively, conventional databases have their own weak points [6]. For instance, standard database and programming techniques were not designed to handle, organize and control very large volumes of data, whether structured, semi-structured or unstructured [7]. This gap prompted the emergence of the phenomenon known as Big Data [8]. The term is used to describe enormous datasets characterized by the "4 V" definition: "Volume", "Velocity", "Variety" and "Value" (for example, electronic medical records, electrocardiograms and biometric information) [9]. However, such datasets were shown to pose problems for storage, representation and analysis [9]. Consequently, in order to address these problems, new software frameworks have been created. These frameworks are built specifically to obtain parallelism from large collections of computing clusters, instead of obtaining it from a supercomputer. Such computing clusters consist of ordinary processors that can be connected by Ethernet links or other inexpensive switches. These upgraded software frameworks are frequently built on a "Distributed File System" (DFS) [10]. A DFS uses larger blocks than the disk blocks found in conventional operating systems. In addition, a DFS provides data replication as a protection against data loss, which frequently occurs because of the large size of the datasets involved. Consequently, the need arose to use Map Reduce in order to accurately process Big Data and extract its potential descriptive, predictive and prescriptive value [11] [12].

Analytics is a general term for the different purposes and strategies used to carry out various processes over a dataset. There are three kinds of analytics [11]:
A) "Descriptive" analytics: a method used to extract the dataset of interest and generate standard reports that can be used to answer questions such as "What happened? What is the problem? What actions are required?".
B) "Predictive" analytics: building on the reports produced by the descriptive approach, a predictive analytical approach has been developed. This approach applies statistical models to historical datasets in order to forecast future information. Predictive analytics is useful for answering questions such as "Why is this happening? What will happen next?". This kind of prediction depends on how close the obtained data is when compared with other statistical models.
C) Prescriptive analytics: this kind of analytical approach involves the use of different data models, for example multi-variable simulation and identifying the relations between different variables. It is suited to answering questions such as "What could happen if a specific scenario is used? What is the most suitable scenario to be implemented?".

High-performance computing [13] is a technique for organizing parallel pathways for running high-level application programs in an effective, more reliable and less redundant way.
This method applies to systems that operate at floating-point rates on the order of 10^12 operations per second. High-performance computing is used for handling complicated problems and for efficiently supporting research activities using state-of-the-art computing facilities, simulation, and various processing resources. The differences between grid computing and distributed computing systems are as follows [14]:
a) A distributed computing system can handle hundreds or thousands of computers, each of which has limited access to processing resources such as CPU, memory and storage. The grid computing system, on the other hand, concentrates more on the efficiency of coordinating heterogeneous systems with management servers, storage, optimal workload and networking.
b) A grid computing system is more specifically designed to improve computation across different administrative domains, which differs greatly from the conventional distributed computing system.

In most computing systems, a single processor is used along with its main memory, cache and local disk. Previously, applications that called for parallel computers with numerous processors relied on special-purpose hardware. However, the demand for large-scale web services required more of the computing to be done on commodity infrastructure. Moreover, clusters of ordinary computing nodes were shown to greatly reduce cost when compared with special-purpose parallel machines. Accordingly, the newly created computing facilities have contributed to the development of more advanced software frameworks. The advantage of using such frameworks is that they handle the reliability issues that arise when the computing hardware is composed of thousands of independent components, any of which could fail [14]. Moreover, these frameworks can also handle the parallelism itself. The cluster generates tasks that are monitored by the master node, commonly known as the NameNode [15], which is responsible for chunking the data, replicating it, delivering the data to the distributed processing nodes, checking the status of the cluster and gathering/assembling the obtained results.

In addition to the development of the DFS, other high-level programming frameworks have been designed. The most famous framework is the MapReduce system. MapReduce is a common programming framework used for data-intensive applications and introduced by Google [16]. MapReduce borrows ideas from functional programming; in particular, it defines Map and Reduce tasks so that enormous collections of distributed data can be processed more efficiently. The use of MapReduce allows a wide range of computations at large scale to be carried out on computing clusters more effectively and in a manner that can tolerate the hardware failures that occur during the computation process [17]. In this way, Hadoop makes it simpler to process a large amount of data, since it can read a whole folder of files at once [18], whereas in ordinary programming every file is read and analysed separately. The latter therefore takes longer and requires more memory and capacity.
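As a concrete illustration of how Map and Reduce tasks are defined in this framework, the minimal sketch below (not taken from this paper; the class names, the tab-separated record layout and the patient-ID field are assumptions made for illustration) counts how many record lines each patient contributes, using the classic Hadoop mapred API:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Illustrative sketch only: groups EEG record lines by an assumed patient-ID field.
    public class RecordCountExample {
        public static class CountMap extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> out, Reporter reporter)
                    throws IOException {
                // Assumes tab-separated records whose first field is the patient ID.
                String patientId = value.toString().split("\t")[0];
                out.collect(new Text(patientId), ONE);
            }
        }

        public static class CountReduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> out, Reporter reporter)
                    throws IOException {
                int sum = 0;
                while (values.hasNext()) { sum += values.next().get(); }
                out.collect(key, new IntWritable(sum)); // one output line per patient: ID and record count
            }
        }
    }

The map side only tags each line with its key, while the reduce side receives all values for one key together; the framework itself takes care of distributing the files of the input folder across the cluster and of the intermediate grouping.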
"Hadoop" also saves the effort required in writing the code, since conventional programming requires a separate program to be written over the records, whereas the map code is faster and can be integrated easily.

1.1. Problem Declaration

As a result of the expansion of data volume in the computing field, there is great interest in finding more productive techniques for storing, analysing and processing signal data. Electrophysiological data is important for research, management and clinical investigation in epilepsy and other essential disorders. "EEG" data is a kind of biomedical signal dataset and clinical "Big-Data" that has the "4 V" properties (volume, variety, velocity and value) and includes more than 100 "multi-channel" signals. It is therefore important to implement an effective methodology for managing huge EEG datasets. One of the commonly used methods in analysing such data is the Ensemble Empirical Mode Decomposition (EEMD) method, which helps in decomposing EEG signals; however, this method takes a long time if the data are analysed sequentially. This work should answer the questions below:
• What can MapReduce and Hadoop offer in terms of storage and analysis of EEG Big-Data?
• How does Hadoop differ from ordinary programming methods in processing the EEG Big-Data?

1.2. The Proposed Solution

In order to improve the efficiency of analysing "EEG Big-data", so that patient cases can be studied more easily and more thoroughly, "EEG Big-data" should be processed on "Hadoop" using the MapReduce technique. The use of MapReduce and "Hadoop" on distributed structures such as "Cloud Computing" can contribute to a basic advance in clinical Big-Data processing and use. It will also open new opportunities in the growing era of "Big-Data" analytics and improve the output of clinical "EEG Big-Data" analytical tools.

The development of "Hadoop" has made it much less demanding to study and analyse a huge amount of data in a timely and productive way. "Hadoop" implements the "Map" and "Reduce" phases, which let the developer control the data they are interested in analysing. Besides, Hadoop performs the work faster, since it takes all of the files and explores them at the same time, whereas in regular programming the records have to be broken down separately. Additionally, "Hadoop" executes the code in a faster and more effective manner compared with the standard programming technique. Thus, it is important to outline the advantages of applying "Hadoop" in the computing task of handling "EEG Big-data". This procedure will allow clinical and EEG signal processing researchers to obtain their required information and accomplish their goals swiftly and precisely, with the least effort. In addition, the proposed system will make it straightforward to manage the "EEG" big data in an improved way.

This research sheds light on the benefits that "Hadoop" and MapReduce have added to processing and programming. Furthermore, it uses a sample of "EEG Big-data" to be handled on a "Hadoop" server using the MapReduce method, and describes an improved strategy for data use and analysis in an efficient, quick and correct way.
This work evaluated the planned approach of using the MapReduce methodology on a "Hadoop" server on top of a distributed computing structure, to process the "EEG Big-data" and make it easy for clinical staff and "EEG" signal researchers to retrieve the data required for their work and analyses, accurately and within a short time. The planned system and the obtained results, which will be shown in later sections, could be considered a breakthrough in the clinical field, especially in "EEG" signal processing.

1.3. Paper Organization

Following this section, the second section gives the background of the problem under study. The third section discusses the tools, approach and methodology used in this study and their selection criteria. The proposed approach and the evaluation experiment are examined in the fourth section. The fifth section contains the results obtained from the proposed approach and their analysis, in addition to comparisons with the closest related work, in order to assess the contribution and improvements that this study offers. Finally, the sixth section summarizes the work and presents the conclusions obtained from this study.

2. Evolution of Big-data

Over recent years, the growing usage of web networking and new developments such as the "Internet of Things" (IoT) generate a huge amount of information that can yield extremely valuable data if properly managed. This is called "Big Data". These datasets represent voluminous amounts of structured, semi-structured and unstructured information. Consequently, "Big Data" has grown to stress the volume, variety and velocity of datasets, and the conventional technologies for processing information were unable to cope, because the data is fragmented, hard to reach, excessively large, moves too fast, and is overly complicated. Experts estimate that 80 to 90 percent of organizations' information is unstructured, and the amount of unstructured information is growing much more rapidly than structured databases. Therefore, routine programming methods such as the "Relational Database Management System (RDBMS)" and "SQL" turned out to be too slow to process this volume of information in a timely way, since they cannot manage Petabytes and Exabytes of information. Datasets are continually updated and new information is created that has to be processed; the more quickly information is generated, the more important it is to have an infrastructure that can scale to process information as fast as it changes, and "Big Data" includes unstructured information that does not fit into the rows and columns of a database.

3. Proposed Methodology

The electrophysiological data used for this study consists of multi-channel datasets recorded for patients within the five-day period of their hospital stay and release. In order to record the electrophysiological information, 30-40 channels are usually included. These channels comprise 20 "EEG" channels, 4 "EKG" channels, 1 channel for oxygen monitoring, 2 channels for respiratory signalling and 1 channel used for monitoring blood pressure. Around 20 EDF records are produced for each patient within the 5-day period.
Each EDF report is composed of data records, and each data record contains a number of seconds of signal, indicated by a specific number of samples shown in the record header and archived as Cloud Wave Metadata [19]. In the map, the data is converted into a list, and that in fact saves time while comparing values and prevents delay at run time, since the list has an index. After converting the data into a list, the programmer has the option of picking only the data that is required. Thus, the map has several functions: it has an index, it converts the content to a list, and it makes the size of the data smaller according to the data that is required. Normally, when the user needs to widen the range of the data, a cursor has to be moved. However, this step is no longer required, since "JAVA" can be used to handle text records whose fields are separated by commas and spaces, and these records can then be carried over to the list by converting the text file to a map. Then, when moving to the reduce or counting step, the user can write the required code over the list that was created. The list does not have a cursor, since it is treated as an index that starts with zero, and every list element can be grouped into a key or a value. The data could be distributed over more than one server, so if we rent several servers from the "cloud", this data has to be added to the cloud. Then a query has to be issued to collect the data from within the data folder. Accordingly, when running the map-reduce, the developer needs to give the name of the particular output, and the path could be an IP address that is shared between several different servers. The steps required when conducting a map-reduce are the following:
1. Creating the database structure.
2. Inserting the database.
3. Processing and looping over the dataset.
4. Building the map function and the reduce that follows it.
There are distinct classes: the first class is the general public class, and the static classes are the mapper and the reducer, which extend the map and reduce stages from "JAVA". The programmer needs to decide the type of the data and pass it to the mapper.

4. The Experiment of Evaluation

"Hadoop" is an engine that allows procedures to be run in a short time. As a programmer or an architect, it is essential to know the installation of "Hadoop", which means the place where all of the records will be stored. All of the data can be stored in the Hadoop work folder, and if the developer needs to change the name of the folder, the environment path needs to change. In order to change the environment path, the path of "Hadoop" has to be changed from the Windows settings. Then, to run the path on a database, the database that the developer wants to manage should have the same structure, in order to run a proper test on it. Then, in order to build the map, the programmer needs to consider the structure of the database, which could be a text record, an image or some other format. The next step that should be done is specifying the path for storing all of the data inside a data folder.
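To picture the list conversion described above before any Hadoop code is involved, the following plain-Java sketch (illustrative only; the field layout with the patient ID first and the lights-off time as the last field is an assumption) tokenizes one delimited record and stores it as a key/value entry:

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.StringTokenizer;

    public class RecordToMapExample {
        // Converts one delimited EEG text record into a (patientId -> lightsOffTime) entry.
        public static Map<String, String> toMap(String record) {
            Map<String, String> index = new LinkedHashMap<>();
            StringTokenizer tokens = new StringTokenizer(record, ", \t");
            String first = tokens.nextToken();    // assumed to be the patient ID
            String last = first;
            while (tokens.hasMoreTokens()) {
                last = tokens.nextToken();         // keep only the last field (assumed lights-off time)
            }
            index.put(first, last);
            return index;
        }

        public static void main(String[] args) {
            // prints {patient01=23:45:00}
            System.out.println(toMap("patient01, 23:45:00"));
        }
    }

Once the records live in such a keyed structure, picking only the fields of interest and grouping them by key is exactly the work that the map and reduce stages perform across the whole input folder.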
When working with Hadoop, all of the data that we have as text records will be gathered and assembled, piece by piece, inside the input folder. It is possible to have more than one folder, but all of the folders should contain text and should have the same structure. This matters because, if the programmer wants to build a small program, the algorithm written to analyse the data needs a constant structure. Thus, when we reach the map stage, all of the data that enters the map should have a consistent structure for more efficient analysis. Therefore, all of the data in a folder should have the same number of fields and the same data types, and these data should be written in a specific layout. For instance, if the programmer reads a piece of data as a number and it then appears in one file as text, this would cause a significant problem. The "Hadoop" will then build the map over all of the files found inside the input folder, no matter how many folders we have.

Next, we should build a map in order to produce an index (list) of the data. To develop this step, the text file should be converted into records or a list, as this step will help to minimize the size of the data and discard all of the data that is not required. In our case, we only need to know the patient's ID and the time the light was turned off. Therefore, we are no longer interested in the other data found in the list. Even though the map is constant, we will be able to extract the data that we require from it according to the kind of data that we want to pull out of the map. The work in the last step is to create the output record, which holds all of the values that have been specified by the developer. In our case, we picked the values for which the lights-off time falls after 12 pm only. Inside "Hadoop", there is a command that returns the data according to its availability; we simply point out the data and the output folder found inside "Hadoop". It has been reported that the "CPU" time for our data was zero because of the small size of the files that we have. The block diagram below summarizes the procedure followed during the execution process.

Fig. 1. Proposed approach block diagram (Map phase: EEG Big-data entry, data conversion, divide and distribute data on multiple servers, data ordering, data compression; Reduce phase: enter query, retrieve data, select suitable data, list selected data, load requested data).

The above block diagram contains the steps of the two phases, "Map" and "Reduce". The next section shows the obtained outcome and a comparison between the proposed approach and the previous one.

5. Experimental Results

In this work, we used the input file temazepam_effects_on_sleep.txt [24] in order to produce the results.txt record. Then, we used Map-Reduce-EEG.java to carry out the map process and afterwards perform the reduce process in order to obtain the data of our main interest. We were interested in obtaining the records where the lights were turned off after 05:00 AM.
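The map and reduce fragments shown in the next subsections come from Map-Reduce-EEG.java, but the paper does not list the job driver that submits them. A driver along the following lines would be needed to wire the two classes to the input and output folders; the class name, output types and path arguments below are assumptions made for illustration, not part of the original code:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MapReduceEEGDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MapReduceEEGDriver.class);
            conf.setJobName("eeg-lights-off");

            conf.setOutputKeyClass(Text.class);     // both key and value are text times
            conf.setOutputValueClass(Text.class);

            conf.setMapperClass(MapEEG.class);      // mapper listed in Section 5.1
            conf.setReducerClass(ReduceEEG.class);  // reducer listed in Section 5.2

            // Assumed HDFS locations for the input folder and the results folder.
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }

The packaged jar would then be submitted with the usual hadoop jar command, passing the input folder (holding temazepam_effects_on_sleep.txt) and an output folder as the two arguments.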
The following listings show the "JAVA" evaluation code for the experiment, taken from the file Map-Reduce-EEG.java.

5.1. The Function of the Map

The listing below shows the evaluation code for the map phase (the fragments assume the standard org.apache.hadoop.mapred classes and java.util.StringTokenizer).

    public static class MapEEG extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String roweeg = value.toString();                  // one EEG record line
            StringTokenizer eeg = new StringTokenizer(roweeg, "\t");
            String curser = null;
            while (eeg.hasMoreTokens()) {
                curser = eeg.nextToken();                      // keep only the last token
            }
            String timeOfSleep = curser;                       // lights-off time field
            output.collect(new Text(timeOfSleep), new Text(timeOfSleep));
        }
    }

5.2. Reducer Class

The listing below shows the "JAVA" code for the reduce phase.

    public static class ReduceEEG extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        // Joda-Time LocalTime assumed, as suggested by the original constructor call; 05:00 AM threshold.
        private static final LocalTime LIGHTS_OFF_LIMIT = new LocalTime(5, 0);
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            while (values.hasNext()) {
                Text sec = values.next();
                LocalTime locTim = LocalTime.parse(sec.toString());
                if (locTim.isAfter(LIGHTS_OFF_LIMIT)) {        // keep only times after 05:00 AM
                    output.collect(key, new Text(locTim.toString()));
                }
            }
        }
    }

According to the "Hadoop" functions above, we first read the Java-processed file that holds all of the records and data. Then we carried out the map process to convert all of the data into a list that has an index, thereby making the comparison process simpler than it used to be in ordinary programming. Then, in the reduce process, we picked the values for which the light was turned off after 05:00 AM. The string line is read from the text record and the map keeps the last value token: the loop keeps moving to the next value token and finally emits the value of the last token, so it always takes the last value and stores it for the "java" reduce step.

The figures below show the response time after issuing the query using the proposed method and comparing it with the old data structure technique.

Fig. 2. Experimental results: request time.

Fig. 3. Experimental results: hit rates vs miss rates.

The figures show the trend of the reply time for the experimental results, demonstrating the improved reply time of the proposed approach in comparison with the conventional data structure method. The results show a clear enhancement in the reply time and in the precision of the retrieved data, with an average response time of 0.36 s and an average hit rate of 93.33%. On the other hand, the old data structure method showed an average response time of 2.37 s and an average hit rate of 83.30%. This amounts to approximately 65% enhancement in the response time and 12% enhancement in the accuracy and hit rate, which is a useful contribution, especially when dealing with big data that contains a huge amount of stored and distributed data.

6. Conclusions

Hadoop is an engine that works alongside MapReduce in order to help the programmer or developer obtain the information they require and exclude all the other data that they do not need. However, Hadoop needs a server with high specifications to carry out the job and all of the processes that we need to perform.
In this study, we applied the MapReduce strategy on a Hadoop server to process EEG big data and distribute it on the cloud. Accordingly, this has made it easier to analyse and obtain the data of concern. The experiments were implemented on Hadoop, clearly improved results were obtained and compared with the conventional data structure method, and they showed an average of 50% improvement over the conventional technique.

7. References

[1] Londhe, S. R., Mahajan, R. A., & Bhoyar, B. (2013). Overview on Methods for Mining High Utility Itemset from Transactional Database. International Journal of Scientific Engineering and Research (IJSER), 1(4), 12.
[2] Adrian, A. (2013). Big Data Challenges. Database Systems Journal, IV(3), 31-40.
[3] Krishnan, K. (2013). Data Warehousing in the Age of Big Data. Newnes.
[4] Jin, Q., Liao, H., Srinivasan, S., & Xu, L. (2014). U.S. Patent No. 8,903,762. Washington, DC: U.S. Patent and Trademark Office.
[5] Cloud Security Alliance (2013). Big Data Analytics for Security Intelligence. 1-22.
[6] Hernandez, M. J. (2013). Database Design for Mere Mortals: A Hands-on Guide to Relational Database Design. Pearson Education.
[7] Barbulescu, M., Grigoriu, R., Halcu, I., Neculoiu, G., Sandulescu, V. C., Marinescu, M., & Marinescu, V. (2013, January). Integrating of structured, semi-structured and unstructured data in natural and build environmental engineering. In Roedunet International Conference (RoEduNet), 2013 11th (pp. 1-4). IEEE.
[8] Mayer-Schönberger, V., & Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt.
[9] Birke, R., Bjoerkqvist, M., Chen, L. Y., Smirni, E., & Engbersen, T. (2014). (Big) data in a virtualized world: volume, velocity, and variety in cloud datacenters. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST 14) (pp. 177-189).
[10] Vashist, S., & Gupta, A. (2014). A Review on Distributed File System and Its Applications. International Journal of Advanced Research in Computer Science, 5(7).
[11] Das, T., & Kumar, M. (2013). BIG Data Analytics: A Framework for Unstructured Data Analysis. Vol. 5, 0975-4024.
[12] Lee, K. H., Lee, Y. J., Choi, H., Chung, Y. D., & Moon, B. (2012). Parallel data processing with MapReduce: a survey. ACM SIGMOD Record, 40(4), 11-20.
[13] Bryan, B. A. (2013). High-performance computing tools for the integrated assessment and modelling of social-ecological systems. Environmental Modelling & Software, 39, 295-303.
[14] Pourqasem, J., Karimi, S., & Edalatpanah, S. (2014). Comparison of Cloud and Grid Computing. American Journal of Software Engineering, 2(1), 8-12.
[15] Mukhopadhyay, D., Agrawal, C., Maru, D., Yedale, P., & Gadekar, P. (2014, December). Addressing Name Node Scalability Issue in Hadoop Distributed File System Using Cache Approach. In Information Technology (ICIT), 2014 International Conference on (pp. 321-326). IEEE.
[16] Maniar, K. B., & Khatri, C. B. (2014). Data Science: Bigtable, Mapreduce and Google File System. International Journal of Computer Trends and Technology (IJCTT), 16(03), 115-118.
[17] Vadivel, M., & Raghunath, V. (2014). Enhancing Map-Reduce Framework for Bigdata with Hierarchical Clustering. International Journal of Innovative Research in Computer and Communication Engineering, 2(1), 2320-9801.
[18] Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(3), 1-10.
[19] Tsai, C., Lai,
C., Chao, H., & Vasilakos, A. (2015). Big data analytics: a survey. Journal of Big Data, 1-32.