INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
ISSN 1841-9836, 12(2):276-290, April 2017.

A Solution for Problems in the Organization, Storage and Processing of Large Data Banks of Physiological Variables

F. Palominos, H. Díaz, F. Córdova, L. Cañete, C. Durán

Fredi Palominos*, Hernan Díaz, Lucio Cañete, Claudia Durán
Departments: Mathematics, Biology, Industrial Technology and Industrial Engineering.
Universidad de Santiago de Chile
Avda. Bernardo O’Higgins 3363, Santiago, Chile.
(fredi.palominos, hernan.diaz, lucio.canete, claudia.duran)@usach.cl
*Corresponding author: fredi.palominos@usach.cl

Felisa Córdova
Director at Engineering School
Finis Terrae University
felisa.cordova@gmail.com

Abstract: The proliferation and popularization of new instruments for measuring different types of electrophysiological variables have created the need to store huge volumes of information, corresponding to the records obtained by applying these instruments to experimental subjects. To this must be added the data derived from the analysis and purification processes. Moreover, each stage of data processing is associated with one or more specific methods, which depend on the research area and on the treatment to which the base information (RAW) is subjected. As a result, various problems arise over time. The most obvious is that the data and metadata derived from the treatment and analysis processes can accumulate until they require more storage space than the base data. In addition, as the amount of information grows, the link between the processed data, the treatment methods used and the analyses performed can be lost, so that eventually everything becomes simply a huge repository of biometric data, devoid of meaning.

This paper presents an approach founded on a data model that can adequately handle different types of chronologies of physiological and emotional information. The model ensures the confidentiality of information according to the experimental protocols and the relevant ethical requirements, and it links the information with the treatment methods used and with the technical and scientific documents derived from the analysis. The need for a specific data model is justified by the fact that the tools currently used to store large volumes of information cannot handle the semantic elements that make up the metadata and the information on the analysis of base records of physiological information. This work is an extension of our paper [25].(a)

Keywords: data models; big data; EEG data organization; physiological information; metadata.

(a) Reprinted (partial) and extended, with permission based on License Number 3947080516854 [2016] ©IEEE, from "Computers Communications and Control (ICCCC), 2016 6th International Conference on".

Copyright © 2006-2017 by CCC Publications

1 Introduction

This paper extends [25]: it specifies in more detail the different types of relevant information in EEG records, expands the mathematical formalization of the data and their interactions, and proposes indicators to quantify and predict storage requirements.

Understanding the mechanisms of human reasoning and brain functioning is a central topic in neuroscience [16] [17] [18] [30]. Technological development has multiplied the tools available to record brain activity. In turn, the development of computers has created new opportunities for processing and analyzing the data through different tools [19] [20] [21].
Moreover, the greater availability and lower prices of this technology give researchers access to many instruments that were previously reserved for medical specialists, and this has led researchers to pursue new objectives and scientific questions. As a result, in addition to the generation of new knowledge, there has been huge growth in the number of EEG records, producing large volumes of research and clinical data in centers worldwide. The size of these collections places them in the domain of big data: storing them in computer systems requires very large storage media, beyond the capacity of most traditional computer systems.

The origins of electroencephalography, the technique that captures the signals known as EEG, date back to 1875, when Richard Caton first detected electrical activity on the brain surface of animals [22]. However, the use of this technology beyond the traditional medical field has expanded only in recent decades, together with new quantitative approaches (qEEG) that allow a deeper understanding of the data beyond qualitative characterizations [23]. The technology has potential applications in multiple areas, such as neuromarketing [24], education [26], psychology [27], labor relations [28] and work [29].

1.1 The EEG data and their specificities

Electroencephalographic (EEG) data are the records obtained by measuring brain activity with an instrument called an electroencephalograph. The measurements are obtained through a set of electrodes placed at specific points on the scalp, generally non-invasively. Each electrode receives an analog signal, which is then digitized by a computer. Because of the way EEG data are recorded and digitized, the temporal resolution of the signal is limited by the sample rate used in the experiment.
While a good deal of knowledge has been gathered from a broad span of EEG frequencies (mainly between 0.5 and 30 Hz, that is, from the delta to the beta band), the gamma band (>30 Hz) has been scarcely explored. With a standard sample rate of 128 Hz, only the initial part of the gamma band can be explored, from 30 Hz (the end of beta) up to 64 Hz, the maximum frequency allowed by the Nyquist theorem. To examine gamma frequencies further, the sample rate must be increased to 256 Hz, which resolves phenomena up to 128 Hz. Doubling the sample rate again, to 512 Hz, raises this limit to 256 Hz. Each step doubles the amount of data: each channel produces 128 samples per second at 128 Hz, 256 at 256 Hz and 512 at 512 Hz. Because the EEG signal has a fractal nature, the limit on the detail that can be observed is set by the recording technology rather than by the signal itself. To reach real-time performance in brain-machine communication, or to study very precise stimulus-response experiments, data must be sampled at thousands of Hz in order to look deeper into the different time scales of the processes happening in the brain.

Although the application of EEG in humans is conditioned by the experimental protocols of each investigation, the resulting data share common characteristics as far as their management, storage and processing are concerned. The series of values recorded during an electroencephalogram (EEG) can be described as a set of sequences, called channels, each of which corresponds to the recording of one electrode of the electroencephalograph. The number of channels varies according to the instrument. The duration of each record depends on the experimental design and conditions.
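The sampling arithmetic above can be checked with a few lines of code. The sketch below assumes, as EDF does, that every sample is stored as a 16-bit integer (2 bytes) and ignores file headers and annotations. The 14-channel, 10-minute session is a hypothetical example, not data from this work.

```python
# Sketch: sampling arithmetic for a multichannel EEG record.
# Assumption: samples are stored as 16-bit integers (2 bytes), as in EDF;
# file headers and annotations are ignored.

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be resolved at a given sample rate."""
    return sample_rate_hz / 2


def samples_per_channel(sample_rate_hz: int, duration_s: int) -> int:
    """All channels of a RAW record share the same duration, hence the same count."""
    return sample_rate_hz * duration_s


def raw_size_mb(channels: int, sample_rate_hz: int, duration_s: int,
                bytes_per_sample: int = 2) -> float:
    """Approximate size of the sample data of one RAW record, in MB (10**6 bytes)."""
    total_bytes = (channels * samples_per_channel(sample_rate_hz, duration_s)
                   * bytes_per_sample)
    return total_bytes / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: 14 channels recorded for 10 minutes.
    for rate in (128, 256, 512):
        print(f"{rate} Hz: resolves up to {nyquist_hz(rate):.0f} Hz, "
              f"{rate} samples/s per channel, "
              f"{raw_size_mb(14, rate, 600):.2f} MB")
```

Under these assumptions, the three rates give about 2.15 MB, 4.30 MB and 8.60 MB per session. Doubling the sample rate doubles both the resolvable frequency and the storage.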
Within a RAW record, all channels have the same duration. The storage format may vary with the type of instrument. The data used in this work are stored in EDF (European Data Format) [1]. The length of each record is set by the design of the experiment or clinical study. Because of the nature of brain activity and the limits of EEG technology, RAW records are not completely clean. They can be affected by different perturbations, which in the clinical field are called artifacts (residual instrument noise, interactions between channels, and twitching or muscle movements).

1.2 Problems in EEG data management

Considering the different factors involved in the storage, organization, processing and analysis of EEG records, this is clearly a complex data management problem, characterized by:
• The constant and often excessive growth of the data, and the large accumulation of metadata generated during the processing and analysis of RAW records.
• The loss of meaning of the information: as the amount of data grows, the links between data, data processing and results become hidden over time.

Much physiological data should be organized according to the time at which it originates, whether as instrumental records or as results of processing. Proper management of the timestamps associated with the data is therefore required. To avoid losing information and meaning, the data must be managed together with information on the related research, researchers, experimental subjects, derived data and results. The confidentiality of the information must also be ensured. Finally, information must be recorded about the nature of the data, their growth potential, the way they have been processed and the results of the analyses performed.
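Linking records to their context and keeping subjects confidential are two of the requirements listed in Section 1.2. One way to meet both is to store a keyed pseudonym instead of the subject's identity. The sketch below uses HMAC-SHA256 from the Python standard library for this purpose. This is a technique chosen here for illustration, not the method described by the authors. The key, field names and example values are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical key; in practice it would be kept outside the database,
# for example by whoever is responsible for the ethics protocol.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonym(subject_id: str, key: bytes = SECRET_KEY) -> str:
    """Same subject -> same code; the code cannot be reversed without the key.

    Truncated to 16 hex characters (64 bits) for readability.
    """
    digest = hmac.new(key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


@dataclass(frozen=True)
class RecordEntry:
    subject_code: str      # pseudonym only, never the real identity
    project: str           # investigation the record belongs to
    raw_location: str      # where the RAW file (e.g. EDF) is stored
    captured_at: datetime  # acquisition timestamp, normalized to UTC


def register(subject_id: str, project: str, raw_location: str,
             captured_at: datetime) -> RecordEntry:
    """Build a catalogue entry that links a record to its context without exposing identity."""
    return RecordEntry(pseudonym(subject_id), project, raw_location,
                       captured_at.astimezone(timezone.utc))


entry = register("subject-042", "P-01", "raw/s042_session1.edf",
                 datetime(2016, 5, 3, 10, 0, tzinfo=timezone.utc))
```

Because the pseudonym is deterministic, all records from the same subject can still be linked across sessions and investigations. Re-identification requires the key.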
1.3 EEG records and their relation to Big Data

Current alternatives for storing and distributing data in the Big Data field are still weak in their ability to include metadata and meaning. "Big Data refers to enormous amounts of unstructured data produced by high-performance applications falling in a wide and heterogeneous family of application scenarios" [7]. The Big Data problem can also be divided into two distinct problems: Big Data collections and Big Data objects [8]. "There are three fundamental issue areas that need to be addressed in dealing with big data: storage issues, management issues, and processing issues" [12]. The features of Big Data can be summarized as data volume, data velocity, data variety, data value and complexity [12] [10].

The problems in the Big Data area arise mainly from the need for scalability, the massive scale of the data, the heterogeneity of the information, unstructured data and their distribution across multiple platforms. These result in problems of management, storage, portability and processing of information. Because data can be distributed across multiple sites, the interfaces and the definition and representation of the data are often incompatible, since they are tied to the platform that contains them. Furthermore, there is a large amount of metadata about the information, and problems arise in transmitting data over networks [8]. Moreover, "big data also brings about new opportunities for discovering new values, helps us to gain an in-depth understanding of the hidden values and incurs also new challenges, for example how to effectively organize and manage such datasets" [9].
Advances in information technology (IT), the rapid growth of cloud computing and the Internet of Things (IoT) [9] have increased the opportunities to generate and accumulate data. This growth exceeds the capacity of researchers and technology companies to respond with a systematic and integrated solution. Until 2003, "five exabytes (10^18 bytes) of data were created by human. Today this amount of information is created in two days. Big Data requires a revolutionary step forward from traditional data analysis, characterized by its three main components: variety, velocity, and volume" [10]. "For solutions of permanent storage and management of large-scale datasets disordered, distributed file systems and NoSQL databases are good choices" [9]. According to the McKinsey Global Institute [11], the potential of Big Data lies mainly in five sectors: healthcare, the public sector, retail, manufacturing and personal location data. Research trends in Big Data analytics point to several issues: the heterogeneity of the data and the resulting incongruity in highly unstructured data; scalability; the combination of RDBMS and NoSQL; database query optimization; and systems issues in HiveQL [7].

Experience with the EEG records and derived information used in this work shows that this is a problem of massive data rather than of Big Data proper. Nevertheless, it shares many of the characteristics and problems associated with Big Data. In fact, not only does the volume of data increase, but each action generates new records or associated metadata that systematically increase storage needs. Moreover, as more EEG instruments become available, the growth rate of the data volume increases linearly in proportion to the number of available devices.
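The claim that the growth rate is proportional to the number of available devices can be made concrete with a simple model. The model is an illustrative assumption, not one proposed in the paper. Each device is assumed to produce a fixed number of sessions per month of a fixed RAW size. Derived data and metadata add a fixed fraction (`derived_ratio`) on top, and nothing is discarded. All parameter values below are hypothetical.

```python
from itertools import accumulate


def monthly_growth_mb(devices: int, sessions_per_device: int,
                      raw_mb_per_session: float, derived_ratio: float) -> float:
    """Storage added in one month: RAW data plus derived data and metadata.

    derived_ratio is the (assumed) size of derived data relative to RAW data;
    values above 1 mean that derivatives outgrow the base records.
    """
    return devices * sessions_per_device * raw_mb_per_session * (1 + derived_ratio)


def storage_curve_mb(devices_per_month, sessions_per_device,
                     raw_mb_per_session, derived_ratio):
    """Cumulative storage month by month, assuming nothing is ever discarded."""
    increments = (monthly_growth_mb(d, sessions_per_device,
                                    raw_mb_per_session, derived_ratio)
                  for d in devices_per_month)
    return list(accumulate(increments))


# Hypothetical lab that doubles its number of EEG devices every two months.
print(storage_curve_mb([1, 1, 2, 2, 4, 4], sessions_per_device=20,
                       raw_mb_per_session=50.0, derived_ratio=1.5))
```

With these values, the script prints `[2500.0, 5000.0, 10000.0, 15000.0, 25000.0, 35000.0]`. Each doubling of the device count doubles the monthly increment, so the cumulative curve bends upward.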
The treatment, storage and organization of EEG records and their analysis share problems inherent to Big Data in several respects:
• Instruments that capture EEG records differ from one another, and their records show a high degree of inconsistency.
• EEG data and their derivatives fall within the scope of what are called Big Data collections [8]. EEG data management is fully compatible with integration into an RDBMS, particularly for managing the metadata that facilitates the analysis of EEG records.
• Managing EEG records and their derivatives is a data management problem that requires high scalability. In many types of research, data are usually not discarded but remain stored for future reference, linking or review.

Although storage formats for EEG records exist, such as the European Data Format (EDF), these records are rather unstructured in nature. "In order to design meaningful analytics, it is mandatory that big data input sources are transformed into a suitable, structured format" [7].

2 Stages in the treatment of EEG records

The processing and analysis of EEG data go through different stages or states. Initially, after capture, the untreated EEG records (called RAW) can contain different types of disturbances (artifacts). These should be identified and removed from the record before analysis methods are applied. The cleaned data are then subjected to different methods, which create new records of information (in digital format). These new records are the input for the analysis stage and for the preparation of scientific and clinical reports. The stages of EEG record treatment are shown in Figure 1. The following sections briefly describe each of the stages of the processing and analysis of EEG records.

2.1 Data cleaning

The aim of the cleaning process is to remove artifacts [2]. The results of the cleaning process have important effects on both the duration and the size of EEG records.
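A simple way to see why cleaning changes both duration and size is amplitude-threshold rejection. Fixed-length segments in which any channel exceeds a threshold are discarded. This is only one of many cleaning techniques and is not necessarily one used in this work. The ±100 µV threshold and the 1 s segment length are assumed values for illustration.

```python
def reject_segments(channels, sample_rate_hz, segment_s=1.0, threshold_uv=100.0):
    """Amplitude-threshold artifact rejection over fixed-length segments.

    channels: list of equal-length lists of samples (in microvolts), one per electrode.
    A segment is dropped if any channel exceeds +/- threshold_uv inside it; a trailing
    partial segment is dropped too. Returns (cleaned_channels, kept_segment_indices).
    Note: kept segments are simply concatenated; the gaps are not marked.
    """
    seg_len = int(segment_s * sample_rate_hz)
    n_samples = len(channels[0])
    if any(len(ch) != n_samples for ch in channels):
        raise ValueError("all channels of a RAW record must have the same length")
    kept = [
        k for k in range(n_samples // seg_len)
        if all(abs(v) <= threshold_uv
               for ch in channels for v in ch[k * seg_len:(k + 1) * seg_len])
    ]
    cleaned = [[v for k in kept for v in ch[k * seg_len:(k + 1) * seg_len]]
               for ch in channels]
    return cleaned, kept


fs = 4  # hypothetical, very low sample rate to keep the example short
ch1 = [10.0] * 4 + [10.0, 250.0, 10.0, 10.0] + [5.0] * 4  # spike in second 2
ch2 = [1.0] * 12
cleaned, kept = reject_segments([ch1, ch2], fs)
print(kept, len(cleaned[0]) / fs)  # [0, 2] 2.0
```

The 3 s record shrinks to 2 s and to two thirds of its size. This is why the parameters of each cleaning step must be recorded alongside the result.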
Figure 1: Stages in the treatment of EEG records

During this process, the sequences captured in each channel (RAW) are converted into new digital streams. The accuracy and relevance of the resulting signal therefore depend on the parameters of the methods applied in the purification process. There are different alternative processes for cleaning RAW records, and each EEG record may be subjected to a sequence of cleaning methods. Some of the results will be discarded during treatment and others will be stored for later analysis. To interpret the cleaned records properly, the sequence of methods applied must be recorded, together with the specific parameter values with which each method was applied. For the EEG records considered in this work, the following methods and purification processes are used:
• Importing data: the Emotiv EPOC instrument records the sensor signals in 14 usable channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4) in European Data Format (EDF) files. The import is therefore done with the BIOSIG tool built into the EEGLAB software [4]. Generally all channels are imported, but it is possible to import only a subset.
• Epoch selection: selecting the periods of the records that contain events of interest to the investigation, in which the subjects under study are exposed to some kind of stimulus. At this stage, other pre-processing actions can also be performed, such as changing the sampling rate, filtering the data or re-referencing it.

2.2 Processing and data exploration

Data processing comprises all actions that aim to examine the information contained in the records or to derive new information from existing data. Different methods and techniques can be used at this stage [4].
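The epoch-selection step described in Section 2.1 amounts to cutting fixed time windows around stimulus events. The plain-Python sketch below illustrates the idea only; it is not the EEGLAB implementation. The window of -0.2 s to +0.8 s around each event and the example data are assumptions.

```python
def extract_epochs(channels, sample_rate_hz, event_samples, tmin_s=-0.2, tmax_s=0.8):
    """Cut fixed windows [event + tmin, event + tmax) around stimulus events.

    channels: list of equal-length sample lists (one per electrode).
    event_samples: sample indices at which stimuli were presented.
    Returns a list of epochs; each epoch is a list of per-channel windows.
    Events whose window would fall outside the record are skipped.
    """
    start_off = round(tmin_s * sample_rate_hz)
    stop_off = round(tmax_s * sample_rate_hz)
    n_samples = len(channels[0])
    epochs = []
    for ev in event_samples:
        start, stop = ev + start_off, ev + stop_off
        if start < 0 or stop > n_samples:
            continue
        epochs.append([ch[start:stop] for ch in channels])
    return epochs


# Hypothetical 5 s record at 10 Hz with stimuli at samples 1, 10 and 45.
chs = [list(range(50)), [0.0] * 50]
ep = extract_epochs(chs, 10, [1, 10, 45])
print(len(ep), ep[0][0])  # 1 [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
```

Only the event at sample 10 has a complete window. Each kept epoch becomes a new derived record whose window parameters should be logged.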
The specific order in which these methods are used, the number of times each is applied and its parameters depend on the objectives of the research and on the findings the researchers make along the way. According to their purpose and their effect on the data, processing methods can be classified into:
• Exploratory methods: rather than processing methods, these are techniques for exploring the characteristics and peculiarities of each EEG record. They generally rely on graphical tools that display one or more channels, giving a better impression and understanding of the records. Frequently mentioned examples include loading and viewing channel locations, plotting ERP images, and plotting channel spectra and maps [4].
• Processing and data-generation methods: methods that act on the data to modify it or to generate new data. This group includes, for example, baseline removal (to avoid bias caused by low-frequency artifacts), data decomposition using ICA [2], work with ICA components, and time/frequency decomposition [4].

3 Context of the capture, storage, processing and analysis of EEG data

Scientific and clinical studies that require the recording, treatment, storage and analysis of EEG data demand the coordination of investigative processes and of different types of elements. These elements include clinical patients, volunteers, premises, equipment, technical staff, clinicians and researchers (see Figure 2). The most relevant part of these activities is drawing conclusions (technical and scientific findings and clinical diagnoses), which must be properly documented.
The involvement of so many different factors generates abundant records of information of different kinds. It also requires an organization and storage scheme that captures the interactions between the different components, in order to avoid the confusion and loss of touch with reality that abundant data can produce. This paper addresses the definition of a data model (see Figure 4) that supports the development of an information system. The system is based on a storage and organization strategy that considers the relevant interactions in the context of studies involving EEG data.

Figure 2: The context in which EEG data exist and are generated

The following sections describe the conceptual data model developed to meet the requirements described above.

3.1 Scientific/clinical studies, research and facilities

In any context of treatment and analysis of EEG data, there will be planned investigative processes (projects). These are conducted by researchers or specialized clinical staff and assigned to a set of physical units (clinics or laboratories) where all the experimental activities are conducted. These facilities also house the instruments and the computers with the software needed to treat the captured EEG records.

3.2 Research experiences and records

Projects require various clinical or experimental activities that are subject to strict protocols. These activities must therefore be designed, conducted and supervised by qualified researchers and clinicians. They generate several EEG records (RAW) in each session to which patients or experimental subjects are subjected. The result is a large number of EEG records, each of which must be properly linked to the experience in which it originated, the instruments used, the laboratory personnel and the facilities where the activity took place.
3.3 Dimensioning and predicting storage needs

Storing, managing and processing EEG records obtained from research and clinical studies, and documenting the results of their analysis, requires a suitable hardware platform. In addition to processing power, this platform must ensure responses within appropriate time frames and adequate storage management. Investigations that store EEG records and their derivatives require computers able to process and manage files whose average size may be hundreds of megabytes. The type of study that originates the data considerably affects the number, duration and size of the records. In the medical field, the number, duration and size of EEG records are directly related to the pathology under study and its severity. Experts know beforehand the approximate size and duration of the EEG records needed to diagnose the presence or absence of certain pathologies with a significant degree of reliability. However, in the new applications of EEG technology, which have opened up areas very different from classical medicine, storage requirements and processing needs vary widely. Any useful prediction requires identifying significant variables that can serve as the basis for accurate estimates.

Let $S$ be the set of study areas related to EEG data and $I_S$ the set of investigations (projects) linked to these areas. Then let:
• $P(I)$: the set of all persons involved in an investigation $I$, $I \in I_S$.
• $R(x, I)$: the set of EEG records captured from a person $x$ in an investigation $I$, $I \in I_S$ and $x \in P(I)$.
• $RD(x, I, r)$: the set of records derived from the treatment of an EEG record $r$ obtained from a person $x$ in investigation $I$, $I \in I_S$, $x \in P(I)$ and $r \in R(x, I)$.
Let $I(r)$ denote the set of all records included in an investigation $I$. The storage space required by the investigations in all study areas of $S$ is then given by:

$$Z(S) = \sum_{I \in I_S} \sum_{x \in P(I)} \sum_{r \in R(x,I)} \Big( Z(r) + \sum_{rd \in RD(x,I,r)} Z(rd) \Big) \tag{1}$$

where $Z(o)$ denotes the size of an object $o$ (a record), measured in MB.

To study this problem, a set of indicators has been chosen. Their behavior is expected to characterize the different types of studies and to allow the storage needs of the treatment and analysis of neurophysiological information to be estimated. The indicators initially considered are as follows:

Indicator   Description
MB          Storage (MB) required by the RAW records and their derivatives in each investigation.
NS          Number of subjects involved in each investigation.
NR          Number of RAW records captured in each investigation.
ND          Number of (digital) records derived from RAW records in each investigation.
MS          Total duration, in minutes, of the records captured in each investigation.
NI          Number of reports generated in each investigation.

Table 1: Indicators relating to a representative set of EEG investigations.

3.4 Organization of processing methods and reports

One of the ultimate goals of research is to draw conclusions and produce scientific/technical reports. This requires the systematic application of various processing methods and the analysis of their results, which give researchers insight into the nature of the phenomena under study. The effectiveness and relevance of each method are directly related to its parameters and to the order in which the methods are applied. In many cases, a method must be applied several times to find a suitable parameterization for the problem. Each of these attempts must be recorded in a log for later review.
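Returning to Section 3.3, equation (1) and the indicators of Table 1 can be computed directly from a nested description of investigations, subjects, RAW records and derived records. The sketch below assumes a hypothetical in-memory structure: each subject maps to its list of RAW records, and each record carries the sizes of its derived records. Like equation (1), it sums per investigation, so a record reused in several investigations (Section 4.2) is counted once in each.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    size_mb: float                                         # Z(r)
    minutes: float                                         # duration of the RAW record
    derived_mb: list[float] = field(default_factory=list)  # Z(rd) for rd in RD(x, I, r)


@dataclass
class Investigation:
    # Maps each (pseudonymized) person x in P(I) to the RAW records R(x, I).
    subjects: dict[str, list[Record]]
    reports: int = 0


def z_investigation(inv: Investigation) -> float:
    """Inner part of equation (1) for one investigation I."""
    return sum(r.size_mb + sum(r.derived_mb)
               for recs in inv.subjects.values() for r in recs)


def z_total(investigations: list[Investigation]) -> float:
    """Equation (1): Z(S), summed over all investigations in I_S."""
    return sum(z_investigation(inv) for inv in investigations)


def indicators(inv: Investigation) -> dict:
    """Table 1 indicators for a single investigation."""
    records = [r for recs in inv.subjects.values() for r in recs]
    return {
        "MB": z_investigation(inv),
        "NS": len(inv.subjects),
        "NR": len(records),
        "ND": sum(len(r.derived_mb) for r in records),
        "MS": sum(r.minutes for r in records),
        "NI": inv.reports,
    }


# Hypothetical figures, for illustration only.
inv1 = Investigation({"s1": [Record(100.0, 10.0, [50.0, 25.0])],
                      "s2": [Record(80.0, 8.0, [40.0]), Record(120.0, 12.0)]},
                     reports=2)
inv2 = Investigation({"s3": [Record(60.0, 6.0, [30.0])]})
print(z_total([inv1, inv2]), indicators(inv1))
```

Collected over a representative set of past investigations, these indicators could serve as the significant variables from which storage needs are estimated.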
A processing method $m$ can be characterized as $m(n, \vec{V})$, where $n$ is the number of parameters involved in applying the method and $\vec{V}$ is a vector containing the formal description of the parameters. Processing methods are applied to the original EEG records (RAW) or to records produced by previous methods (e.g. artifact cleaning). Applying a processing method to a specific EEG record $r$ can be represented by the 5-tuple $(m, r, t, n, \vec{v})$, where the vector $\vec{v}$ contains the specific parameter values applied to record $r$ at time $t$. This application can give rise to a new data record $r'$. Thus, the full processing of an EEG record $r$ can be represented as the set

$$\{\, (m_i(n_i, \vec{V}_i),\ r_i,\ t_i,\ n_i,\ \vec{v}_i) \mid i = 1, \ldots, k \,\} \tag{2}$$

where each $r_i$ is either $r$ itself or a record produced by an earlier application. We denote by $M(r)$ the sequence of methods applied to record $r$ in the temporal sequence $t_1, \ldots, t_k$. Since a research process $I$ generates one or more records for each individual, we denote by $M_x(r)$ the set containing all the sequences $M(r)$ associated with the EEG records of subject $x$. Finally, analyzing all the processing sequences of an investigation $I$ gives rise to many technical documents or scientific reports, which should be properly catalogued and stored for present or future use. Scientific and clinical reports contain the conclusions of the analyses made in the investigation. They should therefore be linked to each of the records and data used to build them.

Figure 3: Interaction between system components

4 Data model for the organization, storage and processing of large EEG data sets

The model must consider the context in which EEG records from multiple investigations are recorded, processed and analyzed. The data resulting from the processing of each record constitute a complex data management problem, which also requires coordinating and integrating multiple sources of information and technologies. To preserve the full meaning of the information, the model must include descriptive information about the research, the technical reports and the conclusions drawn from the records, as well as the various associations between these entities.

The EEGLAB tool [3] [4] has addressed this problem from the standpoint of ease of processing and exploration of the data. However, it does not integrate into the information system the context in which the records are generated. Nor does it include information on the technical documents containing the conclusions drawn from their treatment.

This paper presents a conceptual model for the organization, storage and processing of EEG records, based on the Entity/Relationship model [5]. The model handles the metadata associated with the problem and considers the context in which records are generated. It also addresses the need to store the associations between records, treatments, scientific papers and technical reports, which contain the findings of the analyses performed. The proposal consists of adding information on the context in which the data are generated to the information concerning the organization, storage and processing of EEG records. This is achieved with a component based on relational database technology, which incorporates contextual information and semantic elements that give meaning to the data volumes (see Figure 4).
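As an illustration of such a relational component, the sketch below creates a simplified set of tables in SQLite, chosen only because it ships with Python. The tables follow the entities described in Sections 3 and 4. All table and column names are hypothetical, and this is not the authors' logical design (Figure 5). Method applications and derived records are omitted for brevity.

```python
import sqlite3

# Hypothetical table and column names; a simplified reading of the entities
# described in Sections 3 and 4, not the authors' logical design (Figure 5).
SCHEMA = """
CREATE TABLE study_area (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE laboratory (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project    (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                         area_id INTEGER NOT NULL REFERENCES study_area(id),
                         lab_id  INTEGER NOT NULL REFERENCES laboratory(id));
CREATE TABLE researcher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project_researcher (
    project_id    INTEGER REFERENCES project(id),
    researcher_id INTEGER REFERENCES researcher(id),
    PRIMARY KEY (project_id, researcher_id));
CREATE TABLE subject    (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL); -- pseudonym only
CREATE TABLE experience (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE instrument (id INTEGER PRIMARY KEY, model TEXT NOT NULL, channels INTEGER);
-- Many-to-many association between projects and (experience, instrument) pairs.
CREATE TABLE project_setup (
    project_id    INTEGER REFERENCES project(id),
    experience_id INTEGER REFERENCES experience(id),
    instrument_id INTEGER REFERENCES instrument(id),
    PRIMARY KEY (project_id, experience_id, instrument_id));
CREATE TABLE record (
    id            INTEGER PRIMARY KEY,
    subject_id    INTEGER NOT NULL REFERENCES subject(id),
    experience_id INTEGER NOT NULL REFERENCES experience(id),
    instrument_id INTEGER NOT NULL REFERENCES instrument(id),
    url           TEXT NOT NULL,  -- physical location of the RAW file
    size_mb       REAL,
    captured_at   TEXT);          -- ISO 8601 timestamp
CREATE TABLE report (id INTEGER PRIMARY KEY, title TEXT NOT NULL, url TEXT);
CREATE TABLE report_record (
    report_id INTEGER REFERENCES report(id),
    record_id INTEGER REFERENCES record(id),
    PRIMARY KEY (report_id, record_id));
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the tables, with foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
```

Storing only a URL for each file keeps the bulky EEG data in whatever storage system is available. The relational part holds the context and the links that give those files meaning.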
Other applications, such as web-based interfaces or other tools, can subsequently be implemented on this relational platform. They can use the metadata system to maintain and manage the links between the different elements of information required to manage EEG data.

4.1 Scientific/clinical studies, research and facilities

The EEG data on which this study focuses are captured in scientific or clinical studies during properly planned processes (projects). These projects are conducted by researchers or clinical professionals in facilities or laboratories that house the instruments required for data capture and experimentation. Projects are framed within these studies, and researchers or clinicians may in turn be linked to different projects. The diagram in Figure 4 depicts this dynamic. In particular, it clearly states that every project must be assigned to a field of clinical trials or scientific experience, each of which comprises a wide range of investigations.

4.2 Research, experimental situations, experimental subjects and EEG records

An investigation normally involves different experimental situations or activities, which produce new EEG records of experimental subjects using various instruments. Each EEG record is associated with only one subject and one specific instrument, but it can be used in other investigations. A similar situation applies to clinical records obtained from patients. This creates the set of EEG records previously denoted $I(r)$, which is represented in Figure 4 by the Record entity. In the research contexts addressed in this work, the records obtained from an individual experience are always linked to a single instrument. However, multiple records can be captured simultaneously with different instruments.
This situation is reflected in Figure 4 as an aggregation (high-level entity) of the Experience and Instrument entities. As Figure 4 also shows, the association between Research and the high-level entity Experience/Instrument is many-to-many. An investigation can involve more than one Experience/Instrument pair and, in turn, each pair may be associated with multiple investigations. This relationship is frequent in clinical studies, where the same experimental situation, for example a diagnostic procedure, is applied to multiple patients. Likewise, people involved in an investigation or subjected to a clinical procedure may eventually participate in other research or studies.

4.3 Methods, processing sequences and parameter settings

The systematic and successive application of various treatment methods to EEG records in RAW format, or to records derived from previous treatments, is an essential feature of EEG data analysis. Although several treatments do not give useful results, many are significant for the research and should therefore be recorded. The record should include their parameterization, the set of input data, and the order and moment in time at which they were applied. Consequently, the sequence in which processing methods are applied during the exploration and processing of each EEG record must be registered, together with the explorations of the results, all associated with a set of timestamps. This is the processing sequence previously denoted by expression (2). These timestamps introduce a natural chronological order which, besides providing a history of the treatment of the records, allows the analysis strategy to be examined and corrected if necessary. This is represented in Figure 4. The application of different parameterized methods generates new data records, which are added to the existing records.
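A timestamped log of applications $(m, r, t, n, \vec{v})$, as in expression (2), can be sketched in a few lines. The sketch also checks whether an identical application (same method, input record and parameter values) was already logged, which avoids reprocessing, as anticipated in the Conclusions. Class, method and record names are hypothetical, and parameters are assumed to be simple name/value pairs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Application:
    """One element (m, r, t, n, v) of expression (2)."""
    method: str          # m
    record_id: str       # r: the RAW record or a record derived from it
    timestamp: datetime  # t
    params: tuple        # v, as sorted (name, value) pairs
    output_id: str       # r': the record produced by this application

    @property
    def n(self) -> int:
        """Number of parameters used in this application."""
        return len(self.params)


@dataclass
class ProcessingLog:
    entries: list = field(default_factory=list)

    def find(self, method, record_id, params):
        """Return an earlier identical application, if any, so it need not be recomputed."""
        key = tuple(sorted(params.items()))
        for e in self.entries:
            if (e.method, e.record_id, e.params) == (method, record_id, key):
                return e
        return None

    def apply(self, method, record_id, params, output_id, when=None):
        """Log a new application unless an identical one already exists."""
        previous = self.find(method, record_id, params)
        if previous is not None:
            return previous
        t = when if when is not None else datetime.now(timezone.utc)
        entry = Application(method, record_id, t, tuple(sorted(params.items())), output_id)
        self.entries.append(entry)
        return entry

    def sequence(self, record_id):
        """M(r): applications on r and on records derived from it, in time order."""
        ids, result = {record_id}, []
        for e in sorted(self.entries, key=lambda e: e.timestamp):
            if e.record_id in ids:
                result.append(e)
                ids.add(e.output_id)
        return result


t0 = datetime(2016, 5, 3, 10, 0, tzinfo=timezone.utc)
log = ProcessingLog()
a = log.apply("bandpass", "raw1", {"low_hz": 1.0, "high_hz": 40.0}, "d1", when=t0)
b = log.apply("ica", "d1", {"n_components": 14}, "d2", when=t0.replace(minute=5))
c = log.apply("bandpass", "raw1", {"high_hz": 40.0, "low_hz": 1.0}, "d9",
              when=t0.replace(minute=9))  # same as `a`: returned, not re-run
```

The method labels `bandpass` and `ica` are placeholders. The point is that ordering by $t$ reconstructs $M(r)$, and the parameter key makes repeated attempts detectable.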
The generation of new records from existing ones is captured by a reflexive relationship called "originates", between the high-level entity called "applies" and the Record entity.

4.4 Management of technical reports

In general, an investigation involves different experimental situations applied to different people, which creates the set of EEG records previously denoted $I(r)$. Processing EEG records, whether through exploratory methods or data processing methods, yields derived information. This information is the basis of the conclusions contained in the technical and scientific reports. These documents can draw on many records from a specific research process or from multiple investigations. The EEG records must therefore be linked to all the scientific/technical reports involved.

Figure 4: Conceptual model for the context of EEG data

The biggest challenge in the treatment of biometric information is not only the storage space needed for RAW records and their derivatives. Reports and other results containing the summary and details of the conclusions must also be stored.

4.5 Overview of the model for the management, storage and processing of EEG records

All the requirements described in the preceding sections are integrated into the conceptual data model presented in Figure 4. To simplify the figure, attributes have not been included. The data model is intended for implementing a database (with document-management features) that incorporates contextual information and references to the physical location (URL) where the data are stored.
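The reflexive "originates" relationship makes it possible to trace any derived record back to its RAW source. In SQL, this can be done with a recursive common table expression (`WITH RECURSIVE`, supported by SQLite). The sketch below assumes hypothetical tables: `application` stands for the "applies" entity, and its `output_record` column implements "originates". It is self-contained and independent of any other schema.

```python
import sqlite3

# Hypothetical minimal tables: 'application' stands for the "applies" entity
# and its output_record column for the reflexive "originates" relationship.
SCHEMA = """
CREATE TABLE record (id TEXT PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE application (
    id            INTEGER PRIMARY KEY,
    method        TEXT NOT NULL,
    params        TEXT NOT NULL,                 -- parameter values, e.g. as JSON
    applied_at    TEXT NOT NULL,                 -- ISO 8601 timestamp
    input_record  TEXT NOT NULL REFERENCES record(id),
    output_record TEXT REFERENCES record(id)     -- NULL for exploratory methods
);
"""

LINEAGE = """
WITH RECURSIVE ancestry(record_id, depth) AS (
    SELECT ?, 0
    UNION ALL
    SELECT a.input_record, ancestry.depth + 1
    FROM application AS a JOIN ancestry ON a.output_record = ancestry.record_id
)
SELECT record_id FROM ancestry ORDER BY depth;
"""


def lineage(conn: sqlite3.Connection, record_id: str) -> list:
    """Chain from a record back to its RAW source (the record itself comes first)."""
    return [row[0] for row in conn.execute(LINEAGE, (record_id,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO record VALUES (?, ?)",
                 [("raw1", "file:///eeg/raw1.edf"),
                  ("d1", "file:///eeg/d1.set"),
                  ("d2", "file:///eeg/d2.set")])
conn.executemany(
    "INSERT INTO application (method, params, applied_at, input_record, output_record) "
    "VALUES (?, ?, ?, ?, ?)",
    [("bandpass", '{"low_hz": 1, "high_hz": 40}', "2016-05-03T10:00:00Z", "raw1", "d1"),
     ("ica", '{"n_components": 14}', "2016-05-03T10:05:00Z", "d1", "d2")])
print(lineage(conn, "d2"))  # ['d2', 'd1', 'raw1']
```

Joining the same tables with a report-to-record link would show, for any conclusion, every RAW record and processing step it depends on.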
Specifically, the purposes of the database are to: (1) integrate records and metadata to facilitate the analysis of EEG records and derived data; (2) establish a catalogue of EEG records, of scientific and technical documents, and of the investigations; (3) enable a system capable of recording the activities of research centres and of evolving as necessary; and (4) support the development of data analysis and processing tools within the context of the system, automatically integrating the partial and final results of interest to research centres. To facilitate the implementation of the model on any relational platform, and hence the portability of the applications that use it, Figure 5 shows the standard logical design (DLS) of the conceptual model.

Figure 5: DLS for the conceptual model of the context of EEG data

Conclusions

The growing need to store large amounts of information relating to various areas of human and scientific activity has prompted different approaches to information storage, now collectively called Big Data. Although these proposals address many of the problems associated with the storage, access, and distribution of information on computer media, they lack adequate tools to manage the semantic elements of the stored information, which are specific to each type of problem. In the case of EEG records and technical and scientific reports, mechanisms are required for the organization, storage and processing of data that allow the integration of the various elements of information contained in the records, and mainly of those generated by their treatment.
Current developments in the area of storage [13] [14] provide an alternative for managing large volumes of data distributed across different storage media; however, they are limited by the heterogeneity of the data types involved. The proposals made in this document address these semantic constraints in the context of specific data by integrating other technologies, such as database as a service (DaaS) [15] [7], which also allow metadata to be manipulated and different types of information to be integrated, enabling the development of applications and interfaces that automate recording and metadata generation. In addition, database technology provides inherited mechanisms to ensure the confidentiality of the information concerning the experimental subjects, the analyses and conclusions drawn from the records, the research, and the researchers. In this scenario, the proposed data model can leverage Big Data solutions while incorporating semantic elements that allow researchers and technical staff to keep control over the data and the new knowledge derived from them. The data model becomes a significant interface between the information processing methods, facilitating data analysis and the drawing of conclusions. Moreover, the increased availability of metadata, together with timestamps recorded at the different moments of processing, analysis, and drawing of conclusions, promotes a more efficient use of computing resources by avoiding unnecessary reprocessing of data whose results are already registered at the system level. Finally, the use of conventional database technologies, available as free or proprietary software, contributes to greater integration of information, and hence to its value, in the new scenario underlying current developments in human-computer interfaces in science and medicine.

Acknowledgment

This work was developed under the project "Definición, desarrollo e implementación de nuevos indicadores y procedimientos para la evaluación y seguimiento del rendimiento académico estudiantil y el desempeño docente, en base información curricular, variables psico-neuro-cognitivas y herramientas basadas en matemática y lógica difusa" (Definition, development and implementation of new indicators and procedures for the evaluation and monitoring of student academic performance and teaching performance, based on curricular information, psycho-neuro-cognitive variables, and tools based on mathematics and fuzzy logic), which receives funding from the Department of Scientific and Technological Research (DICYT) of the University of Santiago of Chile (USACH), Chile.

Bibliography

[1] European Data Format (EDF). [Online]. Available at: http://www.edfplus.info/ [Accessed: 17-Nov-2015].
[2] T.-P. Jung et al. (1998); Removing electroencephalographic artifacts: comparison between ICA and PCA, Neural Networks for Signal Processing VIII, 1998. Proceedings of the 1998 IEEE Signal Processing Society Workshop, 63-72, DOI: 10.1109/NNSP.1998.710633.
[3] A. Delorme, S. Makeig (2004); EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis, Journal of Neuroscience Methods, 134(1): 9-21.
[4] EEGLAB Tutorial: Table of Contents. [Online]. Available at: http://sccn.ucsd.edu/eeglab/eeglabtut.html
[5] P. Chen (1976); The Entity-Relationship Model - Toward a Unified View of Data, ACM Transactions on Database Systems, 1(1): 9-36.
[6] E. F. Codd (1970); A relational model of data for large shared data banks, Communications of the ACM, 13(6): 377-387.
[7] A. Cuzzocrea, I.-Y. Song, K. C. Davis (2011); Analytics over large-scale multidimensional data: the big data revolution, Proceedings of the ACM 14th International Workshop on Data Warehousing and OLAP, 101-104.
[8] M. Cox, D. Ellsworth (1997); Managing big data for scientific visualization, ACM SIGGRAPH, 97: 21-38.
[9] M. Chen, S. Mao, Y. Liu (2014); Big Data: A Survey, Mobile Networks and Applications, 19(2): 171-209.
[10] S. Sagiroglu, D. Sinanc (2013); Big data: A review, in Collaboration Technologies and Systems (CTS), 2013 International Conference on, 42-47.
[11] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, A. H. Byers (2011); Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute.
[12] S. Kaisler, F. Armour, J. A. Espinosa, W. Money (2013); Big Data: Issues and Challenges Moving Forward, 2013 46th Hawaii International Conference on System Sciences, 995-1004.
[13] P. Zikopoulos, D. Deroos, T. Deutsch, G. Lapis (2012); Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill.
[14] HPCC Systems. [Online]. Available at: https://hpccsystems.com/
[15] H. Hacigumus, B. Iyer, S. Mehrotra (2002); Providing database as a service, in Data Engineering, 2002. Proceedings. 18th International Conference on, 29-38.
[16] B. B. Biswal et al. (2010); Toward discovery science of human brain function, Proceedings of the National Academy of Sciences, 107(10): 4734-4739.
[17] A. P. Alivisatos et al. (2012); The Brain Activity Map Project and the Challenge of Functional Connectomics, Neuron, 74(6): 970-974, DOI: 10.1016/j.neuron.2012.06.006.
[18] M. M. Monti, L. M. Parsons, D. N. Osherson (2009); The boundaries of language and thought in deductive inference, Proceedings of the National Academy of Sciences, 106(30): 12554-12559.
[19] E. Heit (2015); Brain imaging, forward inference, and theories of reasoning, Frontiers in Human Neuroscience, 8: Article 1056, January 2015.
[20] V. Goel, B. Gold, S. Kapur, S. Houle (1997); The seats of reason? An imaging study of deductive and inductive reasoning, NeuroReport, 8: 1305-1310.
[21] R. A. Thuraisingham, G. A. Gottwald (2006); On multiscale entropy analysis for physiological data, Physica A: Statistical Mechanics and its Applications, 366: 323-332.
[22] L. F. Haas (2003); Hans Berger (1873-1941), Richard Caton (1842-1926), and electroencephalography, Journal of Neurology, Neurosurgery and Psychiatry, 74(1): 9, DOI: 10.1136/jnnp.74.1.9.
[23] H. Kececi, Y. Degirmenci (2008); Quantitative EEG and cognitive evoked potentials in anemia, Clinical Neurophysiology, 38(2): 137-143, DOI: 10.1016/j.neucli.2008.01.004.
[24] R. N. Khushaba et al. (2013); Consumer neuroscience: Assessing the brain response to marketing stimuli using electroencephalogram (EEG) and eye tracking, Expert Systems with Applications, 40: 3803-3812.
[25] F. E. Palominos, H. Diaz, F. M. Cordova, L. Cañete, C. A. Duran (2016); Model for the organization, storage and processing of large data banks of physiological variables, Computers Communications and Control (ICCCC), 2016 6th International Conference on, 173-179, e-ISBN 978-1-5090-1735-5, IEEE Xplore, DOI: 10.1109/ICCCC.2016.7496757.
[26] H. Díaz, L. Cañete, F. Palominos, C. Costa, F. Córdova (2012); Neurotechnologies for Education Improvement: Self-Knowledge After Opening the Black-box, Conference Proceedings of the International Symposium Research and Education in Innovation Era (ISREIE), 4th Edition, Arad, Journal Plus Education, 8(2): 44-52.
[27] P. Howard-Jones et al. (2013); Neuroscience and Education: Issues and Opportunities, TLRP Teaching and Learning Research Programme; ESRC Economic and Social Research Council, University of London, ISBN: 0-85473-741-3.
[28] M. Kozhevnikov (2007); Cognitive Styles in the Context of Modern Psychology: Toward an Integrated Framework of Cognitive Style, Psychological Bulletin, 133(3): 464-481.
[29] M. D. Lieberman (2007); Social Cognitive Neuroscience: A Review of Core Processes, Annual Review of Psychology, 58: 259-289.
[30] R. Parasuraman, G. F. Wilson (2008); Putting the Brain to Work: Neuroergonomics Past, Present, and Future, Human Factors, 50(3): 468-474, DOI: 10.1518/001872008X288349.