International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol. 15, No. 13, 2021
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
ODK-X: From A Classic Process to A Smart Data
Collection Process
https://doi.org/10.3991/ijim.v15i13.22945
Iman Tikito ()
Mohammed V University, Rabat, Morocco
iman.tikito@gmail.com
Nissrine Souissi
Mines-Rabat School, Rabat, Morocco
Abstract—Data collection is one of the first and main phases of the data life
cycle. It enables improvements to be made across all phases of the data lifecycle.
In this sense, we have proposed a data collection process qualified as Smart. For
our smart data collection process, we have adopted the principles of the smart
data approach allowing less data to be transmitted to the analysis and storage
processes, while maintaining better data quality. In addition, we also used Edge
computing since it provides services with faster response and better quality, com-
pared to cloud computing. To experiment this process on mobile data, we propose
to extend a mobile data collection software solution and adopt one of the key data
collection methods. In this paper, we tested our smart data collection process via
the ODK-X software suite and were able to identify the added value of our pro-
cess compared to the one used by default during collection.
Keywords—Smart Collect, Mobile Data, ODK-X, Data collection, Edge com-
puting
1 Introduction
To experience the smart collection process [1] on mobile data, we come to imple-
ment it by extending a software solution for mobile data collection and adopt a data
collection method among the five key methods of data collection which are presented
with their advantages and disadvantages in article [2]. These five methods are: Surveys,
Interviews, Focus Groups, Observations, and Textual or Content Analysis. Thus, our
choice was based on the use of a survey to document perceptions, attitudes, and
knowledge within a clear and predetermined sample of individuals.
The implementation aims to demonstrate Smart collection process added value [3]
compared to a classic data collection process [4], [5]. We are particularly interested
here in the collection of mobile data. To minimize cost and workload while being effi-
cient in providing better data quality, we adopted Electronic Data Collection (EDC) to
establish the survey as the best approach in our case according to several articles [6],
[7], [8].
28 http://www.i-jim.org
https://doi.org/10.3991/ijim.v15i13.22945
mailto:iman.tikito@gmail.com
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
Depending on the space and quantity of units covered by the survey, gathering and
entering the data collected requires appropriate logistics [9], [10], [11]. Now with the
use of new technologies like smartphones and tablets in data collection, researchers has
no longer need to use paper formats for data collection. Electronic data collection sig-
nificantly reduces the time between collection and decision-making by eliminating the
step of entering the data collected. Thus, the data can be transmitted automatically to
databases once collected. Aside from saving time and money, electronic collection of
survey data allows real-time monitoring of the collection process progress [12].
There are a variety of tools and digital platforms that facilitate the collection of data
from smartphones and tablets; some are more effective than others. The study presented
in CartOng [13] raises 26 tools dedicated to electronic data collection. Through the
comparison between the described tools, choosing ODK-X as a tool is clearly the best
choice for our needs [14], [15].
The Open Data Kit is a community of developers, staff within institutions and or-
ganizations developing Open Data Kit for collect, manage and use data in resource-
limited settings. ODK is deployed as a collection of GitHub repositories that anyone
can use under the Open-Source license. New or old versions are provided through the
GitHub portal. Transparent and open problem management is carried out by the Open
Data Kit alliance. ODK consists of 68 packages covering different features of ODK and
is constantly improving [16].
The survey carried out in this paper concerns an engineering school wishing to know
the home establishment (CPGE) of students who obtained good grades in the first year
of school while allowing an analysis by gender, by CPGE, etc. This survey concerns
only students enrolled in the school. Thus, it is requested to create a form containing
the necessary fields to collect the appropriate data.
This article is organized into 4 sections. Besides the introduction, the second section
begins with a presentation of ODK-X software suite, then conduct our case study ac-
cording to the classic collection process thus the proposed Smart collection process.
Then, the third section present the results of the two approaches before concluding the
article in forth section.
2 Research Method
The Open Data Kit (ODK), as mentioned in several articles like [13], [17] and [16]
is one of the best-known software suites in this field and many researchers continue to
improve it in order to better meet the various user expectations. It is easy to learn and
no extensive training is required to handle it. It was designed to be used by anyone with
or without programming skills, thanks to its operation sufficiently adapted to a large
population. It consists of a suite of mobile data collection tools and has desktop and
mobile applications for data collection and management as well as a server for synchro-
nizing the collected data.
iJIM ‒ Vol. 15, No. 13, 2021 29
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
The objectives of ODK as mentioned in [18] are:
• Make the tools modular and customizable so that they can be easily interoperable for
each deployment.
• Exploiting interfaces and open standards so that solutions are not “compartmental-
ised” in monolithic packages of difficult company to understand and maintain.
Following the article [19], the use of ODK is constantly increasing and this is linked
to the fact that it is free and open source allowing to appropriate the tool with good
programming knowledge, otherwise a simple use of its wide range is also sufficient for
some cases. However, the reason for its popularity is also linked to its online support
community which is active and constantly improving the various components.
On their official website [20], the community has released two suites ODK and
ODK-X that coexist and that one does not replace the other. Each suite contains tools
that work together to collect, use, and manage data, but the two suites require different
levels of technical skills. In general, ODK tools are easier to use, require less configu-
ration, and are widely adopted. However, for a complex study and with technical skills,
ODK-X tools may be better suited.
Our choice fell on ODK-X for the following reasons:
• Flexible tool suite that supports complex workflows through JavaScript customi-
zation
• Non-sequential navigation
• Bidirectional synchronization
• Data management on the device
One of the main goals of ODK-X tools is to reduce the complexity that organizations
encounter with the software engineering skills required when designing data manage-
ment applications. ODK-X allows developers and data managers to build data manage-
ment applications that consist of survey forms as well as JavaScript-based applications
as needed. Organizations generally use productivity software such as Excel to create
these applications. Thus, the skills required to create a data management application are
based on writing a form definition in XLSX then processed by the XLSForm tool, or
simple web programming using HTML and JavaScript for personalized presentations.
Advanced web programmers can easily implement fully customizable web pages to
collect, manage and visualize data on an Android device [20].
ODK-X [19] enables the creation and customization of domain independent mobile
applications that meet the needs of an organization within the constraints imposed by
Android. ODK-X's protocols and timing structures are designed to be adaptable under
extreme mobile network conditions, such as long periods of disconnection or low band-
width and high latency. ODK-X replicates data to mobile devices, allowing the Frame-
work to retain full functionality in disconnected environments.
The ODK-X suite offers various services to the user to keep their tool powerful and
flexible. The most relevant points based on their official website [20], which offers
fairly comprehensive documentation on the tool, are:
30 http://www.i-jim.org
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
• Synchronize bidirectional data: A two-way synchronization protocol allows us to
create data management applications with:
─ Monitoring surveys and data collection locations
─ Pre-filled forms for faster data collection
─ Data can be synchronized on all devices from the server
• Offline data collection: Allows users to collect data without an internet connection.
The form data can be synchronized with the server when the user has Internet access.
• Linked and embedded surveys: ODX-X tools allow to open and edit other surveys
with links to the original survey, and create a subform (nested) relationship between
surveys or links relationships between data.
• Data view on device: Analyze and visualize entire data sets directly on the device
via graphical, map and tabular views and filtered views.
• User access control: Control of privileges to view, modify and delete data for dif-
ferent users and groups.
• Customizable survey feed and appearance: Using basic web development
(HTML, JavaScript, and CSS) to specify the layout of almost any screen seen by
data collectors.
2.1 ODK-X: Classic collection process
The data collection phase is defined in the literature [21] as a means of acquiring raw
data from one or more specific environments. Thus, we will follow the collection pro-
cess proposed in [22] which we consider to be the classic process used by default and
is defined in six steps.
• Define the objectives of data collection: Identify the CPGEs establishments of
the students who obtained the best results in their 1st year of the engineering cy-
cle.
• Develop a list of questions of interest: For example, the first question being “Did
you spend your 1st year at this establishment?” aims to target our audience.
• Establish data categories: The fields to be completed and which are linked to
the identification of the student are: Last Name, First Name, Gender, CIN (Iden-
tity Card Number), CNE (Student Card Number) and Registration Number. We
consider that the student may not know his Registration Number which is linked
only to the establishment, or even his CNE which is rarely used, hence the prop-
osition of the CIN field as well. The Last Name and First Name fields also appear
in the form out of habit.
• The other fields related to this case are: City CPGE, Name CPGE, Year of
success in 1st year, Grade obtained in 1st year.
• Design and test the data collection form: The first section of the created form
involves verification of the user: If it is actually among the targets of our investi-
gation, whether he spent his first year at school. Thus, a first question is manda-
iJIM ‒ Vol. 15, No. 13, 2021 31
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
tory before completing the form: Did you spend your 1st year at the establish-
ment? Accordingly, we must create the adequate lines in form under the Survey
tab (see Table 1).
Table 1. Excel json code configuration file - display of the
first screen of the Default Process.
Clause begin screen end screen
Type note select_one
Values_list yesno
Name Skip1Year
Display.prompt.text
Did
you spend your 1st
year at the school?
Display.prompt.image img/school.png
Required TRUE
Display.required_message.text Please reply to:
The answer to this question is mandatory to continue the process. The choices "Yes"
or "No" are defined in the choice tab of the Default_Processus.xlsx form. An error mes-
sage is then displayed, stating that this step is essential thanks to the mention "TRUE"
in the "Required" column of the form under the Survey tab. On a negative response on
the first screen of the survey, the user is considered off-target. Thus, an end of process
screen will be displayed to the user.
Once this first step is verified and the user responds positively to the question, the
second step is to define the fields to fill in to meet our objective. In this case, the type
of each field represents the only check performed in the default process.
Regarding the form, no requirement has been defined, so no field will be required
when entering. The choice of fields is made according to the following three aspects:
• The student's identifier to subsequently verify the accuracy of the information filled
in during the analysis phase.
• The gender and year of success of the 1st year for an analysis by year and gender as
expressed at the beginning.
• The grade for the first year at the establishment, city and name CPGE.
The type of fields is consistent with the nature of the information expected, the CNE
and Registration Number are integer types. In order to minimize the number of errors
entered, we have chosen to use a drop-down list for the following fields: Gender, CPGE
Name, CPGE City and Year of passing the 1st year. While the Grade is a decimal field,
the other fields like Last name, First name and CIN are of type text.
Gender accepts three values namely Female, Male and Other mentioned under the
choices tab of the form. Changes to the choices sheet contain the response lists defined
for our drop-down lists. The headers used correspond to:
• Choice_list_name: The group name for all the answers in a choice set.
• Data_value: The data value to select.
32 http://www.i-jim.org
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
• Display.title.text: The text the user will see to select this value.
In the same way as the gender, we have filled in the Name CPGE and City CPGE
with the complete list presented in official website. Our target being the students still
present in the school, so the year of success in 1st year cannot be lower than 2015.
When the user proceeds to the next step, he will find an end screen.
• Collect and validate data: This part will be described in section 5 (Results). It is
about collecting data according to defined test cases.
• Analyze data: This part will be described in section 5 (Results). This is to discuss
the results of the data collected.
2.2 Smart collection process
In this section, we will follow the Smart data collection process, which is made up
of the seven sub-processes representing the steps [1].
Planning: The planning sub-process allows us to define the strategies to be fol-
lowed, the requirements and to know our client better. This is a set of activities to be
performed in order to define all the strategies relating to data collection.
• Identify customer requirements. The functional requirements raised are as fol-
lows:
─ Identify students who spent their 1st year at the institution
─ Know the Moroccan CPGEs of origin of the students who obtained the best grades
in the 1st year at the institute
─ To be able to filter by genre
─ To be able to filter by year of obtaining in order to make an analysis for each
promotion
• Define the protocol's strategy: Surveys [23] are a popular means of data collection
because they are inexpensive and can provide a wide perspective. In this thesis, we
have opted for the survey using a mobile device. Mobile data collection [24] is a
method of compiling qualitative and quantitative information using a mobile device.
This approach will allow us to increase the speed and accuracy of data collection,
the efficiency of service delivery and the productivity of program staff.
• Finally, to make our survey of better quality and for validation purposes, we
added a second method of data collection, which is “Textual or content analysis” for
the reliability of the results provided and this thanks to the administrative file held
by the school.
• Define a search strategy: Based on the CartOng article [13], a comparison of 26
tools used for mobile data collection allowed us to choose ODK-X as the tool that
best meets our need.
• Define an enrichment strategy: The enrichment strategy will combine two ap-
proaches, the first being the survey that will be done regularly at the start of each
iJIM ‒ Vol. 15, No. 13, 2021 33
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
academic year among target students. The second approach is to complete the miss-
ing information if necessary based on the administrative file containing the grades
and information necessary for each student, in order to have results reflecting reality
and not just sampling.
• Define a storage strategy: The objective of this survey being to subsequently ana-
lyze the success keys of students from CPGEs, we will then keep track of data from
the last 5 years only. Any data beyond 5 years will be destroyed to free up the space
dedicated to storage.
• Define an evaluation strategy: The administrative file collects all of a student's in-
formation during their course. Based on the high rate of accuracy of the information
present in this file, our assessment will be made by relying solely on the data pro-
vided by this file.
• Validate strategies: The confidence measure linked to the administrative file will
allow us to estimate the quality and validity of the data. The goal is to have all the
students who obtained the best 1st year grades at the institute for each academic year.
Protocol: The result of this planning allowed us to put in place the appropriate ac-
tions and strategies to create a form that best meets the need while being adapted to the
process. Once the key objectives have been set, the Protocol sub-process describes the
activities to be carried out to obtain the desired data.
• Define inclusion/ exclusion criteria
─ Languages to include/ exclude: We only include the same language used in the
administrative file.
─ Sources to include/ exclude: Data collected through the forms ODK-X, refer to
the first survey: 2nd year students who spent their 1st year at the institute, and 3rd
year students who spent their 1st year at the institute.
─ Beyond the 1st survey, only 2nd year students who have spent their 1st year at
the institute will be concerned by these surveys.
• Define integration criteria: We must unify values of the size of the field “Grade”
obtained in the first year for its format in storage can be written with two decimal
places.
• If the user fills in at least one of the “Last name” or “First name” fields, the value
of the field must be masked when storing data to respect anonymity.
• Choose methods to use: Depending on the search strategy defined, the method as-
sessed useful is the ODK-X software suite.
• Define procedures to be used: Following the defined protocol strategy, the proce-
dure to be used will be a survey based on electronic forms that will be completed by
the students of the institute who meet the pre-established criteria.
• Define customer satisfaction rate: To measure customer satisfaction, we will need
to have a rate of over 70% of the population present to judge that our data collection
is relevant.
Data management:
• Receive data
34 http://www.i-jim.org
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
─ Referential of procedures: In the first phase of the form we ask the user if he has
already spent his 1st year at the institute. However, a footnote mentions that the
answer to this question is only intended for Moroccan students, from a CPGE and
who are still studying at the institute as 2nd and 3rd year students only. The ob-
jective of the note is to not waste students’ time who have integrated the institute
through a faculty who hasn't followed their first year at this institute for example.
─ Development of data summary: When filling the form by a user, we notice the
items described below and it’s the disadvantage of forms in general: Incomplete
forms, Duplicate forms, Forms with incorrect data
─ Improve collection: To solve the problem of incomplete forms, we have put a
number of management rules. Thus, one of the CIN, CNE or Registration Number
fields must be filled in to successfully identify the student and make the necessary
checks. An error message is displayed preventing going to the next step if the
condition is not met. This view is possible thanks to the addition of the constrained
column in the form under the Survey tab (see Table 2).
Table 2. Constraint on user identification
type name display.prompt.text constraint
display.con-
straint_message.text
text CIN CIN
((data('Registration-
Number') != null) || (data
('CIN') != null) ||
(data('CNE') != null))
One of the following
fields must be speci-
fied: CNE, Registra-
tion Number or
integer CNE CNE
((data(' Registration-
Number ') != null) ||
(data('CIN') != null) ||
(data('CNE') != null))
One of the following
fields must be speci-
fied: CIN, Registration
Number or
integer RegistrationNumber RegistrationNumber
((data(' Registration-
Number ') != null) ||
(data('CIN') != null) ||
(data('CNE') != null))
One of the following
fields must be speci-
fied: CIN, CNE or
Similarly CPGEs the Name field is also mandatory and implicitly CPGEs City, since
it feeds the CPGEs name once a choice is selected from the list of available cities. This
correspondence is done thanks to the choice_filter column which allows to create this
link between the two columns. If the user does not enter the CPGE Name, an error
message is displayed preventing him from going to the next step of the form. The choice
to make this field mandatory is to know the list of CPGEs of students who obtained
better grades in their 1st year. The display of this view is possible thanks to the addition
of the constraint column in the form under Survey tab.
Finally, to meet the other requirements, we need to know the year of passing the 1st
year as well as the grade obtained. Likewise, we've added an error message that displays
if the condition is not met by adding the correct condition under the Survey tab of the
form.
In order to improve the consistency of the data sent without reducing the perfor-
mance of the form, we have opted to add a verification step after the user has filled in
all the mandatory fields.
iJIM ‒ Vol. 15, No. 13, 2021 35
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
Thus, only one call will be made to the server for data verification in order to mini-
mize the number of calls to the server and the response time. The data is checked first
by CIN if filled in otherwise by CNE and in the event that none of these fields is filled
in we check the data by Registration Number.
The verification is done on all non-empty fields even if the field is not mandatory,
based on the administrative file present in the server. The call to this verification is
made thanks to the column "calculation" in the form under the Survey tab with the
value: VerifData (data ('Name'), data ('First name'), data ('CIN '), data (' CNE '), data ('
Matricule '), data (' Gender1 '), data (' City_CPGE '), data (' Name_CPGE '), data (' year
'), data (' Grade ')). The verif variable is used to define the order of verification when a
user enters several identifiers. The identifiers are the CIN, the CNE and the Registration
Number.
We then check the output value if it is false in this case the data is not compliant and
the form cannot be finalized with an error message displayed. In this case, we are using
the constrained column in the form under the Survey tab to display the error message.
The choice not to mention the error explicitly is due to the fear of intentionally divulg-
ing information to an individual who is groping for the exact data with our help speci-
fying the origin of the error.
• Manage sources: The list of CPGE names and the list of CPGE cities cannot be
deduced from the administrative file because they can change from one year to an-
other depending on the students admitted, for this we have put all the data in a csv
file conforms to the official website. This file is read using queries built into the
Smart_Processus.xlsx file under the queries tab. The call to this data is made in the
survey tab of the same file by adding the value of the query_name field in the "val-
ues_list" column of the desired field.
• Smart Data L0: The result up to this point represents the valid data conforming to
that present in the administrative file.
• Create data
─ Criteria creation: If the data is not sufficient to achieve customer satisfaction,
that is at least 70% of the students answered the form, the missing data can be
created internally until the desired threshold is reached.
─ The students concerned by this creation are student spent their 1st year at the in-
stitute and currently at their 2nd or 3rd year.
─ Data validation: The data will follow the same validation procedure as that pre-
sented when receiving the form from a student.
─ Smart Data L0: The result up to this point represents the valid data conforming
to that present in the administrative file.
• Integrate data
─ Sensitivity analysis: Knowing well that our choice to keep the same graphic form
as the one present in the classic approach, allows us to understand the influence
of the missing data on the result. Reason why we send implicitly the value of the
36 http://www.i-jim.org
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
Sector field from the administrative file, while sending the form to the server to
complete the information that we judge important to analyse.
─ Statistical analysis: In this case, we can analyze the number of students who
filled in the CIN field, the CNE field and finally the Registration Number field,
with the objective of knowing which of these fields can be defined for identifica-
tion.
─ We can also know the number of participants who did not spend their first year at
the establishment and who still answered the questionnaire.
─ Analysis of data to be excluded: In the context of anonymity, we can hide the
first and last names of students, especially as this information does not add any
value to us in our process. Thus, this check is added in the formulaFunctions.js
file in JavaScript and called by the value securite_nom (data ('Name')) or secu-
rite_nom (data ('First name')) mentioned in the "calculation" column of the form
under the Survey tab.
─ Analysis of conflicting results: For this scenario, no data will be contradictory
compared to another, since the value of the data is compared to the administrative
file before completing the questionnaire.
─ Data comparison: In our case, this step is not necessary. All data in the same
column has the same type.
─ Integration: We want the format of the Grade to be structured as follows 1X.XX
with X € [0, 9]. The toPrecision method allows to format a number to a defined
length, in our case we set it to 4. This format change will not impact data verifi-
cation, but it will be implemented when sending the form to the server.
─ In addition, we add a condition so that the “Grade” field is between 10 and 20
(data ('Grade')> = 10) && (data ('Grade') <20)). Note that the minimum passing
grade may vary from year to year for another reason why we have set a score of
10 as the minimum threshold. If this condition is not met an error message is
displayed.
─ Smart data L1: The results obtained are valid data in accordance with those pre-
sent in the administrative file while respecting the anonymity of the users by hid-
ing their Name and First name if filled in by the student. All the values in the
"Grade" field respect the same format. Finally, some necessary data is added when
sending the form without an additional step by the user.
• Assess
─ Evaluation of quality criteria: The total coverage of customer needs is due to
the good quality of the criteria established in the planning phase.
─ Evaluation of data quality: The quality of the data is respected and conforms to
the requirements declared by the customer thanks to the conformity of the rec-
orded values.
─ Evaluation of integration criteria: All the data are in the same format, so we
can deduce that the integration criteria are well respected.
iJIM ‒ Vol. 15, No. 13, 2021 37
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
• Synthesize: Since this approach involves communication with an external server for
data verification, the ODK-X architecture has undergone a slight improvement by
using edge computing [25] in architecture.
Once the form is validated and saved to App Designer, synchronization with ODK
Aggregate is the second step. We can then download the data from the server using the
correct authentication. On a mobile device, data cannot be viewed or created unless the
phone is synchronized with the server, thus allowing data to be downloaded.
We can thus start a new form, or follow the modification of a form that we have
already initiated from ODK-X Survey or ODK-X Tables. The modification we have
added to the process is the call to the external server which is only done at the last step
of the form and this in order to lighten the process and only make the call when it is
necessary. Knowing that a blocking step following three attempts by the same phone
on the same form is envisaged to also limit the number of external calls.
Once the user completes the form, he can keep the changes only on his phone by
updating OI File Manager, or submit it to the server. The administrator will be able to
view the data sent to the server from his computer once synchronized.
Presentation of the results
• Summarize: Present the recorded data to the server.
• Deduce the results from the data:
o Identify gaps: In this case, no deviation is identified by following this process.
• Interpret the data: The interpretation of the data makes it possible to identify the
CPGEs from which the students who obtain the best grade in the 1st year of the
engineering cycle come.
• Define persistent problems: A student can complete the form as many times as
they want if all the information is correct.
• A user can try to fix errors on a form many times and this can be the cause of
malware.
• Identify improvements: When submitting a form we can verify that the customer
ID does not already exist in our database. Otherwise, we may display an error
message stating that the user has already taken the survey.
• For the second issue raised we plan to set a countdown that will block the user
from retrying if they exceed 3 attempts, until they are unblocked by an admin.
Enrichment
• Analyze the need of enrichment: The gender field is not mandatory, but we can
do a gender analysis to have statistics by gender as well. Thus, we can add this
value at the moment of sending a valid form based on the value present in the
administrative file when it’s not entered.
• Analyze the need of future enrichment: We need enrichment data every year,
once the 1st year students pass.
Visualization
This part will be described in the next section. It's about visualizing the results and
discussing them.
38 http://www.i-jim.org
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
3 Results and Discussions
The test cases allow us to put several scenarios that can be encountered when filling
out the form. Then, we compare the results obtained for each sub-process based on
these test cases (see Fig. 1).
Fig. 1. Comparison measurements.
For example the first case that has the merit of being present is when the first ques-
tion of the form is answered in negative way. We then begin the different cases with a
positive answer to the first question on the form.
Unlike the classical approach results that returned all records, the result of the same
test cases using the smart approach returned fewer result. Thus, the parameters used to
measure the quality of data between the classic collection process and the Smart col-
lection process:
• Number of data with empty mandatory field
• Number of duplicate data
• Number of invalid data (incorrect value)
• Data rejected
Using the classic collection process, we notice that all records are accepted even
though in our case only 5 test cases are correct. In the other case, the Smart collection
process will reject all the test cases with missing mandatory fields or invalid fields, but
will only accept one case of duplicate data, which was noticed at the end of the process
and noted as an improvement.
Thus, the result of our test using the Smart approach, allowed us to record only 6
results. Beside, to fully unfold our process, we need to launch a second test campaign
given the low number of data collected. However, we have noticed that some cases of
iJIM ‒ Vol. 15, No. 13, 2021 39
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
duplication can be generated due to a different use of the student ID. For this, we pro-
pose as an area of improvement for the next collection to add this verification.
4 Conclusion
In this paper, we have tested Smart data collection process via the ODK-X software
suite and we have managed to identify the added value of this process compared to the
one used by default during collection. To this end, we carried out the same test case
according to the two processes. The first, which is by default, allowed all cases to be
collected and sent to the server, although some data are worthless.
Upstream, our experiment raised the blocking points and the disadvantage of the
workflow used by default to achieve a satisfactory result while meeting the customer's
end goal. The user is guided so as not to provide incomplete or erroneous information
and this in order to assemble clean and structured data to optimize analysis time and
storage space. With architecture ODK-X we raised the added validation step does not
change the overall system instead, it allows to refine the collection without impacting
the performance of the tool.
One of the strengths of our process is the continuous improvement of the data and
this thanks to the dedicated steps to check if the data meets the pre-established plan or
not and if not, the action to be taken to cure it. Knowing that an evaluation of the meth-
ods used is also an asset in remedying a mistake in planning. The process is a way to
organize the steps to follow while putting in place the means for continuous improve-
ment and steps to a reflection on the choices to be made.
5 References
[1] I. Tikito and N. Souissi, "A Smart Process of Data Collect," Proceeding: Intelligent Systems
and Computer Vision (ISCV), pp. 1-7, 2020b.
[2] E. Paradis, B. O'Brien, L. Nimmon, G. Bandiera and M. A. T. Martimianakis, "Design:
selection of data collection methods," Journal of graduate medical education, vol. 8, no. 2,
pp. 263-264, 2016.https://doi.org/10.4300/jgme-d-16-00098.1
[3] I. Tikito and N. Souissi, "Towards a systematic collect data process," International Journal
of Big Data Intelligence, vol. 7, no. 2, pp. 72-84, 2020a. https://doi.org/10.1504/
ijbdi.2020.107374
[4] . Tikito and N. Souissi, "Data Collect Requirements Model," Proceedings of the 2nd inter-
national Conference on Big Data, Cloud and Applications. ACM, p. 4,
2017.https://doi.org/10.1145/3090354.3090358
[5] I. Tikito and N. Souissi, "Methodology of Data Systematic Review: a step-by-step guide,"
Proceedings of the 3rd international Conference on Big Data, Cloud and Applications,
2018.
[6] A. Tella, "Electronic and paper-based data collection methods in library and information
science research: A comparative analyses," New Library World, vol. 116, no. 9/10, pp. 588-
609, 2015. https://doi.org/10.1108/nlw-12-2014-0138
40 http://www.i-jim.org
https://doi.org/10.4300/jgme-d-16-00098.1
https://doi.org/10.1504/ijbdi.2020.107374
https://doi.org/10.1504/ijbdi.2020.107374
https://doi.org/10.1145/3090354.3090358
https://doi.org/10.1108/nlw-12-2014-0138
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
[7] K. Avi, G. Nicholas, G. Thomas, K. John D and S. Mark J, "Validation relaxation: a quality
assurance strategy for electronic data collection," Journal of medical Internet research, vol.
19, no. 8, p. e297, 2017. https://doi.org/10.2196/jmir.7813
[8] A. Flaxman, A. Stewart, J. Joseph, N. Alam, S. Alam, H. Chowdhury, M. Mooney, R.
Rampatige, H. Remolador, D. Sanvictores, P. Serina, P. Streatfield, V. Tallo, C. Murray,
B. Hernandez, A. Lopez and I. Riley, "Collecting verbal autopsies: improving and stream-
lining data collection processes using electronic tablets," Population health metrics, vol. 16,
no. 1, p. 3, 2018. https://doi.org/10.1186/s12963-018-0161-9
[9] S.-W. Jun, M. Liu and S. Lee, "BlueDBM: Distributed Flash Storage for Big Data Analyt-
ics," ACM Transactions on Computer Systems (TOCS), vol. 34, no. 3, p. 7, 2016.
[10] H. Cai, B. Xu and L. Jiang, "IoT-based Big Data Storage Systems in Cloud Computing:
Perspectives and Challenges," IEEE Internet of Things Journal, 2016.
https://doi.org/10.1109/jiot.2016.2619369
[11] R. Kaur, I. Chana and J. Bhattacharya, "Data deduplication techniques for efficient cloud
storage management: a systematic review," The Journal of Supercomputing, vol. 74, no. 5,
pp. 2035-2085, 2018. https://doi.org/10.1007/s11227-017-2210-8
[12] M. Nayak and K. Narayan, "Strengths and weakness of online surveys," IOSR Journal of
Humanities and Social Science, vol. 24, no. 5, pp. 31-38, 2019.
[13] CartOng, "Benchmarking of Mobile Data Collection Solutions," 26 Janvier 2017. [Online].
Available: https://blog.cartong.org/wordpress/wp-content/uploads/2017/08/Benchmark-
ing_MDC_2017_CartONG_2.pdf. [Accessed 2019].
[14] P. L. Bokonda, K. Ouazzani-Touhami and N. Souissi, "A Practical Analysis of Mobile Data
Collection Apps.," International Journal of Interactive Mobile Technologies, vol. 14, no.
13, pp. 4749-4754, 2020. https://doi.org/10.3991/ijim.v14i13.13483
[15] P. L. Bokonda, K. Ouazzani-Touhami and N. Souissi, "Open Data Kit: Mobile Data Col-
lection Framework for Developing Countries," International Journal of Innovative Tech-
nology and Exploring Engineering (IJITEE), vol. 8, no. 12, pp. 4749-4754, 2019.
https://doi.org/10.35940/ijitee.l3583.1081219
[16] P. L. Bokonda, K. Ouazzani-Touhami and N. Souissi, "Mobile Data Collection Using Open
Data Kit," in International Conference Europe Middle East \& North Africa Information
Systems and Technologies to Support Learning, Springer, 2019, pp. 543-550.
https://doi.org/10.1007/978-3-030-36778-7_60
[17] H. Carl, L. Adam, A. Yaw, T. Clint, B. Waylon and B. Gaetano, "Open data kit: tools to
build information services for developing regions," Proceedings of the 4th ACM/IEEE in-
ternational conference on information and communication technologies and development.
ACM, p. 18, 2010. https://doi.org/10.1145/2369220.2369236
[18] Y. Anokwa, C. Hartung, W. Brunette, G. Borriello and A. Lerer, "Open-source data collec-
tion in the developing world," Computer, vol. 42, no. 10, pp. 97-99, 2009.
https://doi.org/10.1109/mc.2009.328
[19] B. Waylon, S. Samuel, S. Mitchell, L. Clarice, B. Jeffrey and A. Richard, "Open Data Kit
2.0: A services-based application framework for disconnected data management," in Pro-
ceedings of the 15th Annual International Conference on Mobile Systems, Applications,
and Services. ACM, 2017. https://doi.org/10.1145/3081333.3081365
[20] odk-x, "opendatakit," 2020. [Online]. Available: https://docs.opendatakit. org/odk-x/. [Ac-
cessed 2020].
[21] I. Tikito, M. El Arass and N. Souissi, "Meta-Analysis of Data Collect Methods," Journal of
Computer Science, vol. 15, no. 8, pp. 1184-1194, 2019. https://doi.org/10.3844/
jcssp.2019.1184.1194
iJIM ‒ Vol. 15, No. 13, 2021 41
https://doi.org/10.2196/jmir.7813
https://doi.org/10.1186/s12963-018-0161-9
https://doi.org/10.1109/jiot.2016.2619369
https://doi.org/10.1007/s11227-017-2210-8
https://blog.cartong.org/wordpress/wp-content/uploads/2017/08/Benchmarking_MDC_2017_CartONG_2.pdf.%20%5b
https://blog.cartong.org/wordpress/wp-content/uploads/2017/08/Benchmarking_MDC_2017_CartONG_2.pdf.%20%5b
https://doi.org/10.3991/ijim.v14i13.13483
https://doi.org/10.35940/ijitee.l3583.1081219
https://doi.org/10.1007/978-3-030-36778-7_60
https://doi.org/10.1145/2369220.2369236
https://doi.org/10.1109/mc.2009.328
https://doi.org/10.1145/3081333.3081365
https://docs.opendatakit.org/odk-x/
https://doi.org/10.3844/jcssp.2019.1184.1194
https://doi.org/10.3844/jcssp.2019.1184.1194
Paper—ODK-X: From A Classic Process to A Smart Data Collection Process
[22] V. R. Basili and D. M. Weiss, "A methodology for collecting valid software engineering
data," IEEE Transactions on software engineering, no. 6, pp. 728-738, 1984.
https://doi.org/10.1109/tse.1984.5010301
[23] S. M. Kabir, Basic Guidelines for Research: An Introductory Approach for All Disciplines,
First ed., B. Z. Publication, Ed., Bangladesh, 2016, pp. 201-275.
[24] A. Samaddar, A. Ajay, A. Keil, A. Rai, S. Sharma, S. Pal, A. Arora, S. Marwaha, S. Islam,
A. Gupta and R. K. Paul, "Open data kit for diagnostic crop production survey at landscape
level in India," International Maize and Wheat Improvement Center (CIMMYT), pp. 11-
17, 2019.
[25] H. Li, K. Ota and M. Dong, "Learning IoT in edge: Deep learning for the Internet of Things
with edge computing," IEEE network, vol. 32, no. 1, pp. 96-101, 2018. https://doi.org/10.
1109/mnet.2018.1700202
6 Authors
Iman Tikito has more than 8 years of international experience as Business Analyst
and Engagement manager, working for a multinational company. She holds a double
Master Degree in IT Applied to Offshore Development from the University of Moham-
med V at Morocco, a master degree in Offshore Development of Information Systems
from University of Bretagne Occidental at France. She’s currently pursuing her Ph.D.
in Science and technology for the engineer at Mohammed V University in Rabat, EMI-
SIWEB Team, Rabat, Morocco. Email: iman.tikito@gmail.com
Nissrine Souissi is a full Professor at Systems Engineering and Digital Transfor-
mation Laboratory (LISTD), SSDT Team, Computer Science Department, MINES-
RABAT School (ENSMR), Morocco. She obtained her PhD in computer science from
the University Paris-Est Creteil (UPEC) in 2006, France and an Engineer degree from
Mohammadia School of Engineers (EMI) in 2001, Morocco. Her research interests in-
clude process engineering, business process management, digital transformation and
data engineering. Email: souissi@enim.ac.ma
Article submitted 2021-03-28. Resubmitted 2021-04-29. Final acceptance 2021-04-30. Final version pub-
lished as submitted by the authors.
42 http://www.i-jim.org
https://doi.org/10.1109/tse.1984.5010301
https://doi.org/10.1109/mnet.2018.1700202
https://doi.org/10.1109/mnet.2018.1700202
mailto:iman.tikito@gmail.com
mailto:souissi@enim.ac.ma