Vol. 1, No. 1 | Jan – June 2017
SJCMS | P-ISSN: 2520-0755 | Vol. 1 | No. 1 | © 2017 Sukkur 96

Spatial Data Analysis: Recommendations for Educational Infrastructure in Sindh

Abdul Aziz Ansari, M. Abdul Rehman, Ahmad Waqas, Shafaq Siddiqui
Department of Computer Science, Sukkur IBA, Pakistan
aziz.mscs2013@iba-suk.edu.pk, rehman@iba-suk.edu.pk, ahmad.waqas@iba-suk.edu.pk, shafaq.siraj@iba-suk.edu.pk

Abstract

Analysing education infrastructure has become a crucial activity for delivering quality teaching and resources to students. Identifying the facilities needed to improve the current state of education, and to plan future schools, is an important analytical component. This is best achieved through a Geographical Information System (GIS) analysis of the spatial distribution of schools. In this work, we apply GIS analytics to the rural and urban school distributions in Sindh, Pakistan. Using a reliable dataset collected by an international survey team, the GIS analysis is carried out with respect to: 1) school locations, 2) school facilities (water, sanitation, classrooms etc.) and 3) students' results. The analysis is carried out at district level and presented through several spatial results. Correlational analysis of the highly influential factors that may impact educational performance generates recommendations for planning and development in weak areas, providing useful insights into the effective utilization of resources and into new locations for future schools. The time series analysis predicts future results, which may be verified through careful observation and data collection.

Keywords: Spatial analytics, Data Analytics, Education, GIS.

1. Introduction

Education is a highly significant element for a developing country like Pakistan. Keeping this fact in perspective, the Government of Pakistan has allocated a sufficient budget for improving education down to the grass-roots level.
The Standardized Achievement Test (SAT) is a reform initiative, a timely and much-needed strategy to explore the dynamics of student learning in Sindh province. The World Bank also recognizes the effectiveness of SAT. Project SAT focuses on attitudinal changes in teachers and on effective student learning as influenced by the environment and infrastructure provided on a regional basis. This study presents an analysis of the data collected from the SAT project.

According to several reports, including SAT-I, II and III, the quality of education throughout Sindh province is alarming. According to the SAT-II report, students of classes V and VIII were tested in three subjects: Science, Language and Math. The SAT-II results show that the overall average score in all subjects is below 30% in all regions of Sindh province, which is clearly a critical situation. There may be several reasons for such failure, such as teachers, physical infrastructure, language problems or the locations of schools.

A. Aziz et al. Spatial Data Analysis: Recommendations for Educational Infrastructure in Sindh (pp. 96 - 107)

The ASER National Report 2015 indicates a critical education status in the rural areas of Sindh. The report indicates that fewer than 40% of students are capable of reading and writing stories in Sindhi, Urdu or English and of doing basic mathematical operations [1]. The ASER 2014 Sindh Rural Report shows an alarming situation in various aspects of education in the province.
Table 1 shows statistics gathered from the ASER 2014 Sindh Rural Report [2].

Table 1: Learning Levels

LEARNING LEVELS (CLASS 5)
English        24% can read sentences in English
Urdu/Sindhi    41% can read a story in Urdu/Sindhi
Arithmetic     31% can do 2-digit division in arithmetic

FACILITIES AVAILABLE FOR GOVERNMENT PRIMARY SCHOOLS
Funds              26%
Useable Water      59%
Boundary Wall      64%
Useable Toilets    48%

The Reform Support Unit (RSU) also reports statistics on the condition of education in Sindh. Table 2 shows statistics from SAT 2014-15 [3].

Table 2: Content Strand Based Scores (Class V)

Subject    Content Strand           Content Strand   Subject & Overall   Standard
                                    Average (%)      Average (%)         Deviation
Language   Reading                  54.16            32.81               18.6
           Writing                  11.47
Math       Number & Operation       18.70            18.22               12.78
           Measurement              37.74
           Geometry                 14.65
           Information Handling     11.56
Science    Life Science             14.76            15.26               11.04
           Physical Science         14.49
           Earth & Space Science    28.46
Overall Scores (%)                                   22.10               11.79

1.1. Geographical Information System

A Geographical Information System (GIS) helps us visualize, analyse, interpret and understand data to reveal relationships and trends. According to Foorte, K.E. and M. Lynch: “A geographic information system (or GIS) is a system designed to capture, store, manipulate, manage, and present spatial or geographical data” [4]. In the beginning, GIS was aimed at the creation of maps only. The automation of paper-based maps introduced the idea of analysing data geographically using geometrical shapes linked to database records. This method was initiated by the Harvard Lab for Computer Graphics [5].

1.2. Quantum GIS

QGIS (Quantum GIS) is a stable, open-source desktop GIS application that provides efficient data viewing and analysis capabilities. Different countries and organizations prefer GIS-based analysis of available data, which helps them design robust policies for the future of the country.
Various independent international studies have been carried out to infer the hidden factors that determine the progress of the education system in a particular country or region.

1.3. Geostatistics

Statistics is the science of producing facts and figures from real or sample data by applying analytical methods such as averages, correlations and regression. This is an inferential approach to decision making. The merger of GIS and statistics opened new dimensions of analytics. In spatial (geostatistical) analysis, objects are represented by basic geographical primitives such as points, lines and polygons. GIS presents spatial information so that independent analyses can be run on various features, highlighting hidden patterns within the data [5].

1.4. Time Series Analysis and Forecasting

A time series is a set of observations x, each recorded at a specific time t [6]. The plotted points express the growing or declining behaviour of the data. The ordered series should be continuous in nature. A traditional time series is usually composed of two major components: seasonal variation and trend. The seasonal component captures growth or patterns that repeat over periods, i.e. weekly, monthly, quarterly or yearly, while the trend component captures a linear increase or decrease [7]. The choice of method depends on the context and nature of the data.

Time series forecasting is a method used to predict future data. Given the observed points x_1, x_2, …, x_N, the task is to estimate a future value x_{N+h}, where h is the forecasting horizon (lead time). Most of the literature divides forecasting methods into three general classes, which may be combined in some situations:

• Judgmental forecasts, based on subjective judgment or perception.
• Univariate methods, based only on the past values of the series being forecast, possibly with a linear trend.
• Multivariate methods, based on one or more additional predictor variables [7].

According to the literature [7], seasonality is considered additive if it does not depend on the local mean; in that case the seasonal indices i_t are generally normalized so that they sum to zero over a year, ∑ i_t = 0. Seasonality is considered multiplicative if the size of the seasonal variation is related to the local mean; in that case the indices are normalized so that their average over a year is one, average(i_t) = 1. Regression, moving average and exponential smoothing are some of the popular forecasting methods.

1.5. Pearson Correlation Coefficient

According to the National Council on Measurement in Education (NCME), the correlation coefficient r is a numeric value that describes the statistical relationship or dependency between two variables (attributes) [8]. It can indicate a positive, negative or neutral effect of one attribute on another within the same cluster. A positive relation indicates that as attribute A increases, attribute B also tends to increase. A negative relation indicates that as attribute A increases, attribute B tends to decrease. No relation indicates that attribute A has no effect on attribute B. It measures dependencies of variables by value -1