1249 - ready The Journal of Community Informatics ISSN: 1721-4441 Articles, special issue: ODSCSD User Centred Methods for Measuring the Quality of Open Data A project to identify metrics for assessing the quality of open data based on the needs of small voluntary sector organisations in the UK and India. We used small structured workshops to identify users’ key problems and then worked from those problems to understand how open data can help address them and what the key attributes must be for successful use. We then piloted different metrics that could be used to measure the presence of those attributes. This user-centred approach to open data research highlighted some fundamental issues with expanding the use of open data from its enthusiast base. Current metrics of the quality of open data are mostly based around the production of datasets and technical standards and not around the needs of potential users. Data portals often track progress by reporting the number of datasets that conform to the five stars of linked open data (Berners-Lee, 2006). More sophisticated attempts such as the Open Data Barometer (Davies, 2013) include measuring the progress of open data through the presence of datasets in certain categories, whether they meet legal criteria, and the existence of technical functions such as the ability to download the dataset in bulk. Caplan et al. have aggregated a list of initiatives to measure the value of open data (Caplan et al., 2014). Although there are some metrics that are specific to a sector and take into account the content of the datasets, these are derived ‘top– down’, for example, by assessing what properties the data needs to conform to regulations. While they provide a valuable perspective and are relatively easy to implement, there is no evidence that these top–down approaches address users’ most pressing concerns. As such, they are weakly linked to the impact of open data. The highly technical nature of open data in !47 Frank, M., Walker, J. (2016). User centred methods for measuring the quality of open data. The Journal of Community Informatics, 12 (2),(Special issue on Open Data for Social Change and Sustainable Development), 47-68. Date submitted: 2015-07-20. Date accepted: 2016-05-09. 
 Copyright (C), 2016 (the authors as stated). Licensed under the Creative Commons Attribution- NonCommercial-ShareAlike 2.5. Available at: www.ci-journal.net/index.php/ciej/article/view/1249 Mark Frank University of Southampton, United Kingdom Corresponding Author. mark.frank@soton.ac.uk Johanna Walker University of Southampton, United Kingdom J.C.Walker@soton.ac.uk http://www.ci-journal.net/index.php/ciej/article/view/1249 http://www.ci-journal.net/index.php/ciej/article/view/1249 mailto:mark.frank@soton.ac.uk mailto:J.C.Walker@soton.ac.uk The Journal of Community Informatics ISSN: 1721-4441 practice harbours the potential for citizen users to become disengaged from the process of shaping and constructing relevant quality characteristics. This prompted our research question: What is the nature of open data metrics derived from user requirements and are they viable? This project explores ‘bottom–up’ methods for measuring the quality of open data that are grounded in what users need from data to perform core functions; thus producing metrics that are more directly related to the impact of the data. It has two aims – developing a methodology for identifying metrics that are relevant to a specific user context, and identifying and evaluating some metrics for one such group. Our focus is on practical metrics that are either already in use or could reasonably be used in the near term. As such, they need to strike the right balance between being easy to implement and relevant: easy to implement in the sense that they can be used without excessive effort; relevant in that they are closely correlated with key desirable characteristics of the datasets in a given context. We believe that in exploring the perspective of the non-specialist user in some detail we are starting to address a significant gap in open data research which has, to date, focused on the implementation and strategic implications of open data. We have done this by using a combination of new methods such as structured workshops and role-plays which have their roots in traditional IT methods for investigating user requirements but have been adapted for open data. The literature on data metrics and open data metrics often refers to metadata. However, the scope of the word ‘metadata’ has not yet been clearly defined. It can be limited to structured data about a dataset (for example a data portal may have fields for author and such-like) or expanded to include any document which describes the dataset. We have used metadata in the first sense and used supporting documentation to refer to any documents which give information about the dataset. The paper is structured as follows: First we review data quality, metrics and methodologies literature, and examine the criteria for a good metric before outlining our methods. Then we present the results of the workshops, and derive and pilot our metrics. We conclude with a discussion and implications for the field and further work. Literature review Data quality characteristics The literature reviewed fell into three areas concerned with data quality and the assessment thereof. Information systems and database management literature provides general data quality research that is applicable to open data (Batini et al., 2009; Scannapieco & Catarci, 2002). Linked open data (LOD) research examines machine-readable assessments of data (Behkamal et al., 2014; Erickson et al., 2013). Finally, the socio-technical open data field has looked at data quality and measurement through the frame of barriers to usage of data (Martin et al., 2013; Zuiderwijk et al., 2012). There is no common standard of definitions for data quality (Scannapieco, Missier & Batini, 2005). Wang (1998) memorably defines it as ‘fitness for use’ but this lacks measurability and !48 The Journal of Community Informatics ISSN: 1721-4441 requires more detail in order to be operationalised. There is not even consensus on the meaning of the terms used to outline the dimensions, for instance, timeliness may be used to refer to the average age of a source or the extent to which the data is appropriately up-to-date (Batini et al., 2009). Scannepieco and Cartarci (2002) survey six sets of dimensions of data quality, representing a variety of contexts and define data quality as a set of dimensions including accuracy, completeness, consistency (at format level and at instance level), timeliness, interpretability and accessibility as the most important factors from a list of 23. Of these, accuracy and completeness were the only factors that were cited in all six sets of dimensions. Even then, few of these are absolute measures and they are often relative to a specific dataset or application. For instance, current data may in fact be too late for a specific application, and therefore not timely (Scannapieco & Catarci, 2002; Scannapieco, Missier & Batini, 2005). Data quality is also subject to tradeoffs, often between timeliness and another dimension such as accuracy or completeness (Scannapieco, Missier & Batini, 2005). These will vary with the domain, as may the attributes themselves. Reviewing the LOD literature, Zaveri et al. (2012) identified 26 dimensions of data quality. Compared to the dimensions cited above there was a far greater emphasis on provenance and other trust-based metrics, reflecting the distributed nature of open data. They also focus on amount-of-data, noting, an appropriate volume of data, in terms of quantity and coverage, should be a main aim of a dataset provider. Licensing and interlinking are also key attributes of LOD. Accurate metadata is also vital for findability and cataloguing; reflecting the fact that open data is no longer defined within an organisation and thus needs to be discoverable by anyone (Maali, Cyganiak & Peristeras, 2010; Reiche, Hofig & Schieferdecker, 2014). Reiche et al. (2014) propose metadata quality as a characteristic, being the fitness of the metadata to make use of the data it is describing. This speaks to the unpredictable use of open data. Socio-technical research such as Barry and Bannister (2014) and Zuiderwijk et al. (2012) primarily derive their quality characteristics from interviews and workshops with civil service (publishers) and academia. Discovery is a frequent issue in the literature (Conradie & Choenni 2014; Keen et al., 2013). It identifies interpretability – the users’ ability to comprehend the data in a set, and a number of aspects of interoperability, including formats, endurance, varying quality and licensing, reflecting the envisaged use of combining datasets from various sources. Although data quality assessment is well-established, the added complexities of open data – the autonomy and openness – mean that a new set of data quality areas have started being added to the literature. However, this is becoming divided into two areas – the machine- readablility issues addressed by the LOD field and the more people-oriented issues identified by the socio-technical studies. A more unifying approach may be called for. Additionally, these studies are approached from a publisher or machine-intermediary point of view, and the field of user-derived metrics is in its infancy. !49 The Journal of Community Informatics ISSN: 1721-4441 Metrics To operationalize data quality there must be a way to assess it. It is clear from the preceding section that open data quality characteristics include some that are novel to the general corpus of data quality work. Consequently, new metrics such as tau, ‘the percentage of datasets up to date in a data catalogue’ (Atz, 2014) are being developed to engage with these attributes alongside those previously identified. Many metrics are based on the technological structure of LOD such as examining consistency through the ratio of triples using similar properties (Behkamal et al., 2014) or by applying an automated test such as the Flesch-Kincaid Reading Ease. These metrics have the value of automation but cannot be performed ad hoc on all open datasets. Bizer and Cyganiak (2009) suggest three classifications of metrics for information quality filtering. The first, structured content, could be assessed statistically by analysing the structure. Context-based metrics usually rely on a third-party check, for example, against a list of trusted providers, or metadata analysis. Ratings-based metrics (such as the Five Stars of Linked Open Data) are about the information or information provider, and depend on some subjectivity or skill of the assessor creating the rating, which may often be produced algorithmically. The above suggests that the creation of metrics must address a number of different dimensions. They pertain not only to the data but its creator, and not solely to its presentation or structure but to its meaning. They may change over time and are only useful in so far as they serve the purpose of the user of the metric. There are few existing metrics that can be applied without tools to any kind of dataset. Data quality assessment methodologies Batini et al. (2009) define a data quality methodology as an operational description of a logical process to assess and improve the quality of data. This makes explicit the idea that the attempt to understand data quality is made not in and of itself, but in the service of utilisation. Pipino et al. (2002) state that there is a lack of ‘fundamental principles for […] developing useable metrics in practice and note that it is not practicable to create ‘one size fits all’, but rather these fundamental principles should be sought. Su and Jin (2004) suggest these might be derived in three ways: intuitively, systemically and user-based. Batini et al. (2009) review 13 methodologies for the assessment of data quality. These indicate a variety of approaches are employed, from questionnaires through subjective and objective metrics to statistical analyses. Most methodologies are appropriate for distributed systems, however; they generally apply to co-operative situations, where most of the parties can be considered to be aware of each other, which cannot be true for open data. This suggests there is a potential need to create a method for deriving and addressing fundamental principles amongst a variety of open data users that is appropriate for autonomous use in an extremely distributed system. It would of necessity be an audit model – one where only use decisions, and not the data itself, can be improved by users, and not the data itself. Su and Jin (2004) suggest user-based methodologies are problematic as being !50 The Journal of Community Informatics ISSN: 1721-4441 most subjective, but they are also robust in that for a specific group of users they identify their exact concerns. What makes a good metric? The preceding section suggests that open data requires a definition of data quality that is broad and loosely defined. We want to take into account more than the content of the datasets, e.g. discoverability, and we want to apply our metrics in environments where the aims are not clearly defined. We therefore propose a correspondingly broad and loose definition of a metric as ‘An observable characteristic of one or more datasets that acts as a proxy for some other characteristic of interest which is less easy to observe’. In this paper we will refer to the characteristic of interest as an attribute of the data. Choosing a metric for an attribute is similar to choosing an operational definition for a concept. Like an operational definition a good metric should be valid (closely correlated with the attribute of interest) and reliable (gives consistent results over time and between observers). It should measure attributes that matter and should be sufficiently closely tied to the attribute that it is difficult to ‘game’ the metric. In addition, during the course of the project we identified these desirable characteristics of a metric: Discriminatory. The metric should be sensitive enough to discriminate between common values of the attribute. Efficient. The less time and resource required to use it the better. In some contexts poor efficiency can lead to poor validity and reliability. If the aim is to measure a large number of datasets for a large variety of users (e.g. the Open Data Barometer) then poor efficiency may force the assessors to use a small convenience sample which potentially introduces both bias and sampling error. Transferable. The same metric can be used in a variety of different contexts – in our case a range of different user groups – and across cultural and economic variation. Comparable. This is an extension of transferability. If a metric is comparable not only is the metric transferable to a wide variety of contexts but the results can be meaningfully compared. Ideally this would result in a universal standard that transcends cultures and applications. We propose that the ideal metric would rate highly on all of these criteria: an efficient assessment (e.g. automated) that could be quickly run against a large group of datasets with high validity and reliability giving results that are comparable for a wide range of contexts. In practice there is often a trade-off between these criteria. For example, we may accept limited transferability as a cost of increased validity. Metrics lie on a spectrum between the most subjective, which involve a high degree of judgement, and the most objective, which involve little judgement. Greater objectivity is associated with greater reliability. More objective metrics may also allow for automation (for example, automatically inspecting metadata for the recent updates) which can lead to greater efficiency – although this is not always true. However, it is often hard to find an objective metric that is valid, and a subjective metric with suitable guidance and support for the assessor can have more utility. In this project we !51 The Journal of Community Informatics ISSN: 1721-4441 focused on metrics that are towards the objective end of the scale, while noting the importance of subjective metrics as an alternative. While these are all relevant criteria for a good metric, the quality of a metric depends ultimately on whether it fulfils its purpose. In the context of open data this purpose can be as varied as comparing progress of data providers, estimating the impact of open data, or evaluating the usability of a data portal. In this paper we assume the purpose is to determine the value of a group of open datasets to a defined community of users. The group of datasets is deliberately left loosely defined. It might be as large as all data published by a national government or as small as a portal from a specialist provider.’ There are also cultural considerations in choosing a metric. A metric that is hard to understand or which has an obscure relationship with the attribute it is intended to measure is unlikely to be accepted in practice – even if it is efficient, valid and reliable. We therefore focused on straightforward metrics which have a direct and logical relationship with the attribute they were intended to measure. Methodology Our interest was in identifying metrics that reflect the core concerns of users who are not open data specialist or enthusiasts. We needed to uncover the relationships between users’ problems, the information that would help them solve those problems, and the data that might supply that information. This depth and subtlety of insight would be very difficult to uncover through large-scale quantitative approachs. Therefore, we used qualitative methods to work closely with two small selected groups of potential users to explore in depth if/how open data could contribute to their work. Our approach differed from many open data events for users (e.g. hackathons) which are aimed at promoting open data and developing a community. Such events are vital to the open data movement, but they have two characteristics which made them unsuitable for our purposes: the participants come because they have some kind of special interest in open data and typically they are presented with some data and then work to find good uses for it. This means the participants are not at all representative of the populations and is a serious skewing of the sample from the point of view of our research. The data to problems approach can be very successful, but the danger is that it produces interesting solutions to relatively minor problems – problems selected because they are amenable to open data solutions – not because they are core concerns of the users. To ensure that we were addressing significant problems we reversed the order – starting with identifying problems that most concerned our users and then trying to discover how open data might help them with those problems. We were determined not to have preconceptions as to what matters about the data and let the users tell us what mattered to them. Only then did we go on to consider suitable metrics. Selection of user groups We had prior contact with voluntary sector organisations supporting the homeless in Winchester, UK. These organisations were a suitable group of users for this study because: !52 The Journal of Community Informatics ISSN: 1721-4441 We wanted to start with a well-established open data culture such as the UK, which should minimise confounding variables relating to the early stages of the availability of open data; These particular organisations are not ‘open data aware’. There is no special requirement for IT or data skills and in this respect they are typical of thousands of voluntary sector organisations; and Preliminary discussion with a voluntary sector coordinator established that the UK voluntary housing sector has several real business issues that might be addressed by open data. We used unstructured interviews and e-mail to identify three such organisations and confirm that they were suitable to participate. While we wished to restrict the study to a well-defined and limited set of user problems, we were keen to develop metrics that transcended cultural and developmental barriers and were relevant to emerging economies. We identified four organisations in Gujerat, India, who met similar criteria in terms of being relatively small, focused on a specific urban geographical area (Ahmedabad) and working with the homeless. The difference in stage of developmental growth and governmental policies meant we could not expect to exactly mirror the activities of the UK organisations, but they all delivered programmes to support the homeless or poorly housed in Ahmedabad, which we felt was sufficiently similar so as not to affect our methodology. Identifying key attributes Each group of users attended two structured workshops (see Table 1) to jointly develop and document the user’s story including: The problems they have to solve; The information they need to solve them; How open data can contribute to this information; How this data can be found; and What attributes of the data are required for using the data in this context and what attributes (if any) are preventing them from using it. These attributes were interpreted very broadly ranging from technical format and licensing arrangements through to details of the content, availability of support, currency and provenance. The key consideration was to discover, without preconceptions, which attributes are truly significant for the users. The first workshop was used to identify the most important problems facing the group and what additional information would most help them address those problems. In the period between the workshops the researchers tried to identify open datasets that could potentially supply at least some of the missing information. To do this: !53 The Journal of Community Informatics ISSN: 1721-4441 Table 1 Workshops and participants We discarded problems where we knew the required information would not be available as open data (e.g. information about named individuals); We searched the relevant government open data portal (data.gov.uk and data.gov.in) using keywords derived from the information the users needed; In the UK we searched specialist data portals such as the Shelter Databank, and in India we used exemplar projects such as Transparent Chennai and the Karnataka Learning Partnership as a guide to what might be available in Gujerat and in what form; We took advice from specialists such as the Department for Communities and Local Government (DCLG) in the UK and DataMeet in India; and We used Google as this sometimes proved more efficient at finding data than the government portal search mechanisms. As a result we selected a small number of datasets (see Appendix) that came close to providing part of the information that the users had identified in the first workshop. At the second workshop each group was presented with the selected datasets, asked to review them and decide, through a group discussion, whether and how they could they could be used in practice. This allowed us to identify important dataset attributes at high level. We then asked the participants to select one or two datasets out of those discussed that had the most potential (one dataset in the UK and two in India). They were asked to annotate these datasets with comments on what would make them useful and their annotations were encoded. We then asked the participants to tell the story of a typical situation in which they might use these !54 Workshop location Workshop type Number of attendees Number and type of organisations Winchester UK Problem and information need specification (2 hrs) 4 3
 2 x temporary shelter 
 1 x social housing Winchester UK Data selection and assessment (2 hrs) 4 3
 2 x temporary shelter 
 1 x social housing Ahmedabad, India Problem and information need specification (3 hrs) 6 4
 1 x state budget analysis 
 1 x slum rehousing 
 1 x migrant workers 
 1 x basic services for slum dwellers 
 Ahmedabad, India Data selection and assessment (3hrs) 4 3
 1 x education intervention 
 1 x basic services for slum dwellers 
 1 x slum rehousing The Journal of Community Informatics ISSN: 1721-4441 datasets. The objective of this ‘role-play’ was to recreate, to a limited extent, the environment in which our participants would be using the datasets and thus uncover any important requirements derived from their working environment which may not be obvious when focusing on the datasets themselves in a workshop environment. This allowed us to confirm and add details to the key dataset attributes. The output of the two workshops was a list of attributes that the users agreed were important if open data was to be useful. Details of the structure of both workshops are the appendices. Following the workshops we investigated possible metrics that both help to identify whether the selected attributes are present and are practical to implement. For any given attribute there are potentially an indefinitely large number of ways of measuring it. However, due the approach adopted, in practice there were few candidates for any given attribute. We wanted to evaluate each metric against our outlined criteria. We did this by piloting the metrics against a sample of datasets relevant to our community. This comprised ten datasets from the UK and five from India. The UK datasets were selected from a list generated for the Open Data Institute Housing Open Data Challenge . We chose these because they had been 1 selected as being relevant to housing by experts independent from our project. From that list we selected the first dataset from each provider to give a cross-section of providers. There was no equivalent list for India so we included the datasets that were used in the second workshop as we knew they were relevant to our users. We piloted the metrics against all the datasets and noted: The metric score for each dataset; How confident we felt in the score (a measure of objectivity and therefore reliability); and How easy it was to make the assessment (a measure of efficiency). We then assessed the validity and transferability/comparability of the metric on theoretical grounds. Results Problems and information needs The organisations attending the UK and Indian workshops had much in common in terms of their biggest problems and the information that would help them. For example, organisations in both countries struggled with identifying which welfare benefits individuals were entitled to. A full list of problems and information requirements is provided in the appendices. Attributes The most important result for this project was that five attributes of datasets were identified as being significant by this group of users. There is no accepted terminology for these http://www.nesta.org.uk/closed-housing-open-data-challenge 1 !55 http://www.nesta.org.uk/closed-housing-open-data-challenge The Journal of Community Informatics ISSN: 1721-4441 attributes (W3C 2015) so we have used terms that we believe are unlikely to cause confusion based on current usage. Discoverability. Datasets can be discovered via many different routes including general purpose search engines such as Google; government data portals such as data.gov.uk or data.gov.in; specialist intermediaries such the UK Shelter Databank; and word of mouth. As described in the methodology section, the researchers searched for relevant datasets based on the information needs of the participants using a combination of these routes. This proved to be very demanding even with the support of subject matter experts. For example, we were advised by the Ahmedabad Centre for Environmental Planning and Technology University that slum data in Excel format existed but we were unable to locate it using either Google or data.gov.in. The participants commented that it would be a very significant issue had they undertaken the search themselves. It’s a full time job [tracking down the appropriate data] isn’t it? (UK) It is an issue with people who have not looked at this data, they would not put in those titles (India) Granularity. To address some of their most pressing problems the attendees needed information about individual people and potential homes, such as knowing the benefit status of a homeless person or the addresses of landlords that will accept lodgers receiving state benefit. It isn’t sufficient to know rates of acceptance in Winchester. It has to be number 2 something street. (UK) For privacy reasons open data is most unlikely to provide this information which severely limits the utility of open data in this context. For other problems it was useful to have data aggregated at higher levels such as city or district within city. The most useful level varied according to the dataset and specific problem being addressed. For example, generic data on the cost of crime and health services is sufficient for a funding application for additional resources. But data on the cost of specific crimes and treatments is required when making the case for providing permanent housing to an individual client with a particular profile. [It’s] good for research on aggregate level but in terms of providing service [we] need more detail (India) If it is not linked to a specific ward, how useful can it be? It can give you good overview of what is happening in the area but not for an intervention. (India) Immediate intelligibility. While the attendees were very competent in their field, they often found datasets hard to interpret. An apparently straightforward field such as the number of homeless people in a city, immediately raised questions of interpretation. At one extreme someone might be considered homeless if they are forced to leave their home for temporary reasons such as a flood, at the other extreme they might be someone sleeping the streets who is not known to the local authority. Without further explanation and information about how the data is collected it is impossible to know what the figure means. Similar issues of interpretation arose for almost all the datasets examined. Over half (26 out of 51) of the annotations on the datasets expressed a need for more information. On the other hand, the role-play revealed that participants did not typically have much time to understand datasets and therefore the time to !56 The Journal of Community Informatics ISSN: 1721-4441 understand the data is a critical aspect of this attribute. For example, in the UK workshop, the attendees explored using a dataset on costs of health treatments which, while initially hard to understand, was explained by a 58-page supporting document. Despite the presence of the supporting document, this dataset was not useful to the community as it would take too long to use the document. One doesn’t have too much time to read through it (UK) Is the question that was asked ‘where are you getting the water you drink’ or ‘where is your nearest drinking water’? (India) What is difference between ‘no exclusive room’ and ‘one room’? (India) You need the documentary information that supports this. (India) Trusted/authoritative. This was a particular concern in India where participants were extremely sceptical about the veracity of government data. For example, they assumed data on slums was incomplete because, by law, local government has to support inhabitants of slums and thus there is an incentive not to include slums in the data. In the UK, participants felt it was important that data came from an authoritative source and that they understood how it was collected, particularly if they used that data as part of a funding application. 
 For slum surveys, who is collecting the data and who is implementing it? If the same agency collects data about what are the gaps in provisions it may not [unintelligible] to collect, if a third party is doing the survey and paid directly by central government then it can conduct fair, impartial surveys. (India) Linkable to other data. Both countries identified a need to discover relationships between data items that were not available in the datasets in the published format but must have existed in the raw data. For example, the UK participants needed to compare the cost of their interventions with the cost of crime and health interventions for the homeless. This can be seen as a requirement to have data in the appropriate format. Data presented in PDF or Excel format had been curated by the publishers and selected, in most cases, from a larger set of data. This meant certain choices about what would be displayed in that particular dataset had been made by the publishers, and it was not possible to ‘re-attach’ other data that had been excluded. Technologies such as LOD or HATEOAS could potentially address this requirement but current tools are beyond the scope of these users. To use this we need some other variables as well, like this many people are having own house, but not infrastructure, and we need geographical area. (India) If you can cross-check, [with certain income data] whether someone has a TV and a fridge, this can verify whether their income is correct. (India) Metrics Following the workshops we proposed and assessed metrics for each attribute. Discoverability Metrics for discoverability presented significant difficulties. It is not practical to develop a metric that takes into account all possible routes that may be used to discover a dataset. Therefore any proposed metric must be relative not only to the data being sought but also the !57 The Journal of Community Informatics ISSN: 1721-4441 route being used. Even within this constrained context, we struggled to identify a useful metric. We considered the following metric which assumes that the route to discovering the dataset is via a keyword search (which is frequently the case): ‘Given a set of keywords to search for a dataset how many alternative datasets are generated and what proportion of the alternatives include the required data’. This might act as a proxy for how quickly the route leads to the target data. It clearly raises difficulties in choosing appropriate keywords but this need not matter if the result is not sensitive to the precise choice of keywords. To test this we searched for datasets on housing stock using different combinations of keywords on data.gov.uk. However, in practice the results appear to be extremely sensitive to the choice of keywords. For example, using dwelling as a synonym for housing and supply as a synonym for stock we obtained the results in Table 2: Table 2 Sensitivity of different keywords when searching for data on ‘housing stock’ on data.gov.uk Subjective metrics also present a problem as they require the assessor to put themselves in the shoes of a typical user who may have very different skills and attitudes to the assessor. We therefore focused on an approach based closely on our own problems in discovering appropriate data. All of these tasks proved challenging: Identifying the organisation likely to supply appropriate data; Finding a data portal or other data search service used by that organization; Using the data portal/search service to produce a list of possible candidate datasets that was not too long and which we were confident had included any datasets of interest. A key concern here was that data might be referred to by a synonym (e.g. ‘dwelling’ instead of ‘house’ and as a result we might not find it); and Examining the list of candidate datasets to see if they contained the data of interest quickly. In addition: We had no way of knowing whether appropriate datasets existed other than finding them and wasted a lot of time looking for data that we never found and may not have existed; and !58 Keywords Number of datasets returned Dwelling stock 102 Housing stock 154 Dwelling supply 17 Housing supply 78 The Journal of Community Informatics ISSN: 1721-4441 When we found datasets in one format (e.g. PDF) it was often challenging to determine if they were available in a more useful format such as Excel. We constructed a metric based on the availability of solutions to these challenges (with the exception of identifying the organisation for owning the data, for which we were not able to identify any solution). For any given dataset we awarded one point for each of the following: The publisher/owner of the data has an open data portal (or similar search mechanism); The publisher/owner of that portal publishes an updated, searchable list of datasets; The publisher/owner of that portal publishes an updated, searchable list of datasets with synonyms; The publisher/owner of that portal publishes a list of datasets which are known to exist but are not currently available. This would limit the time wasted on abortive searches; and The dataset is accompanied by a list of alternative formats. Publishing in multiple formats is recommended by the World Wide Web consortium (W3C, 2015). We piloted this metric against the sample datasets. Many of these features are features of the data portal to which the dataset belongs. All but one of the UK datasets were served by data.gov.uk and none of them included a list of alternative formats. Therefore, the majority of datasets had the same score and discrimination was low. The Indian datasets, and the one UK dataset not served by data.gov.uk, had different scores and we therefore believe the low discrimination was a function of the limited datasets on which we piloted the metric. Discovering the features of the portal was time consuming, but once they were established it took only a few minutes to rate a dataset according to this metric and there was little judgement involved. We therefore found the metric to be reliable and rated its efficiency as medium. It also appears to be transferable and comparable. There is a major issue over validity because it does not measure the difficulty of the initial challenge of identifying an owner. Nevertheless, it is directly related to other challenges in discovering data and therefore has some validity in that sphere. Granularity Although the required level of granularity varies according to the problem being addressed, it is always possible to combine data with greater granularity into higher levels with less granularity, while the reverse is generally not possible. This suggests that a metric could be based on the principle that the more granular the data the better. Although there is some potential for doing this automatically, the technology is not currently at the level where we could pilot it. For the foreseeable future, most datasets will require human intervention and subject matter knowledge to recognise different levels of granularity. For a well-defined context it can be straightforward to specify the levels of granularity that are most meaningful for a type of data. An assessor can then assess datasets according to whether they include these levels. While the effective granularity of a dataset is in theory a function of the relationship between the different fields in the dataset, for the purposes of this metric we only considered fields !59 The Journal of Community Informatics ISSN: 1721-4441 independently. We piloted this approach using five levels of geographical granularity on the sample datasets. For the UK datasets we used National (i.e. UK), Country, County, City, Address (i.e. the identified building); for the Indian datasets we used National, State, District, City (or Village), Address. The results were promising. For the UK datasets, in every case it was possible to identify the level almost immediately on opening the sample dataset with very little requirement for personal judgement (in some cases data was presented by local authority and a small level of background knowledge and judgement was required to decide whether this should be classified as county or city level). This was a little harder for the Indian datasets as in several cases a dataset comprised several tables in a PDF document some of which were at state level (which was assumed from the context of the document) and some at district level. Nevertheless, the level of granularity was apparent for individual tables. Thus, the metric had high efficiency and reliability. The metric had such a direct relationship to the attribute of granularity it was hard to doubt its validity. All five levels were found among the datasets suggesting good discrimination. There seems to be little problem in theory applying the metric to other types of granularity and other data although the results would not be comparable. This approach is simple and direct but limited. It measures how granular a dataset is in a very specific context and requires prior specification of the context. We considered a more sophisticated approach, which is to measure the scope of a dataset to support different levels of granularity in a broader context including being combined with other data. This relies on the fact that some data facilitates aggregation while other data does not. For example, in the UK post codes allow for aggregating data geographically but house names do not. We refer to such linking data as class data as it indicates a class to which the individual can be allocated. Some class data is more generic than other class data in that it is not specific to a domain. The post code can be used almost wherever there is a requirement for geographical aggregation. The residential status of a house (owner-occupied, private rented, public rented, vacant) can be used for aggregation but is context specific. We explored a scale from 1 to 4 where the levels are: 1) Includes aggregated data only e.g. national statistics; 2) Includes individual unit level data but with no generic class data; 3) Includes generic class data; and 4) Includes more than one form of generic class data. We piloted this scale against the sample datasets. The metric proved to be reliable and efficient; in every case it was possible to classify the dataset on inspection with minimal judgement required. All datasets scored either 1 or 3. This casts some doubt on its discrimination as it suggests it may effectively be a two-level metric. The fact that the key data is generic suggests that the metric would have good transferability and comparability. It is harder to assess the validity. The metric measures the ability of data to participate in aggregation but it is by no means certain that this translates into aggregations that are useful for our community of users. It is a concept which needs further research. !60 The Journal of Community Informatics ISSN: 1721-4441 Immediate intelligibility To assess immediate intelligibility we considered using an automated test of data readability – similar to the Flesch-Kincaid test for document readability (Flesch, 1948) – as a metric. However, current tests of readability are designed for documents not data, and a test would need to be developed. Even then, it is not clear that such a test would be valid. The intelligibility problems that the participants came across were a function of background knowledge rather than the specific words that were used and it is hard to see how an automated readability test would detect this type of problem. We therefore focused on measuring the availability of supporting information. A simple approach is to rate datasets on the accessibility of supporting information bearing in mind that speed of intelligibility is vital. A possible scale might be (with increasing value): 1) Supporting documentation does not exist; 2) Supporting documentation exists but as a document which has to be found separately from the data; 3) Supporting documentation is found at the same time as the data (e.g. the link to the document is next to the link to the data in the search); 4) Supporting documentation can be immediately accessed from within the dataset but it is not context sensitive. This might be a link to the documentation or text contained within the dataset; 5) Supporting documentation can be accessed immediately from within the dataset and it is context sensitive so that users can directly access information about a specific item of concern. This might be a link to a specific point in the documentation or the text contained within the dataset; thus eliminating the need to search the documentation and speeding up access to the relevant material. We piloted this against the sample datasets with limited success. Evaluating the level of support involved some subjectivity in many cases e.g. Does a footnote in a spreadsheet count as level 5 support? Does supporting documentation fall into level 2 or 3? Can we be sure there is no supporting documentation because we have failed to find it? The process was efficient in that it was possible to determine the level of support almost immediately upon opening the dataset and there was good discrimination with results including levels 1, 3, 4 and 5. There is no problem in principle in transferring the metric to other domains and the results would be comparable if they are simply interpreted as measuring the speed of availability of supporting documentation. The biggest issue is validity. The metric raises some issues where datasets are available in multiple formats. Some formats such as LOD and Excel facilitate linking to supporting documentation better than others such as CSV. We intend the metric to refer to the available format of the dataset that has the best links to supporting documentation. However, as discussed under the section on discoverability it may not be easy to determine all the available formats for a given dataset. Also the metric takes no account of the quality of the supporting documentation. A point identified by Reiche et al. (2014) when discussing metadata quality is that it is one thing to quickly locate supporting documentation, but another to understand it and get the required support. !61 The Journal of Community Informatics ISSN: 1721-4441 Trustworthiness Our users trusted (or mistrusted) data for a variety of reasons: They know (or don’t know) how it was collected and processed; It comes from a trusted source; It is internally consistent and plausible; and/or It is consistent with other external sources. The first two reasons suggest metrics based on provenance. The second two suggest metrics based on consistency tests. There has been theoretical work on metrics of consistency and plausibility, see for example Prat and Madnick (2008), but this has not resulted in any usable tools or methods. We therefore primarily considered metrics based on provenance. There is extensive literature about systems for tracking provenance (Suen et al., 2013) and standards for exchanging information about provenance (Buneman, Khanna & Tan, 2000; Moreau et al., 2011; Pignotti, Corsar & Edwards, 2011). But this does not suggest metrics that could be implemented in the short term. We explored a relatively simple approach – evaluating whether the data or supporting documentation answers key questions that are relevant to provenance. Corsar and Edwards (2012) make the case that open data metadata, in addition to common requirements such as date and authors, should: If possible, expand on this with a description of the dataset's provenance. This includes describing the processes involved (e.g. screen scraping, data transformation) the entities used or generated (e.g. the downloaded timetable webpage and the generated timetable spreadsheet), and the agents (e.g. users, agencies, organisations) involved in the creation of the dataset. This record should also include the relationships between them. Ram and Liu (2009) propose seven questions (the seven Ws) which can provide the basis for this approach: 1) What is the data? 2) Who author/ organisation which created it? 3) Why was the dataset created? 4) (W)How was it collected - what events lead up to its collection? 5) When was it collected? 6) Where was it collected? 7) Which instruments were used to collect it? !62 The Journal of Community Informatics ISSN: 1721-4441 The same approach can be used objectively – simply recording whether the question has been answered – or more subjectively, but potentially with greater validity, by instructing an assessor to judge the quality of the answer. We piloted the objective approach on the sample datasets, awarding from 0 to 7 points to each dataset – one point for each of the 7 Ws for which there was an answer in the dataset or supporting documentation. Despite adopting the objective approach, it proved difficult to judge whether some of the questions had been answered or not. For example, if data refers to occupancy levels in 2012 is that sufficient information to answer the question: When was it collected? And it was time-consuming to inspect the documentation to see if the questions were answered. So we assessed reliability as medium to low and efficiency as medium. Discrimination was good with datasets being scored as low as 1 and as high as 7 (3 was the only level not represented). The metric is not context specific so it can be transferred and there seems no reason why the results should not be compared. The key concern is over the validity of the metric. In many cases the data scored quite low on the metric but was from a highly trustworthy source such as the UK Office of National Statistics. The metric takes no account of reputation-based trust (Artz & Gil, 2007), where trustworthiness of the data is derived from the trustworthiness of the source. A more sophisticated approach might take this into account. Linkable to other data The Five Stars of Linked Open Data is an accepted and easily applicable measure of open data format standards which reflects the user need to be able to discover unanticipated relationships among data. It can be interpreted not just as a technical standard but as a ‘soft’ standard, for example, making data findable and putting it in context. As this metric has already been studied and used extensively we did not do any further evaluation and accepted that it has high reliability, discrimination, transferability and comparability. We assessed the validity as medium because, while the metric can be a valid measure of the technical scope for exploring new relationships in the right hands, there are several reasons it might fail in practice. The ability to link data, especially using automated methods, depends not only on the technical format but the structure and choice of data in the dataset. Users need to have the skills, time and resources to use the data and make the linkages. Even developers found it challenging when LOD was first introduced by the UK government (Sheridan & Tennison 2010). A more valid metric might reflect the value of presenting data in multiple formats which would allow for users in different contexts to manipulate it in different ways. However, this entails a basis for weighting the value of different formats for different communities of users which would require further research. Summary Table 3 summarises our assessment of the proposed metrics. We rate each metric as high, medium or low against the criteria (except that comparability and transferability are combined for conciseness). It is important to bear in mind that these assessments were based on using the metrics on a small sample of datasets relevant to the users we worked with. Nevertheless the results indicate that there is potential for viable metrics for the key attributes for this community based round simple and direct proxies. !63 The Journal of Community Informatics ISSN: 1721-4441 Table 3 Summary of assessment of metrics Discussion The data attributes Some of the data attributes that emerged from the workshops reflected known concerns with open data. Granularity is a key element of the primary principle of the 8 principles of Open Government Data (Malmud, 2007) and the need to link data is one of the fundamental tenets of the open data movement. On the other hand, the emphasis on being able to comprehend the data quickly was less predictable. Timeliness, which is often at the centre of such metrics, was only mentioned once in the workshops, and this was in the context of how often data was collected, rather than when it was published. As part of our methodology the study was deliberately limited to a specific set of users. This was an advantage in that the participants agreed about problems, information and attributes; but it also places limits on how widely the conclusions might be applied. For example, large campaigning charities have staff whose sole task is to analyse evidence and who have the time to understand data. They are likely to be less concerned with immediate intelligibility and more concerned with ensuring data is up-to-date so that it can be used in campaigns. Other communities may have different key attributes which would require different metrics. Attribute Metric Valid Reliable Discriminatory Transferable/ Comparable Efficient Discoverabili ty 5-point scale indicating presence of features which enable discoverability. Medium High Low (for the sample datasets but this may be an exception) High Medium Large effort to assess portal – shared over many datasets. Granularity Observe whether dataset includes preselected (context-specific) levels High High High Transferable but not comparable High Levels based on presence of generic class data Medium High Medium High High Intelligibility Scale for quality of link to supporting information Medium Low High Transferable but not comparable High Trustworthin ess Number of answers to the 7 Ws Low Medium High High Medium Linkable to other data 5 Stars of Open Data Medium High High High High !64 The Journal of Community Informatics ISSN: 1721-4441 Nevertheless, there is no reason why the same approach to identifying appropriate metrics for a particular group of users cannot be used in other contexts. All of the five attributes that emerged were important in both countries. There was a difference in emphasis, possibly because of the relative maturity of open data in the two countries. Although it was still a challenge, discoverability was much easier in the UK where there are several useful portals at both central and local government levels, and there is relatively good coordination between local and central government in the collection and distribution of statistics. In India there is still a lack of effective portals at the local level and there is less coordination between central and local government. For example, while collecting data is often a function of central government, it frequently fails to provide sufficient granularity for local government who have to regulate and administer programmes based on the data. Trustworthiness was a concern in the UK but the participants felt that the reputation of the provider might be sufficient to make the data trustworthy. The Indian participants required a stronger understanding of provenance and possible unreliability before they trusted the data. Metrics The aim of the project was to investigate a different approach to open data metrics. The resulting metrics should be considered simply as ideas for discussion, refinement and further research. However, the attributes they measure have been recognised in other literature and therefore there is good reason to suppose they are applicable to a wider user community. Validity is fundamental to any metric. The temptation to measure something just because it is reliable and efficient is very strong – but should be resisted. Measuring the wrong thing well is worse than measuring the right thing badly. We assessed only one of our metrics as high validity. This was the first of the two measures of granularity – simply inspecting datasets to see if they contained data which met predetermined levels of granularity. This limited the metric to a very specific context and meant the metric had no comparability. Another possible way to increase validity is to take a more subjective approach but at a cost in reliability; for example, by asking an assessor to judge whether a dataset is quickly intelligible rather than seeking an objective proxy for intelligibility. By definition subjective approaches require judgement which may differ from one assessor to another and thus affect reliability. However, subjective approaches do not necessarily increase validity. The assessors of open data are unlikely to be representative users and may struggle to adopt the role of a user with a different skill set, attitude and environment. To some extent this can be mitigated by supplying the assessor with strong guidance, but this requires the resources to develop and test the guidance which may well have to be repeated for different user groups. We hope that our approach has at least focused attention on what really needs to be measured (the attributes) and thus raises the profile of validity. There are developing technologies and standards which may provide better metrics in the future. Bizer, Heath and Berners-Lee (2009) suggest that a PageRank type algorithm – TrustRank – could eventually emerge for measuring trustworthiness, but this would be dependent on a great many more datasets in any one domain being available. The W3C Working Group on Data on the Web Best Practices recently identified indicating the non- availability of datasets as one of its draft best practices (W3C 2015, sec. Best Practice 21), !65 The Journal of Community Informatics ISSN: 1721-4441 and the Sunlight Foundation’s fourth principle of Open Government Data recommends a full inventory of available data and helpful context on what is unlikely to be released (The Sunlight Foundation, 2015) which would address some discoverability issues. Further implications Several lessons emerged beyond the aims of this project. It was apparent that a lot of the information that participants find critical to solving problems is information about processes, for example, how to recognise and respond to different kinds of ‘legal high’. This parallels Heald’s distinction between process and event transparency (Heald, 2011). This kind of information on how or why is not typically available through open data. The workshops suggest that more research is needed into what constitutes data literacy, and what skills it might comprise to increase the impact of open data. Our users were as competent as anyone could reasonably expect: technically (they included experienced Internet and Excel users); in their knowledge of the subject matter; and also in their understanding the significance of data. Yet, they struggled to interpret aspects of every dataset that was presented to them. Furthermore, the knowledge they needed to interpret the data was specific and not necessarily applicable to other datasets. This suggests that there is scope for more work to be done on the best way to provide context to any given dataset, which would go some way to removing this onerous requirement from the user. The study was limited to one small set of users in two different environments and similar user-oriented research needs to be done for a wider variety of user groups. Each group is likely to have its own key problems, information needs and attributes, and it is only by conducting a range of similar studies will it be possible to determine the scope of any conclusions. The methodology needs to refined and could be expanded. We learned several lessons which are noted in the appendices. It would be fruitful to ask users to find their data (as opposed to doing it for them as we did) and to get their feedback on the metrics. In addition we strongly believe that our approach of starting with the needs of users who are not open data enthusiasts needs to be used more widely – not just for developing metrics but for gaining a greater understanding of the limitations of open data and how it should move forward if it is go beyond the domain of specialists. Acknowledgments The funding for this work has been provided through the World Wide Web Foundation ‘Open Data for Development Fund’ to support the ‘Open Government Partnership Open Data Working Group’ work, through grant 107722 from Canada’s International Development Research Centre (web.idrc.ca). Find out more at http://www.opengovpartnership.org/groups/ opendata. References Artz, D., & Gil, Y. (2007). A survey of trust in computer science and the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web, 5 (2), 58–71. !66 http://www.opengovpartnership.org/groups/opendata The Journal of Community Informatics ISSN: 1721-4441 Atz, U. (2014). The Tau of Data: A New Metric to Assess the Timeliness of Data in Catalogues. In P. Parycek & N. Edelmann (Eds.), CeDEM14 Conference for E- Democracy and Open Government (Vol. 22, pp. 147–162). Krems, Austria. Barry, E., & Bannister, F. (2014). Barriers to open data release: A view from the top. Information Polity, 19 (1), 129–152. Batini, C., Cappiello, C., Francalanci, C., & Maurino, A. (2009). Methodologies for Data Quality Assessment and Improvement. ACM Comput. Surv., 41(3), 16:1–16:52. http:// doi.org/10.1145/1541880.1541883 Behkamal, B., Kahani, M., Bagheri, E., & Jeremic, Z. (2014). A Metrics-driven Approach for Quality Assessment of Linked Open Data. Journal of Theoretical and Applied Electronic Commerce Research, 9 (2), 64–79. http://doi.org/10.4067/ S0718-18762014000200006 Berners-Lee, T. (2006, July). Linked Data: Design Issues. Retrieved 2 December 2014, from http://www.w3.org/DesignIssues/LinkedData.html Bizer, C., & Cyganiak, R. (2009). Quality-driven information filtering using the WIQA policy framework. Web Semantics: Science, Services and Agents on the World Wide Web, 7 (1), 1–10. Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, (Special Issue on Linked Data). Buneman, P., Khanna, S., & Tan, W.-C. (2000). Data Provenance: Some Basic Issues. In S. Kapoor & S. Prasad (Eds.), (pp. 87–93). Springer Berlin Heidelberg. Caplan, R., Davies, T., Wadud, A., Verhulst, S., Alonso, J., & Farhan, H. (2014). Towards common methods for assessing open data: workshop report & draft framework. Retrieved from http://opendataresearch.org/content/2014/709/towards-common- methods-assessing-open-data-workshop-report-draft-framework Conradie, P., & Choenni, S. (2014). On the barriers for local government releasing open data. Government Information Quarterly. http://doi.org/10.1016/j.giq.2014.01.003 Corsar, D., & Edwards, P. (2012). Enhancing Open Data with Provenance. Digital Futures. Aberdeen. Davies, T. (2013). Open Data Barometer. Erickson, J. S., Viswanathan, A., Shinavier, J., Shi, Y., & Hendler, J. A. (2013). Open Government Data: A Data Analytics Approach. IEEE Intelligent Systems, 28(5). Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3). Heald, D. (2011). When Transparency meets Surveillance: External Monitoring of Country Public Finances (pp. 19–20). Newark, New Jersey. Keen, J., Calinescu, R., Paige, R., & Rooksby, J. (2013). Big Data + Politics = Open Data : The Case of Healthcare Data in England. Policy and the Internet, 5(2), 228–243. Maali, F., Cyganiak, R., & Peristeras, V. (2010). Enabling interoperability of government data catalogues (pp. 339–350). Springer. Malmud, C. (2007). The Annotated 8 principles of Open Government Data. !67 http://doi.org/10.1145/1541880.1541883 http://doi.org/10.4067/S0718-18762014000200006 http://www.w3.org/DesignIssues/LinkedData.html http://opendataresearch.org/content/2014/709/towards-common-methods-assessing-open-data-workshop-report-draft-framework http://doi.org/10.1016/j.giq.2014.01.003 The Journal of Community Informatics ISSN: 1721-4441 Martin, S., Foulonneau, M., Turki, S., Ihadjadene, M., Paris, U., & Tudor, P. R. C. H. (2013). Risk Analysis to Overcome Barriers to Open Data. Electronic Journal of e- Government, 11 (1), 348–359. Moreau, L., Clifford, B., Freire, J., Futrelle, J., Gil, Y., Groth, P., … den Bussche, J. Van. (2011). The Open Provenance Model core specification (v1.1). Future Generation Computer Systems, 27 (6), 743–756. http://doi.org/10.1016/j.future.2010.07.005 Pignotti, E., Corsar, D., & Edwards, P. (2011). Provenance Principles for Open Data. Nottingham, UK. Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45 (4), 211–218. Prat, N., & Madnick, S. (2008). Measuring Data Believability: A Provenance Approach (pp. 393–393). http://doi.org/10.1109/HICSS.2008.243 Ram, S., & Liu, J. (2009). A New Perspective on Semantics of Data Provenance (Vol. 526). Washington DC. Reiche, K. J., Hofig, E., & Schieferdecker, I. (2014). Assessment and Visualization of Metadata Quality for Open Government Data. In P. Parycek & N. Edelmann (Eds.), CeDEM14 Conference for E-Democracy and Open Government. Krems, Austria. Scannapieco, M., & Catarci, T. (2002). Data quality under a computer science perspective. Archivi & Computer, 2, 1–15. Scannapieco, M., Missier, P., & Batini, C. (2005). Data Quality at a Glance. Datenbak- Spektrum, 14, 6–14. Sheridan, J., & Tennison, J. (2010). Linking UK Government Data. Raleigh, N.C. Su, Y., & Jin, Z. (2004). A Methodology For Information Quality Assessment In The Designing And Manufacturing Processes Of Mechanical Products. Suen, C. H., Ko, R. K. L., Tan, Y. S., Jagadpramana, P., & Lee, B. S. (2013). S2Logger: End- to-End Data Tracking Mechanism for Cloud Data Provenance (pp. 594–602). Los Angeles, CA, USA. http://doi.org/10.1109/TrustCom.2013.73 The Sunlight Foundation. (2015). Open Data Policy Guidelines. Retrieved from http:// sunlightfoundation.com/opendataguidelines/ W3C. (2015, February). Data on the Web Best Practices. Retrieved March 4, 2015, from http://www.w3.org/TR/dwbp/ Wang, R. Y. (1998). A product perspective on total data quality management. Communications of the ACM, 41 (2), 58–65. Zaveri, A. (2012). Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, and Sören Auer. Quality assessment methodologies for linked open data. Semantic Web Journal. Submitted On, 12, 14. Zuiderwijk, A., Janssen, M., Choenni, S., Meijer, R., & Alibaks, R. S. (2012). Socio-technical Impediments of Open Data. Electronic Journal of e-Government, 10 (2), 156–172. !68 http://doi.org/10.1016/j.future.2010.07.005 http://doi.org/10.1109/HICSS.2008.243 http://doi.org/10.1109/TrustCom.2013.73 http://sunlightfoundation.com/opendataguidelines/ http://www.w3.org/TR/dwbp/