The Journal of Community Informatics ISSN: 1721-4441
Special Issue on Data Literacy: Articles

Graphical Perception of Value Distributions: An Evaluation of Non-Expert Viewers’ Data Literacy

Arkaitz Zubiaga, University of Warwick, United Kingdom. Corresponding Author. arkaitz@zubiaga.org
Brian Mac Namee, University College Dublin, Ireland. brian.macnamee@ucd.ie

An ability to understand the outputs of data analysis is a key characteristic of data literacy and the inclusion of data visualisations is ubiquitous in the output of modern data analysis. Several aspects still remain unresolved, however, on the question of choosing data visualisations that lead viewers to an optimal interpretation of data. This is especially true when audiences have differing degrees of data literacy, and when the aim is to make sure that members of a community, who may differ on background and expertise, will make similar interpretations from data visualisations. In this paper we describe two user studies on perception from data visualisations, in which we measured the ability of participants to validate statements about the distributions of data samples visualised using different chart types. In the first user study, we find that histograms are the most suitable chart type for illustrating the distribution of values for a variable. We contrast our findings with previous research in the field, and posit three main issues identified from the study. Most notably, however, we show that viewers struggle to identify scenarios in which a chart simply does not contain enough information to validate a statement about the data that it represents. In the follow-up study, we ask viewers questions about quantification of frequencies, and identification of most frequent values from different types of histograms and density traces showing one or two distributions of values. This study reveals that viewers do better with histograms when they need to quantify the values displayed in a chart. Among the different types of histograms, interspersing the bars of two distributions in a histogram leads to the most accurate perception. Even though interspersing bars makes them thinner, the advantage of having both distributions clearly visible pays off. The findings of these user studies provide insight to assist designers in creating optimal charts that enable comparison of distributions, and emphasise the importance of using an understanding of the limits of viewers’ data literacy to design charts effectively.

Zubiaga, A., Mac Namee, B. (2016). Graphical perception of value distribution: an evaluation of non-expert viewers’ data literacy. The Journal of Community Informatics, 12(3), 138-159.
Date submitted: 2015-09-30. Date accepted: 2016-06-06.
Copyright (C), 2016 (the authors as stated). Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5. Available at: http://www.ci-journal.net/index.php/ciej/article/view/1275

Introduction

Although the definition of data literacy remains somewhat fluid (Koltay, 2015), (Calzada Prado and Marzal, 2013), most definitions include an ability to interpret the outputs from data analysis. For example, Harris (Harris, 2012) defines data literacy as “competence in finding, manipulating, managing, and interpreting data, including not just numbers but also text and images”; Beauchamp (Beauchamp, 2015) defines it as “the ability to interpret, evaluate, and communicate statistical information”; and Schield (Schield, 2004) as the ability “to access, assess, manipulate, summarize, and present data”. Many of the outputs from data analysis referred to in these definitions take the form of data visualisations. In fact the importance of data visualisation is included in a number of discussions on the characteristics of data literacy (Koltay, 2015), (Wright et al., 2012), (Womack, 2014).
The level of literacy that members of the general public (who are not generally trained in statistics or data analytics) bring to different types of data visualisation is not always clear. Data visualisations, however, are now ubiquitous in everyday publications such as newspapers, magazines, television programmes, and online content (Heer et al., 2010). The presence of charts on Information and Communication Technologies (ICTs) is becoming more and more important as the Web is increasingly dominated by multimedia. On social media, in particular, content is often accompanied by charts and infographics to reinforce the intended message. The choice of an appropriate chart type for a particular dataset is extremely important as it can condition subsequent interpretation by viewers. Carefully selecting the chart type that most effectively allows readers to make accurate interpretations of the data is especially important when readers have differing levels of data literacy. In this work, we conduct two user studies to assess the effectiveness of different chart types for visualising one or more distributions of values. We conduct these two user studies using a crowdsourcing platform, which enables us to survey a large and diverse set of users who are not necessarily skilful in data analytics. In the first study, we ask viewers to validate the veracity of statements about the distributions of variables shown alongside different visualisations of these distributions. Among the five types of chart compared in this first user study, we find that histograms are not only the most complete in terms of details given, but also the chart type that leads viewers to the most accurate understanding of the underlying data. We also find, however, that viewers are not good at determining the limits of what can be understood about data from different chart types, i.e. they don’t know what they don’t know. 
In the second, follow-up study, we move on to compare two related types of charts, histograms and density traces, to assess the capacity of viewers to accurately interpret charts. This user study is in turn split into two smaller studies. In the first study, we compare viewers’ ability to interpret histograms with their ability to interpret density traces when each chart shows the distribution of a single variable. In the second user study, we examine the effectiveness of different ways of visualising the distributions of two variables together in a single histogram or in a single density trace when the aim is to compare the distributions of the two variables. We compare seven different types of charts that enable comparison of distributions: histograms with overlapped, mirrored, interspersed, stacked, or cumulative bars, and density traces that are overlapped, or mirrored. This study finds that histograms lead to more accurate interpretations in both cases (i.e. showing the distributions of one or two variables), particularly when the purpose is to quantify specific frequency values. Density traces are especially helpful, on the other hand, when we want the viewer to identify the overall tendency of values within a distribution.

The findings obtained from these user studies give us deeper understanding that enables us to define guidelines that graphical designers can use to create charts that most effectively display the distributions of variables. These guidelines are intended to satisfy the graphical perception abilities of diverse communities of users, encompassing viewers of different skill sets, and making sure that the chart selected for a visualisation is correctly interpreted by as many viewers as possible.
Related Work

Research on how best to convey messages extracted from charts has focused on several aspects, including automatic generation of text summaries from charts (Demir et al., 2010), (Moraes et al., 2013), identifying the core messages of charts (Corio and Lapalme, 1999), (Demir et al., 2012), adding context to charts (Heer et al., 2009), (Hullman et al., 2013), and studying perception of information from charts (Shah and Hoeffner, 2002), (Glazer, 2011). We focus on graphical perception, the field that studies the visual decoding of information from graphical displays.

One of the best known studies on chart perception is by Cleveland and McGill (Cleveland and McGill, 1984), who define a theory to examine the elementary perceptual tasks that viewers perform when looking at charts, as well as the extent to which they lead viewers to accurate understanding. More recently both Shah and Hoeffner (Shah and Hoeffner, 2002) and Glazer (Glazer, 2011) summarise three major factors that influence viewers’ interpretations of data visualisations: (i) the visual characteristics of a chart, (ii) a viewer’s knowledge about charts, and (iii) a viewer’s background and expectations of the content in the chart. The authors highlight, however, that no single chart type is necessarily better overall than any other, and new tasks might require careful studies to choose a suitable chart. In general, researchers have pointed out that creating appropriate charts so that viewers perceive the intended message is harder than it might at first seem, and that detailed study of the effectiveness of different chart types for different tasks is required (Friel et al., 2001), (Shah and Hoeffner, 2002). Furthermore, the literature does not contain extensive studies of how well viewers can interpret charts showing the distribution of a variable. Here we focus on the visual characteristics of a chart, and its influence on graphical perception when comparing distributions of variables.
When it comes to displaying distributions of variables with the aim of enabling comparison between distributions, numerous types of charts have been suggested in the literature. While histograms are a well-established chart type for this task (Scott, 1979), the recent tendency has moved towards boxplots and derivatives of boxplots (McGill et al., 1978). One of the best known alternatives to the standard boxplot is the violin plot (Hintze and Nelson, 1998), which is an improved version of the boxplot that incorporates a density shape, i.e., a combination of a box plot and the density trace. The boxplot is considered to be a suitable simple chart that could be easily drawn manually (Hintze and Nelson, 1998), (Muthers and Matzarakis, 2010), but that lacks detailed information on the density of a distribution. As computational tools that facilitate chart creation emerged, however, displaying density shapes in charts has gained importance because of the additional detail provided. This has led to an increase in the use of density traces, given that they are computationally easy to create and they provide the details of a distribution that cannot be seen in boxplots.

In recent years, there has been substantial discussion among researchers as to whether histograms or density traces are more suitable for displaying distributions of variables for exploratory data analysis, much of which has inclined towards the use of density traces, as histograms lack detail. For instance, Silverman (Silverman, 1986) and Izenman (Izenman, 1991) argue that histograms are a traditional way to provide a visual clue of the general shape of a distribution, but that they leave much to be desired when one needs to quantify the density of an observation in a distribution of values.
Scott (Scott, 2009) adds that density traces provide the essence of conveying visual information of both the frequency and relative frequencies of observations, and thus they seem more intuitively suitable for data presentation purposes. Finally, Tukey (Tukey, 1977) resorts to histograms when he intends to display a single distribution of values, but makes use of density traces when comparing two distributions of values. In this work, we look at how these two types of charts, namely histograms and density traces, affect the graphical perception of the viewer, when the goal is to acquire basic understanding of variable distributions.

Despite the high volume of research on graphical perception, we found no work studying graphical perception of multiple variable distributions in a single chart. Our work addresses this issue by comparing the ability of viewers to compare the distributions of two variables when looking at histograms and density traces, and by exploring different settings for an optimal visualisation. Our work also complements a recent study in a similar direction by Javed et al. (Javed et al., 2010), who studied alternative visualisations of multiple time series, and found that separate charts are suitable for comparison across time series with a large visual span, and shared-space charts are more efficient for smaller visual spans. Our work focuses specifically on creating single charts that put together distributions.

User Study 1: Visualising a Single Distribution

In this section we describe a user study to determine the effectiveness of different chart types for illustrating the distribution of a single variable.

Experimental Method

In the basic unit of our experimental method, a chart is shown to a participant along with an associated textual statement about the distribution of the variable represented in that chart.
Participants rate how well they think the statement corresponds to the data represented in the chart. This is repeated for different combinations of three different factors: (i) the underlying distribution of the variable, (ii) the type of chart used to show the data, and (iii) the type of statement made about the data. Overall, we include four different variables, five distinct chart types, and four different types of statements.

The four artificially created variables used in the study were: (i) ages of customers of an online movie service, (ii) ages of members of a youth sports centre, (iii) salaries of a city’s residents, and (iv) scores of students in an exam. Each variable exhibited a different distribution. Figure 1 shows histograms illustrating the distributions of these variables.

Five commonly used chart types were selected for this study: (i) bar charts showing the average value of the distribution, (ii) bee swarms, (iii) boxplots, (iv) stacked bar charts, and (v) histograms. Figure 2 shows an example of each. All of the charts shown during the user studies were created using the R programming language [1], for which we provide details to reproduce each of the charts under study.

Figure 1: Histograms depicting the shapes of the data distributions under study.

[1] R: http://www.r-project.org/

The textual statements were of the following four types: (i) “the data ranges from X to Y”, (ii) “most data points fall around X”, (iii) “most data points fall under/over X”, and (iv) “data points are clustered to either side of X”. For each chart type and variable combination, two versions of each statement type were presented to participants: one that was true and one that was false.
For example, for the data shown in the first histogram in Figure 1 the true statement “the data ranges from 30 to 42”, and the false statement “the data ranges from 25 to 45” were used.

Figure 2: Examples of each chart type used for the user study.

The combinations of variables (4), chart types (5), and statements (8) amounted to a total of 160 different tasks. Participants in our experiments were shown one task at a time and had to rate the accuracy of the statement shown on a five point Likert scale: strongly agree, agree, neutral, disagree, and strongly disagree. Additionally, participants could opt for an alternative choice, “impossible to tell from this chart”. Tasks were presented in random order to control for learning effects. With 50 ratings collected for each of the 160 tasks, we gathered a total of 8,000 ratings.

We conducted our experiments using crowdsourcing through Mechanical Turk. The use of a crowdsourcing platform such as Mechanical Turk for this study is motivated by Heer and Bostock (Heer and Bostock, 2010), who showed that it is an effective and reliable way in which to perform graphical perception studies. To take part in the study participants did not need to have any prior expertise in data analytics as we were interested in measuring the ability of average, non-expert viewers to interpret different chart types. We did, however, restrict participation to US-based participants to control for English language capability. We also restricted participation to participants with at least a 95% HIT acceptance rate, which is Mechanical Turk’s internal measure of how well participants perform tasks on the platform. A high HIT acceptance rate guarantees that participants have been deemed reliable in other experiments and filters automated bots.

Table 1: Distribution of correct answers as defined in the ground truth (T: True, F: False, I: Impossible to tell).
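As a quick check on the task and rating counts given above, the factorial design can be enumerated directly. This is an illustrative Python sketch only (the study's own charts were produced in R), and the short labels are shorthand invented here rather than identifiers from the study.

```python
from itertools import product

# Shorthand labels for the factors described in the text (illustrative names).
variables = ["movie_customer_ages", "sports_centre_ages", "salaries", "exam_scores"]
chart_types = ["average_bar", "bee_swarm", "boxplot", "stacked_bar", "histogram"]
statement_types = [
    "ranges_from_x_to_y",
    "fall_around_x",
    "fall_under_over_x",
    "clustered_either_side_of_x",
]
truth_versions = [True, False]  # each statement type has a true and a false version

# One task per combination of variable, chart type, statement type and version.
tasks = list(product(variables, chart_types, statement_types, truth_versions))
ratings_per_task = 50
total_ratings = len(tasks) * ratings_per_task

print(len(tasks))     # 4 * 5 * (4 * 2) = 160
print(total_ratings)  # 160 * 50 = 8000
```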
The ground truth for each task was manually annotated by the experiment designers, with the following distribution of responses: 42 cases were true, 54 were false, and 64 were impossible to tell. Table 1 shows the distribution of the ground truth responses, broken down by variable, chart type and statement type. The most important differences in these distributions relate to the chart types. All of the tasks showing a simple average bar chart fall into the “impossible to tell” category as the average bar chart does not provide enough evidence to assess the associated statements. With bee swarm charts and histograms it is possible, in all cases, to assess each statement. With boxplots and stacked bar charts it is possible to assess only some statements.

Results

We examine the data collected in these experiments in three ways: (1) inter-rater agreement to assess the level of agreement in the responses given by different participants; (2) accuracy to assess how well participant responses match the ground truth; and (3) a confusion matrix to understand the types of errors made by participants.

Inter-rater Agreement. We measure inter-rater agreement using Krippendorff’s alpha coefficient (Krippendorff, 2012). Overall, the 8,000 ratings show a fair level of inter-rater agreement of 0.39 [2]. Table 2 shows inter-rater agreement values for each chart, statement, and variable. We see two major differences here. Firstly, with regard to statement type, participants tend to agree when assessing the ranges of variables and whether variable values are above or below a given threshold; and tend to disagree when asked about values being clustered around a certain value. Secondly, with regard to chart type, participants showed a larger degree of agreement for bee swarms and histograms; and a much lower degree of agreement for the other three chart types.
This is likely due to the high number of answers that are impossible to tell.

Table 2: Inter-rater agreement values by item, and overall.

Accuracy. To compute the accuracy values, we rely on majority voting, i.e., the rating that has been chosen by most participants. This allows us to choose a single rating from the 50 provided for each task. For the purposes of computing accuracy, we collapse ratings of agree and strongly agree to true, and ratings of disagree and strongly disagree to false [3]. The final accuracy values reported here refer to the number of cases in which the majority vote of participants coincides with the ground truth. An overall accuracy value, and values broken down by variable, chart type and statement, are shown in Table 3.

[2] We report the strength of agreement using the benchmarks suggested by Landis and Koch (Landis and Koch, 1977) for interpreting kappa.
[3] In fact, participants seemed reluctant to choose strong judgements, choosing agree and disagree much more than strongly agree and strongly disagree.

Table 3: Accuracy values by item, and overall.

On statements of the types “data ranges from X to Y” and “points fall under/over X”, participants were substantially more accurate (90% and 75%, respectively) than for the other two types of statements (55% and 52.5%). More specifically we found that participants struggled with average bar charts and stacked bar charts when assessing “points fall around X” statements, and with stacked bar charts when assessing “points clustered to either side of X” statements. Regarding the chart type, the most accurate answers were those for bee swarms and histograms (both above 90%). This is slightly surprising as these are relatively complex chart types for non-expert viewers.
Even though both bee swarm charts and histograms potentially allow viewers to determine the veracity of all of the statements, and provide similar information, viewers seem to find it slightly easier to comprehend values from a histogram. Finally, participants struggled slightly to answer questions about the student scores data. This data has a bimodal distribution that could be more difficult for viewers to parse.

Confusion Matrix. Table 4 shows a confusion matrix for all tasks (note that Imp. refers to responses of impossible to tell, that Neutral did not occur in the ground truth, and that cells marking correct responses are highlighted in bold). The precision for each category is also included. Most notably here, we observe that when the correct response was impossible to tell, participants mostly deemed statements false (45.9% of the time), or even true (24.9% of the time), and only identified 23.8% of the cases correctly. When the correct response was either true or false, participants again rarely chose impossible to tell as the answer. Taken together, we believe that these results indicate that, although participants do well when assessing true cases (accuracy 75.8%) and false cases (72.1%), they have trouble when facing charts that do not enable them to determine the veracity of a statement and do not recognise this shortcoming. We were surprised that participants did not use the neutral choice in these cases (the neutral response was only used in 6% of cases).

Table 4: Confusion matrix for all the tasks combined (in %).

Overall, the main finding from this study was that viewers found histograms were the easiest to interpret of the five chart types studied. Using this finding, we designed a follow-up study to determine the ability of viewers to compare the distributions of two variables when visualised using either histograms or density traces.
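The majority-vote accuracy used in the Results above can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' analysis code: ratings are collapsed before the vote is taken, ties go to the label seen first, and the toy ratings are invented for the example. The text does not specify either the ordering of collapsing and voting or the handling of ties.

```python
from collections import Counter

# Collapse the five-point scale as described in the text; "neutral" and
# "impossible" stay as their own categories.
COLLAPSE = {
    "strongly agree": "true",
    "agree": "true",
    "disagree": "false",
    "strongly disagree": "false",
    "neutral": "neutral",
    "impossible": "impossible",
}

def majority_label(ratings):
    """Return the most common collapsed label among one task's ratings."""
    counts = Counter(COLLAPSE[r] for r in ratings)
    # Assumption: ties resolve to the label encountered first.
    return counts.most_common(1)[0][0]

def majority_accuracy(ratings_by_task, ground_truth):
    """Fraction of tasks whose majority label equals the ground-truth label."""
    hits = sum(
        majority_label(ratings_by_task[task]) == label
        for task, label in ground_truth.items()
    )
    return hits / len(ground_truth)

# Invented toy data: three tasks with a handful of ratings each.
toy_ratings = {
    "t1": ["agree", "strongly agree", "disagree"],
    "t2": ["disagree", "agree", "impossible", "disagree"],
    "t3": ["agree", "disagree", "strongly disagree"],
}
toy_truth = {"t1": "true", "t2": "impossible", "t3": "false"}
toy_accuracy = majority_accuracy(toy_ratings, toy_truth)
print(toy_accuracy)
```

In the toy data the second task mirrors the pattern reported in the confusion matrix: the ground truth is impossible to tell, but the majority of raters call the statement false, so the task counts as a miss.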
User Study 2: Expanding Charts to Visualise Multiple Distributions

We set out to conduct user studies to measure the ability of viewers to interpret differences between two variable distributions for the purpose of exploratory data analysis (Tukey, 1977). We split this exploration into two user studies. The first study investigates whether viewers find histograms or density traces easier to interpret when viewing the distribution of a single variable. Once this is established, the second study investigates which chart type is most effective for comparing the distributions of two different variables.

We also conducted these user studies through the Amazon Mechanical Turk crowdsourcing platform. Given that we wanted to conduct the user study with viewers that were not necessarily skilful in data analytics, and that we were looking at relatively simple perception tasks involving the quantification of values from charts, the use of a crowdsourcing platform presented a suitable environment for our purposes. During both of these user studies, we set up the tasks on Mechanical Turk without restrictions on the expertise of participants in terms of their ability to decode charts. In order to make sure that participants were reliable, we again restricted participation to participants with a HIT acceptance rate of at least 95%, and also a number of completed tasks of at least 100. These settings have been found to be suitable for filtering out participants who cheat (Heer and Bostock, 2010).

Our basic experimental unit consisted of showing a chart to a participant and asking them a series of questions about the chart in order to assess the accuracy with which they could interpret it. For each experimental unit we first showed the participant an entry page that displayed the chart in question, along with instructions explaining that they would be asked a series of questions about the distribution of the variable displayed in the chart. Figure 3 shows an example of this entry page.
Once a participant clicked on the Start button, they were shown a question displayed next to the chart. Participants had to provide an answer to the question and then click ‘Next’ to proceed to the next question. This process repeated until the participant had answered all of the questions associated with the chart. Each experimental unit was completed by 50 different participants. For each participant, we collected the answer they provided to each question, as well as the response time measured from the moment they first saw the question to the moment they clicked ‘Next’ having answered the question.

Figure 3: Entry page of the user study shown to participants.

In this paper, we report three values in the results section: (1) response time in seconds, (2) accuracy of respondents (calculated as the percentage of participants that provided an answer within a 10% error rate of the ground truth [4]), and (3) the error rate for participants that were accurate according to point 2. Note that the error rate is computed as the relative difference between the correct answer and the participant’s answer, i.e., the deviation from the correct answer, so we can measure how close the responses were for those that were exactly or nearly accurate. We differentiate between accuracy and error rate given that our purpose is to measure both the ability of viewers to find the answer in the chart (for example, rather than mixing up axes and giving an answer for the wrong axis), and the ability to provide a precise answer for those who found it.

The charts shown during both user studies display data from the Vietnamese Living Standard Study (VLSS), which is available online [5], and has been used for instance by Tukey (Tukey, 1977) for exploratory data analysis.
The data from the VLSS contains household per capita expenditures for 5,999 Vietnamese households, divided into rural and urban areas. This enabled us to separate expenditure values into these populations, and thus to show two distributions of values that viewers had to compare. All of the charts shown during the user studies were created using the R programming language, for which we provide details to reproduce each of the charts under study.

[4] We used 10% as a reasonable percentage to consider that participants were able to identify where to get the answer from in the chart, and their response was close enough to the correct answer. Likewise, this enables us to compare responses for histograms and density traces, which display different scales of values.
[5] VLSS Data: http://www.tc.umn.edu/~zief0002/Comparing-Groups/Data/VLSSperCapita.csv

User Study 2.1: Comparing Histograms and Density Traces for Visualising a Single Distribution

In the first of our user studies, we compare the ability of participants to interpret visualisations of the distribution of a single variable using histograms and density traces. This section describes the experimental method used for this study and the results of the study.

Experimental Method

This study used the experimental method described at the beginning of Section 4. Viewers were asked to answer questions about two different types of charts:

Histograms: Histograms (cf. (Scott, 1979), (Guha et al., 2001)) are a very commonly used way to graphically represent the distribution of a variable. A histogram shows tabulated frequency values that give the gist of how data is distributed. Given that they present tabulated frequency values, the width of each of the bins in the plot must be predefined.
Among the numerous methods to define the bin width (Wand, 1997), we relied on Sturges’ rule (Sturges, 1926) to create the histograms for our study. As a result, expenditure values ranging from 0 to 1800 for the urban population, and from 0 to 3100 for the rural population, were split into bins of width 100. We used R’s hist() function to plot histograms.

Previously, it has been found that the orientation of the bars in a histogram has an effect on the viewers’ perception. Fischer et al. (Fischer et al., 2005) concluded that vertical bars allow viewers to react quicker and make decisions faster than horizontal bars. Therefore, in this work we focus on histograms displaying vertical bars. Figures 4(b) and 4(d) show histograms displaying the distributions for expenditure values obtained from the VLSS data, one for the rural population and one for the urban population. One advantage of histograms is that bins facilitate quantification of the frequency for each range of values. As has been pointed out in the literature, however, the lack of a detailed visualisation of more points of the distribution can make accurate perception by viewers difficult.

Density traces: Unlike histograms, density traces do not use tabulated data, and instead the distribution is visualised as a continuous, single line that depicts how frequencies change across the range of possible values. To draw a density trace a kernel function is required to extract frequency values and draw the line. In this case, we rely on the commonly used kernel method introduced by Epanechnikov (Epanechnikov, 1969). We used R’s density() function to plot density traces. Figures 4(a) and 4(c) show density traces displaying distributions for expenditure values obtained from the VLSS data, one for the rural population and one for the urban population.
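To make the two constructions above concrete, here is a minimal Python sketch of Sturges' rule and of a density trace built with the Epanechnikov kernel. It is not the R code used for the study: the sample values are invented (they are not VLSS data), and the bandwidth `h` is an arbitrary illustrative choice. R's hist() and density() also differ in detail, for example hist() treats the Sturges count as a suggestion and rounds breaks to convenient values, and density() parameterises its bandwidth differently.

```python
import math

def sturges_bins(n):
    """Sturges' rule: suggested number of histogram bins for n observations."""
    return math.ceil(math.log2(n)) + 1

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(x, data, h):
    """Kernel density estimate at x, with h as the kernel's half-width."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)

# Invented expenditure-like sample, for illustration only.
sample = [120, 150, 180, 200, 210, 260, 300, 450, 700, 1100]

print(sturges_bins(len(sample)))  # ceil(log2(10)) + 1 = 5
# Evaluate the density trace on a coarse grid of expenditure values.
trace = [(x, kde(x, sample, h=100)) for x in range(0, 1300, 100)]
for x, density in trace:
    print(x, round(density, 5))
```

The histogram reports one count per bin, whereas the density trace yields a value at every point of the axis. That matches the trade-off the text describes between easy quantification per range and a more detailed picture of the overall shape.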
Density traces present the advantage over histograms that a more detailed representation of the whole distribution is shown which, it has been suggested, makes them a more suitable chart for visualising both the frequency and relative frequencies of observations. One could also expect, however, that quantification of frequencies for specific points might be difficult for viewers from a curved density trace.

With a focus on identifying the overall tendency of each distribution of values, and quantifying frequencies depicted in the charts, we asked participants to provide the following values based on interpreting the displayed charts:
• Minimum expenditure value in the distribution.
• Maximum expenditure value in the distribution.
• Most frequent value (MFV) in the distribution.
• Frequency value for an expenditure of $200.
• Frequency value for an expenditure of $500.
• Frequency value for an expenditure of $1,000.

Results

Tables 5, 6, and 7 show the average results for accuracy, error rate, and response time for histograms and density traces from this study. To compute the average response times, we remove response times above the 95th percentile, and those below the 5th percentile, to control for outliers. Finally, we also show the average accuracy values, error rates, and response times for a chart combining all the questions, which helps us assess the overall performance with each type of chart. Note that the averages for error rates are not necessarily the arithmetic mean of the error rates for all the questions with that chart, since a different number of participants might be accurate and thus be considered for computing the error rates; therefore, it represents a weighted mean for all accurate responses to each question associated with a chart.

The accuracies achieved depended on the type of question being asked.
Viewers used density traces more accurately when responding to questions about minimum and maximum values, as well as most frequent values. However, when responding to the other questions about frequency values, viewers were more accurate when interpreting histograms. The only exception is the frequency for x = $200, where viewers were slightly more accurate with density traces, although the results for the two charts remain similar. This exception, x = $200, happens to be a point with no tick mark on the X-axis, which might have made it more difficult for viewers to identify. As we hypothesised above, it appears that the fact that density traces are curved lines complicates quantification of frequency values for specific points, but facilitates identification of trends and therefore finding points such as the most frequent value.

Table 5: Accuracy values for User Study #1
Table 6: Error rates for User Study #1
Table 7: Response times (in seconds) for User Study #1

Overall, putting together the results for all types of questions, viewers were on average more accurate when making interpretations from histograms than from density traces. This accuracy gain is also reflected in the error rates, which are slightly lower for histograms (with the only exception of the frequency for $200); this suggests that histograms are a more suitable chart than density traces for viewers to make accurate interpretations. There does not seem to be a clear difference in terms of response times between histograms and density traces. On average, viewers spent only 3.4% more time answering questions associated with histograms, while achieving a 6.4% improvement in accuracy.
User Study 2.2: Comparison of Two Distributions

Having seen that histograms convey more accurate interpretations than density traces when it comes to a single distribution, in a follow-up user study we looked at the performance of viewers when comparing two distributions. Viewers do better at quantifying values from a single histogram, but how should two histograms or two density traces be put together in a single chart to optimise perception? Since two plots can be arranged in different ways in a single chart, we study the effect of these arrangements on the perception of viewers.

Experimental Methods

This study also used the experimental method described at the beginning of Section 4. In this study viewers were asked to answer questions about six different types of charts:

Figure 5: Histograms with different settings shown to participants of the User Study #2.

Overlapped histograms: In an overlapped histogram, one of the histograms is superimposed on top of the other, with both lying on the X-axis (as in the first user study, based on the literature (Fischer et al., 2005), we assume that vertical bars are more suitable for viewers than horizontal bars). In order for both histograms to be seen, they are made slightly transparent. We do this by using R’s hist() function, including an alpha = 14 parameter in the set of colours being used. Again, we established the bin width for histograms by following Sturges’ rule (Sturges, 1926) (this approach was used for all histograms in this user study). Figure 5 (a) shows the resulting overlapped histogram for the VLSS data. We expect that overlapped histograms will enable comparison of frequencies for both distributions, but the fact that one of the distributions partially occludes the other might complicate their differentiation.
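To make the binning step concrete, the sketch below computes the bin count suggested by Sturges’ rule and tabulates two samples on shared breaks, which is what allows overlapped bars to line up. It is a minimal Python illustration under stated assumptions, not the R code used in the study: the sample values are hypothetical, bins are taken as left-closed for simplicity, and the width of 100 with 13 bins is chosen by hand. As far as we are aware, R’s hist() treats the Sturges count as a suggestion and rounds the breaks to convenient values, so the plotted number of bins can differ from the raw count.

```python
import math


def sturges_bin_count(n):
    """Sturges' rule: number of bins k = ceil(log2(n) + 1) for n observations."""
    return math.ceil(math.log2(n) + 1)


def bin_counts(values, lower, width, n_bins):
    """Count values into n_bins left-closed bins of equal width starting at `lower`.

    Values on or above the last upper edge are clamped into the final bin.
    Assumes every value is >= lower.
    """
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - lower) // width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical expenditure samples, for illustration only (not the VLSS data).
rural = [90, 150, 180, 220, 260, 310, 330, 480, 520, 950]
urban = [210, 340, 380, 450, 470, 520, 610, 700, 880, 1250]

# Shared breaks: both series use the same origin and width so bars line up.
width, n_bins = 100, 13
rural_counts = bin_counts(rural, 0, width, n_bins)
urban_counts = bin_counts(urban, 0, width, n_bins)

# In each bin, the shorter bar is the region where the two histograms overlap.
overlap = [min(r, u) for r, u in zip(rural_counts, urban_counts)]
```

The overlap list marks the part of each bin where one set of bars is drawn through the other, which is the region that transparency is meant to keep visible.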
Overlapped density traces: Density traces can also be superimposed on top of each other, as shown in Figure 5 (f). We used R’s plot() and lines() functions combined with the density() function, using Epanechnikov’s kernel (Epanechnikov, 1969), to plot these charts (this approach was used for all density traces in this user study). Having two density traces sharing the same space could aid comparison and, different from overlapping histograms, overlapping density traces do not hinder visualisation, as the lines do not typically occlude one another.

Mirrored histograms: In order to avoid overlapping histograms, in a mirrored histogram one of the histograms is mirrored downwards from the X-axis. In spite of pointing downwards, the length of the bars in the bottom histogram also represents positive values. To draw mirrored histograms we used R’s hist() function after inverting the values for one of the histograms. Figure 5 (b) shows the mirrored histogram for the VLSS data. We expect that mirrored histograms will facilitate clear visualisation of both distributions without any overlap, but that the quantification of bars pointing downwards from the X-axis could be more challenging for viewers.

Mirrored density traces: Similarly, density traces can be mirrored so that one of them is drawn downwards from the X-axis. Figure 5 (g) shows mirrored density traces for the VLSS data. To draw this we used R’s plot() and lines() functions after inverting the values for one of the density traces. Similar to the advantage offered by mirrored histograms, we expect that mirrored density traces might facilitate visualisation of both lines separately, avoiding possible confusion, but may make it more difficult for participants to perform comparisons between both distributions, as they do not share the same space.
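The inversion used for mirrored charts amounts to negating one series before drawing it, while still labelling the axis with magnitudes. The following Python sketch shows that idea only; the function names and counts are hypothetical and it does not reproduce the R plotting calls used in the study.

```python
def mirror(top_counts, bottom_counts):
    """Return bar heights for a mirrored chart: the bottom series is negated
    so that it is drawn downwards from the X-axis."""
    return list(top_counts), [-c for c in bottom_counts]


def tick_label(y):
    """Axis labels show magnitudes, since downward bars still encode
    positive frequencies."""
    return str(abs(y))


# Hypothetical bin counts for two populations, for illustration only.
top, bottom = mirror([3, 5, 2], [2, 4, 1])
labels = [tick_label(y) for y in (-4, -2, 0, 2, 4)]
```

Labelling both halves of the axis with absolute values is one way to keep the downward bars readable as positive frequencies.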
Interspersed histograms: Bars for two variables are interspersed in a single histogram, so that for each range of values two bars are shown next to each other, one for the frequency of each variable in that range. Figure 5 (c) shows the resulting interspersed histogram of the VLSS data that we showed to the participants in the study. We created this chart using the multhist() function from the ‘plotrix’ package in R. We expect that interspersed histograms will facilitate visualisation of both distributions, as they do not occlude each other. This, however, is at the cost of halving the horizontal space physically available for the width of the bars, which might have a negative effect on the visual perception of viewers.

Stacked histograms: In a stacked histogram the bars for one distribution lie on top of the bars of the other distribution. This means that the frequency values for one of the distributions do not count from the X-axis, but from the top of the bar for the other distribution. We created stacked histograms using R’s histStack() function. Figure 5 (d) shows the resulting stacked histogram for the VLSS data that we showed to the participants of the study. We expect that when using stacked histograms it will be easier for viewers to differentiate histograms from each other than when overlapped histograms are used. It may, however, be more challenging for viewers to quantify the height of bars, as significant cognitive effort is required (viewers need to subtract the height of one bar from the other).

Cumulative histograms: Each bar in a cumulative histogram represents the cumulative frequency for that range and all smaller values, instead of representing just the frequency for that specific range. For instance, the third bar from the left for a distribution represents the aggregation of the frequencies for the first, second, and third bars.
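The cumulative construction just described is a running sum over the per-bin counts, and the original counts can be recovered by differencing adjacent bars. A minimal Python sketch, with hypothetical counts and function names, is:

```python
from itertools import accumulate


def cumulative(counts):
    """Running total of per-bin counts: bar i holds the sum of bins 0..i."""
    return list(accumulate(counts))


def per_bin(cumulative_counts):
    """Recover per-bin counts by subtracting each bar's left neighbour
    (the first bar is compared against zero)."""
    previous = [0] + cumulative_counts[:-1]
    return [c - p for c, p in zip(cumulative_counts, previous)]


# Hypothetical per-bin frequencies, for illustration only.
bars = cumulative([4, 7, 2])   # third bar aggregates the first three bins
original = per_bin(bars)
```

The differencing step is the mental subtraction that a viewer has to perform to read a single frequency from this kind of chart.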
Consequently, the increase of a bar with respect to the previous bar actually represents the frequency of that specific range. Figure 5 (e) shows the cumulative histogram for the VLSS data that we showed to the participants of the study. We created this chart using R’s hist() function, which received the outcome of applying the cumsum() function to the histogram’s data values. We expect that cumulative histograms will facilitate differentiation between the two distributions, but the fact that frequencies are summed will complicate quantification of specific frequencies.

Each time a visualisation was shown, we asked participants to provide the following values:

• Most frequent value (MFV) for each distribution. The most frequent value is different for the rural and urban populations, and thus viewers need to precisely identify each population’s most frequent value.
• Frequency values for specific data points in both distributions. We asked for the frequency for expenditure values of $200 and $500, for both populations. The main difference between these two cases is that $500 has a tick mark on the X-axis, while $200 does not. This might make a difference in viewers’ interpretation, as it is potentially more difficult to position a value when there is no tick mark.

Results

Tables 8, 9, and 10 show the average results for accuracy, error rates, and response times for the charts under study.
Table 8: Accuracy values for User Study #2
Table 9: Error rates for User Study #2
Table 10: Response times (in seconds) for User Study #2

As expected, viewers were not able to accurately interpret cumulative histograms. The fact that the frequency for each range of values has to be calculated by subtracting the frequency for the previous range confused viewers, misleading their perception. Accuracy values of around 20% were achieved in most cases, either when looking for most frequent values, or when quantifying frequency values.

Viewers were clearly more accurate with the rest of the charts, achieving average accuracies higher than 70%. Overall, viewers made clearly better interpretations from interspersed histograms, achieving an accuracy of 85%. Displaying thinner bars gives the advantage of making both bars clearly visible without any overlap, and easily quantifiable without the need to stack bars. Still, viewers did quite well with overlapped and stacked histograms, achieving 77% and 78% accuracy rates, respectively. These two types of histograms led to better perceptions from viewers than mirrored histograms, where both distributions are visible with no overlaps. The fact that one of the distributions is mirrored downwards seems to have hindered viewers’ quantification of frequency values.

If we look at the accuracy of responses by type of question, there is a noticeably lower performance when providing values for frequencies of $200 than for frequencies of $500. Again, the fact that $200 does not have a tick mark on the X-axis appears to be misleading viewers. Adding more tick marks on the X-axis, as long as space allows, should help boost performance when quantifying values that are on or close to those tick marks.
It is certainly key to think of the specific points at which tick marks have to be added in order to guarantee that the intended message is correctly conveyed.

With the density traces we see a similar trend as when viewers looked at a single distribution, i.e., viewers were highly accurate when identifying most frequent values (slightly more accurate even than with the best of the histograms), but the performance when quantifying specific frequency values is poorer, which also lowers the overall performance. Density traces are therefore a suitable visualisation when the intention is to emphasise the central tendency of a distribution. However, histograms are more suitable when we want viewers to interpret more specific values shown in the distribution.

Looking at the response times, it can be seen that viewers needed more time to respond to questions about most frequent values than to questions about specific frequency values. This reinforces our conclusions from the first user study that viewers seem to feel more comfortable with histograms when quantifying frequency values, but are not as comfortable when looking at the tendency of values to identify the most frequent value.

Discussion

In this work, we have conducted two user studies to assess viewers’ data literacy when interpreting a distribution of values displayed in different types of charts. In the first study, we have studied the suitability of five different types of charts to visualise a single distribution of values. In a follow-up study, we have delved into different types of histograms and density traces to assess viewers’ literacy not only with a single distribution of values, but also when putting two together with the aim of comparing them with each other. We have used a crowdsourcing platform to conduct these studies, without restricting users by their level of expertise, and therefore allowing participation from users with differing levels of data literacy.
In the first user study, we have seen that histograms allow the most accurate interpretations (viewers achieved 97% accuracy from histograms, compared to 91% with bee swarms, and lower than 60% for the other charts) and are an appropriate choice of chart type when visualising the distribution of a variable for an average, non-expert audience. This reinforces previous findings from Meyer et al. (1997) and Zacks and Tversky (1999) concluding that bar charts are a suitable visualisation medium to support reading exact values, identification of maxima, and describing contrasts in data.

More interestingly, this study highlighted a shortcoming in the ability of average, non-expert viewers to recognise the limitations of different chart types: viewers don’t know what they don’t know. This is a significant issue, as it means that there is a strong possibility that viewers will make incorrect inferences from charts, or that they can be very easily misled using charts. This finding reinforces the need to carefully design charts for different tasks (Shah and Hoeffner, 2002; Glazer, 2011) and highlights a shortcoming in the data literacy of non-experts.

Another interesting point arising from the apparent effectiveness of histograms compared to bee swarms is that it reinforces the finding by Fischer et al. (2005) that viewers find it easier to interpret vertical bars (present in histograms) than horizontal bars (present in bee swarms). We also believe that there might be a difference between centring the data points in a bee swarm around a virtual vertical axis in the middle of the chart, and placing the data points upwards starting from the X-axis in a histogram.
The gap between two bars lying on the same axis can be easily quantified visually, while the gap between two bars centred on an axis is halved on both sides of the bar, making it more difficult to quantify. The alignment of the bars with respect to the axis might affect perception; this warrants further study.

In the second user study, our results suggest that histograms are overall more suitable than density traces to display distributions of values to viewers with different levels of expertise and not necessarily trained in data analytics, especially when the main purpose is quantification of specific frequency values. Density traces have instead proved more suitable to emphasise the tendency of values underlying a distribution. In a follow-up user study, we have identified that interspersing bars of the two distributions plotted in a histogram leads to optimal perception when comparing distributions. Other alternatives such as overlapping, stacking, and mirroring bars in histograms led to less accurate perceptions, while cumulative histograms proved to be by far the worst option.

The findings of these user studies provide insight towards defining guidelines to assist graphical designers in optimal creation of charts that enable comparison of distributions. The fact that our user studies have been conducted with non-expert users whose level of expertise has not been restricted makes our guidelines suitable to be applied to communities of users with different degrees of data literacy.

Future work includes deepening the comparison of value distributions, by looking into more challenging cases where three or more distributions need to be compared, given that histograms with increasing numbers of distributions might require different approaches.
Another aspect that has not been dealt with in this work, and that would be a sensible objective to pursue, is to break down the user study by demographic group to better understand how perception varies across people of different ages, cultures, etc.

Acknowledgments

This work was supported by the Enterprise Ireland and IDA Ireland Technology Centres programme at CeADAR, the Centre for Applied Data Analytics Research.

References

Beauchamp, A. (2015). What is data literacy? Databrarians. February, 12.
Calzada Prado, J. and Marzal, M. A. (2013). Incorporating data literacy into information literacy programs: Core competencies and contents. Libri: International Journal of Libraries & Information Services, 63(2):123–134.
Cleveland, W. S. and McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387):531–554.
Corio, M. and Lapalme, G. (1999). Generation of texts for information graphics. In Proceedings of EWNLG’99, 49–58.
Demir, S., Carberry, S., and McCoy, K. F. (2012). Summarizing information graphics textually. Computational Linguistics, 38(3):527–574.
Demir, S., Oliver, D., Schwartz, E., Elzer, S., Carberry, S., and McCoy, K. F. (2010). Interactive sight into information graphics. In Proceedings of W4A, 16. ACM.
Epanechnikov, V. A. (1969). Non-parametric estimation of a multivariate probability density. Theory of Probability & Its Applications, 14(1):153–158.
Fischer, M. H., Dewulf, N., and Hill, R. L. (2005). Designing bar graphs: Orientation matters. Applied Cognitive Psychology, 19(7):953–962.
Friel, S. N., Curcio, F. R., and Bright, G. W. (2001). Making sense of graphs: Critical factors influencing comprehension and instructional implications. Journal for Research in Math. Education, 124–158.
Glazer, N. (2011). Challenges with graph interpretation: A review of the literature. Studies in Science Education, 47(2):183–210.
Guha, S., Koudas, N., and Shim, K. (2001). Data-streams and histograms. In Proceedings of the thirty-third annual ACM symposium on Theory of computing, 471–475. ACM.
Harris, J. (2012). Data is useless without the skills to analyze it. Harvard Business Review, 13.
Heer, J. and Bostock, M. (2010). Crowdsourcing graphical perception: using mechanical turk to assess visualization design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 203–212. ACM.
Heer, J., Bostock, M., and Ogievetsky, V. (2010). A tour through the visualization zoo. Communications of the ACM, 53(6):59–67.
Heer, J., Viégas, F. B., and Wattenberg, M. (2009). Voyagers and voyeurs: Supporting asynchronous collaborative visualization. Communications of the ACM, 52(1):87–97.
Hintze, J. L. and Nelson, R. D. (1998). Violin plots: a box plot-density trace synergism. The American Statistician, 52(2):181–184.
Hullman, J., Diakopoulos, N., and Adar, E. (2013). Contextifier: automatic generation of annotated stock visualizations. In Proceedings of CHI, 2707–2716. ACM.
Izenman, A. J. (1991). Review papers: Recent developments in nonparametric density estimation. Journal of the American Statistical Association, 86(413):205–224.
Javed, W., McDonnel, B., and Elmqvist, N. (2010). Graphical perception of multiple time series. IEEE Transactions on Visualization and Computer Graphics, 16(6):927–934.
Koltay, T. (2015). Data literacy: in search of a name and identity. Journal of Documentation, 71(2):401–415.
Krippendorff, K. (2012). Content analysis: An introduction to its methodology. Sage.
Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 159–174.
McGill, R., Tukey, J. W., and Larsen, W. A. (1978). Variations of box plots. The American Statistician, 32(1):12–16.
Meyer, J., Shinar, D., and Leiser, D. (1997). Multiple factors that determine performance with tables and graphs. Human Factors: The Journal of the Human Factors and Ergonomics Society, 39(2):268–286.
Moraes, P. S., Carberry, S., and McCoy, K. (2013). Providing access to the high-level content of line graphs from online popular media. In Proceedings of W4A, 1–10. ACM.
Muthers, S. and Matzarakis, A. (2010). Use of beanplots in applied climatology a comparison with boxplots. Meteorologische Zeitschrift, 19(6):641–644.
Schield, M. (2004). Information literacy, statistical literacy and data literacy. IASSIST Quarterly, 28(2/3):6–11.
Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66(3):605–610.
Scott, D. W. (2009). Multivariate density estimation: theory, practice, and visualization, volume 383. Wiley.
Shah, P. and Hoeffner, J. (2002). Review of graph comprehension research: Implications for instruction. Educational Psychology Review, 14(1):47–69.
Silverman, B. W. (1986). Density estimation for statistics and data analysis, volume 26. CRC press.
Sturges, H. A. (1926). The choice of a class interval. Journal of the American Statistical Association, 21(153):65–66.
Tukey, J. W. (1977). Exploratory data analysis. Reading, Ma, 231.
Wand, M. (1997). Data-based choice of histogram bin width. The American Statistician, 51(1):59–64.
Womack, R. (2014). Data Visualization and Information Literacy, volume 38.
Wright, S., Fosmire, M., Jeffryes, J., Stowell Bracke, M., and Westra, B. (2012). A multi-institutional project to develop discipline-specific data literacy instruction for graduate students. Libraries Faculty and Staff Presentations, Paper 10.
Zacks, J. and Tversky, B. (1999). Bars and lines: A study of graphic communication. Memory & Cognition, 27(6):1073–1079.