microsoft word has an option, "text alternative," to add a description of a table or figure for visually impaired people, who use screen readers to read the document. adobe acrobat reader also has an accessibility pane to tag tables and add alternative text and descriptions of tables, which the nvda screen reader uses to read aloud. moreover, commonlook office, whose motto is "build accessibility into documents early," has add-ins for microsoft word or powerpoint to add enough accessibility content to the documents to make the resulting pdf accessible.

table 1. solutions and libraries for table extraction and processing.
1. tabula (https://tabula.technology/; open source: yes; image based: no). extracts data tables from pdf and saves them as csv or excel spreadsheets. it works on native pdf files and cannot extract scanned tables. it supports multiple platforms but does not support batch processing.
2. pdftables (https://pdftables.com/; open source: no; image based: no). extracts page, table, table row, and even table cell. it is a fully automated api. it supports multiple platforms and multiple programming languages.
3. docparser (https://docparser.com/; open source: no; image based: yes). extracts information from images and forms. it is a cloud-based application and supports batch processing. it parses the documents and offers more features but needs human intervention. it shows poor accuracy on handwritten application forms.
4. pdftron (https://www.pdftron.com/; open source: no; image based: no). supports multiple platforms and multiple programming languages.
5. camelot (open source: yes; image based: yes). a python library that extracts tables from images. it has built-in ocr.
6. excalibur (open source: yes; image based: yes). a web-based solution powered by camelot.
7. pypdf2 (open source: yes; image based: no). a python library that can do batch processing with multiple files.
8. pdfplumber (open source: yes; image based: yes). a python library built on pdfminer.
9. pdf table extractor (https://resourcegovernance.org/analysis-tools/tools/pdf-table-extractor; open source: yes; image based: no). a web-based tool built on tabula. it supports scraping of multi-page tables and comparison of cell values.
10. pdfminer (open source: yes; image based: no). a python library that extracts information such as location, fonts, and lines of the text. it focuses on analyzing text. it has a pdf parser. it figures out the semantic relationships among structured tables.

however, already-developed unstructured documents, without any accessibility features, still need some measures to make them understandable to visually impaired or blind users. keeping in mind the statistics of visually impaired people and the unstructured data of the future (the global datasphere will grow from 33 zb to 175 zb, and 80% of this worldwide data will be unstructured), visually impaired individuals cannot be ignored in their access to knowledge.68 therefore, we need mechanisms for making these unstructured documents understandable to as many people as possible by incorporating accessibility measures into document readers. the following section highlights some of the key issues in this domain.

issues and challenges in the existing systems
tables can be utilized in multiple scenarios, including information extraction, table search, ontology engineering, conversion to dbms, and document engineering.69 the situation becomes difficult when a blind or visually impaired person needs to understand the tables. the issues and challenges in dealing with pdf tables are categorized in the following sections.
table structure
tables in pdf documents need more focus on table structure detection because they do not follow a defined formal structure.70 several knowledge gaps are identified in the literature regarding table structure, such as the identification of functional areas of tables, for which silva argued for the use of multiple heuristics and machine learning algorithms in parallel or in sequence.71 the variety of structural layouts creates problems in their identification, which can be handled by defining more rules at the lexical and syntactic layers of table processing. this could also be fruitful for better semantic annotations.72 in addition, the variety of cell content or inconsistent cell content, along with implicit header cells, creates problems in understanding the tables, especially by machines.73 the vector representation of web tables may be applied to pdf tables for semantic annotations and identification of column types.74 along with that approach, a graph representation and a graph neural network (gnn) can also be used for better structure identification in multiple domains.75 new data sets need to be introduced for structure recognition in various domains, including business and finance, as they use a huge number of tables in their documents.76 from the discussion above, table structure inconsistencies, cell content inconsistencies, and the functional and logical processing of tables need more research effort to eliminate the stated problems. along with that, the inclusion of more data sets will also help in handling the diversity in the field.

table formats
the existing format of tables in pdf lacks the metadata needed for further processing; therefore, the conversion of pdf tables to other formats, especially open formats, will open new endeavors. some researchers have worked on converting tables to csv format, which retains the basic structure but lacks some cell formatting. researchers have also worked on the transformation of web tables to relational tables for easy manipulation.77 in contrast, xml can handle complex data and is more easily read by humans. therefore, a methodology is presented to work on tables in xml format, but it considers tables having text and numerical data only.78 json, another format, can also be used as an alternative to xml; it is smaller in size than xml and can handle complex and hierarchical data. the json format has less support than xml but is preferred for web applications due to its interoperability and lightweight features.
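to make the csv/xml/json comparison concrete, the sketch below shows one way an extracted pdf table could be exported to json using pdfplumber, one of the open-source libraries listed in table 1, together with python's standard json module. it is a minimal illustration written for this review, not part of any cited framework: the file name is hypothetical, and it assumes the first extracted row holds the column headers, which real-world tables with spanning or implicit headers often violate.

import json
import pdfplumber

def table_to_json(pdf_path, page_number=0):
    # open the pdf and pull the first table pdfplumber detects on the page
    with pdfplumber.open(pdf_path) as pdf:
        rows = pdf.pages[page_number].extract_table()
    if not rows:
        raise ValueError("no table detected on this page")
    # assumption: the first extracted row contains the column headers
    header, *body = rows
    records = [dict(zip(header, row)) for row in body]
    # json keeps each value paired with its header, which a flat csv dump loses
    return json.dumps(records, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    print(table_to_json("report.pdf"))  # "report.pdf" is a hypothetical input file

because every cell stays attached to its column name, the resulting json records can be navigated or queried field by field, which is one reason the lightweight json route is attractive for downstream web applications.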
table interpretation
the variable representation patterns of table values, dense content, and natural language processing create problems in the correct interpretation of tables.79 anaphora resolution techniques and document-level discourse parsers are suggested to handle complex references among multiple domains.80 moreover, handling the locality features of a table and the annotation of its property features can lead to better interpretation of tables.81 the use of a knowledge base is suggested for understanding and annotating the relationships among tables and text to get more information about the extracted entities from tables and text.82 similarly, the extraction of data and its precision in medical and financial tables is an issue that needs the attention of researchers, as both fields have crucial and important data in their tables.83 for easy interpretation of tables, machine learning classifiers, based on table headings and captions, can be used to classify them into their respective domains.84 the relationship of tables in a specific domain or among multiple domains can be achieved by developing ontologies.85 this will enable the tables to be published on an lod cloud that will establish more relationships and infer insights from multiple domains.

table evaluation
most of the researchers working on pdf tables have tried to evaluate their work with popular data sets such as icdar 2013, icdar 2015, icdar 2017 pod, pubmed, unlv, and mormont. as we have pdf documents in multiple domains, new data sets should be introduced for structure recognition, especially in business and finance, as these domains use a large number of tables in their documents.86 an evaluation methodology was proposed for table detection, structure recognition, and its functional and semantic analysis.87 unfortunately, there are no standard metrics, parameters, or formal methodology for table processing evaluation.88 therefore, standard evaluation metrics should be defined for pdf tables in order to standardize the evaluation of algorithms and frameworks.

table presentation to blind and visually impaired users
the available tools and techniques for reading aloud documents to blind and visually impaired people either read the table caption only and ignore the content or treat the tables as text and read the rows line by line. this does not help these users to understand the semantics of the table and its content. besides the content of the table, its layout shows grouping and connections among the content, which is not presented to blind and visually impaired people by current solutions.89 therefore, tools and screen readers need to present tables in a nonvisual format or give a summarized view of tables by following the guidelines of w3c, instead of reading the table like text.90 the summarized view of tables can become part of bibliographic metadata and can contribute to cataloging from the perspective of linked and open data.91
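as a rough illustration of what such a "summarized view" might sound like, the sketch below turns an already-extracted table (a list of rows, as produced by tools like those in table 1) into a short spoken-style overview: caption, dimensions, and column names, rather than a cell-by-cell readout. this is a hypothetical example written for this review, not one of the cited tools, and it again assumes the first row holds the header cells.

def summarize_table(rows, caption="untitled table"):
    # rows: list of lists of cell strings; the first row is assumed to be the header.
    # returns a one-paragraph overview intended for a screen reader.
    if not rows:
        return f"{caption}: the table is empty."
    header, body = rows[0], rows[1:]
    columns = ", ".join(str(cell) for cell in header)
    return (
        f"{caption}: a table with {len(header)} columns ({columns}) "
        f"and {len(body)} data rows. say a column name to hear its values."
    )

# example: a small two-column table of formats and their relative sizes
print(summarize_table(
    [["format", "size"], ["xml", "larger"], ["json", "smaller"]],
    caption="table comparing open formats",
))

a real implementation would also need to convey grouping, spanning cells, and units, but even this minimal overview gives the listener the layout information that current row-by-row readouts omit.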
a study highlighted the accessibility of published pdf articles by four journal publishers and presented the findings in graphs to show the trend from 2009 to 2013, by taking parameters including meaningful title, alternate text for images, and logical reading order.92 the author further applied the same methodology to analyze the articles published in the next four years (2014 to 2018) and came to the conclusion that the accessibility of pdf documents had improved. however, the journal publishers, who should be more aware of disability and accessibility, did not consistently follow the pdf/ua accessibility requirements and wcag 2.0 when producing pdf versions of their articles.93 therefore, visually impaired individuals should be provided with a mechanism for understanding the digital content and underlying semantics at multiple levels of abstraction, like the general information about the document and its elements (including tables), its structure and content, navigation in the table, and querying the table to get more details and lessen cognitive overload.

accessibility of digital library collection
the accessibility of large-scale digital library collections can enhance content for sighted as well as visually impaired users. the traditional utilization of digital library collections needs to be broadened by making computation-ready collections meant to be used and consumed in multiple domains.94 an effort was made by researchers to digitize and archive a digital repository of images and convert them to pdf/a documents but, unfortunately, the researchers came up with limited semantics as they did not consider the elements within the documents themselves.95 the accessibility of these converted documents may be compromised by these limited semantics. the rich semantics of tables can be used in the bibliographic classification of a digital library's collection to increase the search width of the digital library.96 blind and visually impaired users can be assisted in using digital libraries, as they may need help at physical and cognitive levels. at the physical level, the blind may face difficulty in accessing information, identifying path and status, and efficiently evaluating information. at the cognitive level, they may face problems in understanding the multiple structures, programs, information, and features of the digital library, and the need to stick to some specific formats. therefore, the inclusion of help features will make the digital library friendly to blind and visually impaired people by incorporating meaningful descriptions for nontextual elements.97 the sight-centered nature of the digital library creates problems for blind and visually impaired users due to missing textual or verbal instructions. some researchers identified the inclusion of labels and meaningful descriptions for hyperlinks, instructions, structure, multimedia content, and nontext content to make digital libraries friendly to blind and visually impaired people.98 at the same time, others argue for improvement in usability by introducing help features in terms of usefulness, ease of use, and user satisfaction.99 the accessibility of digital libraries in general, and of their content in particular, may be improved by accommodating help features in the interface and meaningful descriptions for the contents' nontext elements, including tables.
conclusions and future research directions
this study discusses the accessibility of tables included in pdf documents in general as well as in the specific environment of digital libraries. existing frameworks, algorithms, and solutions for the processing and interpretation of pdf tables, specifically their presentation to blind and visually impaired people, are thoroughly discussed. a general workflow of table processing is also presented in figure 1. the available solutions for reading out pdf documents to blind and visually impaired people are analyzed for their output, specifically for their handling of tables. furthermore, a list of resources for table interpretation and presentation is discussed along with their different features. the issues and challenges in table structure, format, interpretation, evaluation, presentation to blind users, and accessibility of digital library collections are discussed. researchers working in the domains of accessibility, digital libraries, and pdf tables can extend and modify the current solutions and algorithms by following the future research directions given below.
• the structure of a table has implicit semantic information which a sighted reader can infer but a blind reader needs assistance to understand. the structure of a pdf table is extracted using multiple approaches like heuristics, ontologies, machine learning, and segmentation, whereas vectors are used for web tables.100 therefore, combinations of multiple approaches and the use of vectors for pdf tables may produce better results.
• the content of a table is usually numeric or very short text and needs proper interpretation. therefore, a knowledge base can be used to get more information about the extracted entities from tables and text in order to understand and annotate the relationships among tables and text.101 these knowledge bases can be predetermined or may be selected automatically according to the table content or domain.
• table interpretation can become easier if tables are classified according to their domains by using machine learning classifiers. the classification can be based on table headings and captions, as well as the title and author of the document.102
• ontologies are used to relate the tables in a specific domain or among multiple domains, and publishing them on an lod cloud will establish new relationships.103 this will help in inferring new insights from complex, long, and numerical tables.
• unstructured data and content can be made available for multiple uses and interpretations if they are converted to open formats like csv, json, and xml.104 among these, csv comes with repeated content and xml needs special parsers, whereas json is lightweight and easy to write and read.105 it has support from nosql databases like mongodb and apache couchdb, and from web application apis like twitter, youtube, and facebook. therefore, json might be a better option for the conversion of pdf tables, for its multiple interpretations and navigation within tables.
• the processes used for the evaluation of tables have no defined metrics.106 therefore, table evaluation processes should be defined with their respective metrics in order to standardize the research in this domain.
• the precision of the extracted content of a table is very crucial, especially in medical, financial, and experimental tables that have numeric data.
therefore, the preprocessing of tables or their conversion to other formats needs more attention to avoid any truncation or rounding off of the data.
• the presentation of tables to blind or visually impaired people can be in nonvisual or summarized form.107 the summaries may be presented nonvisually, including the structural layout as well as a brief introduction of the table, to minimize the cognitive overload on these individuals.
• to evaluate the accessibility of digital library interfaces, 16 heuristics were proposed to bring digital libraries within reach of users; however, more heuristics are needed to make generalized interfaces for all individuals.108
• the nontext elements of digital library collections should have meaningful descriptions for better understandability by blind and visually impaired individuals. the user-generated content about these nontext elements could be used for cataloging.109
• the rich semantics of tables can be exploited for cataloging and classification, which will be helpful in exploratory searching.
• as the michigan state university libraries has taken the initiative of assessing and improving the accessibility of digital library content by adopting the wcag guidelines, other libraries can also adopt the model for providing accessible content to their users, including blind and visually impaired individuals.
• the development of new data sets for tables in multiple domains can facilitate researchers in interpreting tables and establishing relationships across domains.
this review paper is an attempt to highlight the knowledge gap in processing pdf tables and their accessibility for blind and visually impaired individuals. an efficient and open-source solution for making pdf documents accessible to blind and visually impaired people needs to exploit heuristics, ontologies, machine learning, and deep learning by using open-source libraries and tools for understanding and interpreting the tabular content in order to reduce information overload.

endnotes
1 roya rastan, "automatic tabular data extraction and understanding" (phd diss., university of new south wales, 2017).
2 mark t. maybury, "communicative acts for explanation generation," international journal of man-machine studies 37, no. 2 (1992): 135–72.
3 patricia wright, "the comprehension of tabulated information: some similarities between reading prose and reading tables," nspi journal 19, no. 8 (1980): 25–29, https://doi.org/10.1002/pfi.4180190810.
4 jean-claude guédon et al., future of scholarly publishing and scholarly communication: report of the expert group to the european commission (brussels: european commission, directorate-general for research and innovation, 2019), https://doi.org/10.2777/836532.
5 world health organization, world report on vision, october 8, 2019, https://www.who.int/publications-detail/world-report-on-vision/.
6 mireia ribera turró, "are pdf documents accessible?" information technology and libraries 27, no. 3 (2008): 25–43, https://doi.org/10.6017/ital.v27i3.3246.
7 kyunghye yoon, laura hulscher, and rachel dols, "accessibility and diversity in library and information science: inclusive information architecture for library websites," library quarterly 86, no. 2 (2016): 213–29, https://doi.org/10.1086/685399.
8 iris xie et al., "using digital libraries non-visually: understanding the help-seeking situations of blind users," information research 20, no. 2 (2015): 673.
9 heidi m. schroeder, "implementing accessibility initiatives at the michigan state university libraries," reference services review 46, no. 3 (2018): 399–413, https://doi.org/10.1108/rsr-04-2018-0043.
10 joanne oud, "accessibility of vendor-created database tutorials for people with disabilities," information technology and libraries 35, no. 4 (2016): 7–18, https://doi.org/10.6017/ital.v35i4.9469.
11 rakesh babu and iris xie, "haze in the digital library: design issues hampering accessibility for blind users," electronic library 35, no. 5 (2017): 1052–65, https://doi.org/10.1108/el-10-2016-0209.
12 rachel wittmann et al., "from digital library to open datasets," information technology and libraries 38, no. 4 (2019): 49–61, https://doi.org/10.6017/ital.v38i4.11101.
13 xinxin wang, "tabular abstraction, editing, and formatting" (phd diss., university of waterloo, 1996).
14 rastan, "automatic tabular data extraction," 25.
15 azadeh nazemi, "non-visual representation of complex documents for use in digital talking books" (phd diss., curtin university, 2015).
16 rastan, "automatic tabular data extraction," 14.
17 max göbel et al., "icdar 2013 table competition," in 2013 12th international conference on document analysis and recognition (2013): 1449–53, https://doi.org/10.1109/icdar.2013.292.
18 burcu yildiz, katharina kaiser, and silvia miksch, "pdf2table: a method to extract table information from pdf files," in proceedings of the 2nd indian international conference on artificial intelligence (iicai, 2005): 1773–85; tamir hassan and robert baumgartner, "table recognition and understanding from pdf files," in ninth international conference on document analysis and recognition (icdar 2007) (2007): 1143–47, https://doi.org/10.1109/icdar.2007.4377094; alexey shigarov et al., "tabbypdf: web-based system for pdf table extraction," in international conference on information and software technologies (springer international publishing, 2018): 257–69, https://doi.org/10.1007/978-3-319-99972-2_20.
19 minghao li et al., "tablebank: table benchmark for image-based table detection and recognition," preprint, arxiv:1903.01949; sebastian schreiber et al., "deepdesrt: deep learning for detection and structure recognition of tables in document images," in 2017 14th iapr international conference on document analysis and recognition (icdar) (2017): 1162–67, https://doi.org/10.1109/icdar.2017.192.
20 zewen chi et al., "complicated table structure recognition," preprint, arxiv:1908.04729.
21 michael cafarella et al., "ten years of webtables," in proceedings of the vldb endowment 11, no. 12 (august 2018): 2140–49, https://doi.org/10.14778/3229863.3240492.
22 shah khusro, asima latif, and irfan ullah, "on methods and tools of table detection, extraction and annotation in pdf documents," journal of information science 41, no. 1 (2015): 41–57, https://doi.org/10.1177/0165551514551903.
23 hassan, "table recognition and understanding"; richard zanibbi, dorothea blostein, and james r. cordy, "a survey of table recognition," document analysis and recognition 7, no. 1 (2004): 1–16, https://doi.org/10.1007/s10032-004-0120-9; andreiwid sheffer corrêa and pär-ola zander, "unleashing tabular content to open data: a survey on pdf table extraction methods and tools," in proceedings of the 18th annual international conference on digital government research (june 2017): 54–63, https://doi.org/10.1145/3085228.3085278; christopher clark and santosh divvala, "looking beyond text: extracting figures, tables and captions from computer science papers" (paper, aaai workshops at the twenty-ninth aaai conference on artificial intelligence, austin, tx, january 25–26, 2015).
24 ermelinda oro and massimo ruffolo, "pdf–trex: an approach for recognizing and extracting tables from pdf documents," in 2009 10th international conference on document analysis and recognition (icdar) (2009): 906–10, https://doi.org/10.1109/icdar.2009.12.
25 vidhya govindaraju, ce zhang, and christopher ré, "understanding tables in context using standard nlp toolkits," in proceedings of the 51st annual meeting of the association for computational linguistics (sofia, bulgaria: association for computational linguistics, august 2013): 658–64.
26 nikola milosevic et al., "disentangling the structure of tables in scientific literature," in natural language processing and information systems, nldb 2016, lecture notes in computer science 9612 (springer, cham), https://doi.org/10.1007/978-3-319-41754-7_14.
27 rastan, "automatic tabular data extraction," 48.
28 alexey shigarov, andrey mikhailov, and andrey altaev, "configurable table structure recognition in untagged pdf documents," in proceedings of the 2016 acm symposium on document engineering (2016): 119–22, https://doi.org/10.1145/2960811.2967152.
29 shigarov et al., "tabbypdf," 262, 263, 265.
30 dae hyun kim et al., "facilitating document reading by linking text and tables," in proceedings of the 31st annual acm symposium on user interface software and technology (october 2018): 423–34, https://doi.org/10.1145/3242587.3242617.
31 hassan, "table recognition and understanding," 1145.
32 jing fang et al., "a table detection method for multipage pdf documents via visual separators and tabular structures," in 2011 international conference on document analysis and recognition (2011): 779–83, https://doi.org/10.1109/icdar.2011.304.
33 bahadar ali and shah khusro, "a divide-and-merge approach for deep segmentation of document tables," in proceedings of the 10th international conference on informatics and systems (may 2016): 43–49, https://doi.org/10.1145/2908446.2908473.
34 wenyuan xue et al., "table analysis and information extraction for medical laboratory reports," in 2018 ieee 16th intl conf on dependable, autonomic and secure computing, 16th intl conf on pervasive intelligence and computing, 4th intl conf on big data intelligence and computing and cyber science and technology congress (dasc/picom/datacom/cyberscitech) (2018): 193–99, https://doi.org/10.1109/dasc/picom/datacom/cyberscitec.2018.00043.
35 roya rastan, hye-young paik, and john shepherd, "texus: a unified framework for extracting and understanding tables in pdf documents," information processing & management 56, no. 3 (2019): 895–918, https://doi.org/10.1016/j.ipm.2019.01.008.
36 dafang he et al., "multi-scale multi-task fcn for semantic page segmentation and table detection," in 2017 14th iapr international conference on document analysis and recognition (icdar) (2017): 254–61, https://doi.org/10.1109/icdar.2017.50.
37 jing fang et al., "table header detection and classification," in proceedings of the twenty-sixth aaai conference on artificial intelligence (july 2012): 599–605.
38 he et al., "multi-scale multi-task," 255.
39 martha o. perez-arriaga, trilce estrada, and soraya abad-mota, "tao: system for table detection and extraction from pdf documents," florida artificial intelligence research society conference, north america (2016).
40 saman arif and faisal shafait, "table detection in document images using foreground and background features," in 2018 digital image computing: techniques and applications (dicta) (2018): 1–8, https://doi.org/10.1109/dicta.2018.8615795.
41 schreiber et al., "deepdesrt," 1163, 1164.
42 shoaib ahmed siddiqui et al., "decnt: deep deformable cnn for table detection," ieee access 6 (2018): 74151–61, https://doi.org/10.1109/access.2018.2880211.
43 chi et al., "complicated table structure recognition."
44 rahul anand, hye-young paik, and cheng wang, "integrating and querying similar tables from pdf documents using deep learning," 2019, preprint, arxiv:1901.04672.
45 jiaoyan chen et al., "colnet: embedding the semantics of web tables for column type prediction," in proceedings of the aaai conference on artificial intelligence 33, no. 1: 29–36, https://doi.org/10.1609/aaai.v33i01.330129.
46 ziqi zhang, "towards efficient and effective semantic table interpretation," in international semantic web conference (2014): 487–502, https://doi.org/10.1007/978-3-319-11964-9_31.
47 ivan ermilov, sören auer, and claus stadler, "user-driven semantic mapping of tabular data," in proceedings of the 9th international conference on semantic systems (september 2013): 105–12, https://doi.org/10.1145/2506182.2506196.
48 martha o. perez-arriaga, trilce estrada, and soraya abad-mota, "table interpretation and extraction of semantic relationships to synthesize digital documents," in proceedings of the 6th international conference on data science, technology and application—data (2017): 223–32, https://doi.org/10.5220/0006436902230232.
49 varish mulwad, "tabel—a domain-independent and extensible framework for inferring the semantics of tables" (phd diss., university of maryland, 2015).
50 syed tahseen raza rizvi et al., "ontology-based information extraction from technical documents," in proceedings of the 10th international conference on agents and artificial intelligence (icaart) (2018): 493–500, https://doi.org/10.5220/0006596604930500.
51 corrêa and zander, "unleashing tabular content to open data," 55.
52 irfan ullah et al., "an overview of the current state of linked and open data in cataloging," information technology and libraries 37, no. 4 (2018): 47–80, https://doi.org/10.6017/ital.v37i4.10432.
53 nosheen fayyaz, irfan ullah, and shah khusro, "on the current state of linked open data: issues, challenges, and future directions," international journal on semantic web and information systems (ijswis) 14, no. 4 (2018): 110–28, https://doi.org/10.4018/ijswis.2018100106.
54 govindaraju, zhang, and ré, "understanding tables in context using standard nlp toolkits," 660, 661.
55 perez-arriaga, estrada, and abad-mota, "table interpretation and extraction," 227.
56 kim et al., "facilitating document reading," 425, 426.
57 rastan, paik, and shepherd, "texus," 906.
58 nikola milosevic et al., "a framework for information extraction from tables in biomedical literature," international journal on document analysis and recognition (ijdar) 22, no. 1 (2019): 55–78, https://doi.org/10.1007/s10032-019-00317-0.
59 chi et al., "complicated table structure recognition."
60 wenhao yu et al., "tablepedia: automating pdf table reading in an experimental evidence exploration and analytic system," in the world wide web conference (may 2019): 3615–19, https://doi.org/10.1145/3308558.3314118.
61 anand, paik, and wang, "integrating and querying similar tables."
62 turró, "are pdf documents accessible?" 2, 4.
63 nazemi, "non-visual representation of complex documents," 110, 111, 112, 118.
64 juan cao, "generating natural language descriptions from tables," ieee access 8 (2020): 46206–16, https://doi.org/10.1109/access.2020.2979115.
65 maartje ter hoeve et al., "conversations with documents: an exploration of document-centered assistance," in proceedings of the 2020 conference on human information interaction and retrieval (march 2020): 43–52, https://doi.org/10.1145/3343413.3377971.
66 guédon et al., "future of scholarly publishing," 42.
67 w3c, "wcag 2.0."
68 world health organization, "world report on vision"; david reinsel, john gantz, and john rydning, "data age 2025: the digitization of the world, from edge to core," idc white paper, #us44413318 (framingham, ma: idc, november 2018), https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf/.
69 rastan, "automatic tabular data extraction," 18, 19.
70 arif and shafait, "table detection in document images," 1.
71 ana costa e silva, "parts that add up to a whole: a framework for the analysis of tables" (phd diss., edinburgh university, uk, 2010).
72 milosevic et al., "a framework for information extraction from tables," 60.
73 rastan, "automatic tabular data extraction," 14.
74 chen et al., "colnet," 31.
75 mulwad, "tabel," 23; chi et al., "complicated table structure recognition."
76 siddiqui et al., "decnt," 74160.
77 david w. embley, sharad seth, and george nagy, "transforming web tables to a relational database," 2014 22nd international conference on pattern recognition (2014): 2781–86, https://doi.org/10.1109/icpr.2014.479.
78 milosevic et al., "a framework for information extraction from tables," 56.
79 milosevic et al., "a framework for information extraction from tables," 55, 56.
80 kim et al., "facilitating document reading," 432.
81 chen et al., "colnet," 36.
82 asima latif et al., "a hybrid technique for annotating book tables," int. arab j. inf. technol. 15, no. 4 (2018): 777–83.
83 rastan, paik, and shepherd, "texus," 909.
84 milosevic et al., "a framework for information extraction from tables," 61, 62, 65, 66.
85 rizvi et al., "ontology-based information extraction," 496.
86 siddiqui et al., "decnt," 74160.
87 max göbel et al., "a methodology for evaluating algorithms for table understanding in pdf documents," in proceedings of the 2012 acm symposium on document engineering (september 2012): 45–48, https://doi.org/10.1145/2361354.2361365.
88 rastan, paik, and shepherd, "texus," 917.
89 david pinto et al., "table extraction using conditional random fields," in proceedings of the 26th annual international acm sigir conference on research and development in information retrieval (july 2003): 235–42, https://doi.org/10.1145/860435.860479.
90 nazemi, "non-visual representation of complex documents," 118–44; w3c, "wcag 2.0."
91 ullah et al., "current state of linked and open data in cataloging," 47, 48.
92 julius t. nganji, "the portable document format (pdf) accessibility practice of four journal publishers," library and information science research 37, no. 3 (2015): 254–62, https://doi.org/10.1016/j.lisr.2015.02.002.
93 julius t. nganji, "an assessment of the accessibility of pdf versions of selected journal articles published in a wcag 2.0 era (2014–2018)," learned publishing 31, no. 4 (2018): 391–401, https://doi.org/10.1002/leap.1197.
94 wittmann et al., "from digital library to open datasets," 49, 50.
95 yan han and xueheng wan, "digitization of text documents using pdf/a," information technology and libraries 37, no. 1 (2018): 52–64, https://doi.org/10.6017/ital.v37i1.9878.
96 asim ullah, shah khusro, and irfan ullah, "bibliographic classification in the digital age: current trends & future directions," information technology and libraries 36, no. 3 (2017): 48–77, https://doi.org/10.6017/ital.v36i3.8930.
97 xie et al., "using digital libraries non-visually," paper 673.
98 babu and xie, "haze in the digital library," 1057–59.
99 iris xie et al., "enhancing usability of digital libraries: designing help features to support blind and visually impaired users," information processing and management 57, no. 3 (2020): 102110, https://doi.org/10.1016/j.ipm.2019.102110.
100 chen et al., "colnet," 31, 32.
101 kim et al., "facilitating document reading," 432.
102 milosevic et al., "a framework for information extraction from tables," 61.
103 rizvi et al., "ontology-based information extraction," 496.
104 embley, seth, and nagy, "transforming web tables to a relational database," 2783; milosevic et al., "a framework for information extraction from tables," 60.
105 nicholas j. tierney and karthik ram, "a realistic guide to making data available alongside code to improve reproducibility," preprint, arxiv:2002.11626.
106 rastan, paik, and shepherd, "texus," 917.
107 nazemi, "non-visual representation of complex documents," 118–44; w3c, "wcag 2.0."
108 mexhid ferati and wondwossen m. beyene, "developing heuristics for evaluating the accessibility of digital library interfaces," in universal access in human–computer interaction, design and development approaches and methods, uahci 2017, lecture notes in computer science 10277 (springer, cham), https://doi.org/10.1007/978-3-319-58706-6_14.
109 ullah et al., "current state of linked and open data in cataloging," 64.

book review
free culture: how big media uses technology and the law to lock down culture and control creativity, by lawrence lessig. new york: penguin, 2004. 240p. $24.95 (isbn 1-59420-006-8).
this is the third book by stanford law professor larry lessig, and the third in which he furthers his basic theme: that the ancient regime of intellectual property owners is locked in a battle with the capabilities of new technology. lessig used his first book, code and other laws of cyberspace (basic books, 1999), to explain that the notion of cyberspace as free, open, and anarchic is simply a myth, and a dangerous one at that: the very architecture of our computers and how they communicate determine what one can and cannot do within that environment. if you can get control of that architecture, say by mandating filters on content, you can get substantial control over the culture of that communication space.
in his second book, the future of ideas: the fate of the commons in a connected world (random, 2001), lessig describes how the change from real property to virtual property actually means more opportunity for control, not less. the theme that he takes up in free culture is his concern that certain powerful interests in our society (read: hollywood) are using copyright law to lock down the very stuff of creativity: mainly, past creativity. lessig himself admits in his preface that his is not a new or unique argument. he cites richard stallman's writings in the mid-1980s that became the basis for the free software movement as containing many of the same concepts that lessig argues in his book. in this case, it serves as a kind of proof of concept (that new ideas build on past ideas) rather than a criticism of lack of originality. stallman's work is not, however, a substitute for lessig's; not only does lessig address popular culture where stallman addresses only computer code, but lessig has one key thing in his favor: he is a master storyteller and a darned good writer, not something one usually expects in an academic and an expert in constitutional law. his book opens with the first flight of the wright brothers and the death of a farmer's chickens, followed by buster keaton's film steamboat bill and disney's famous mouse. the next chapter traces the history of photography and how the law once considered that snapping a picture could require prior permission from the owners of any property caught in the viewfinder. later he tells how an improvement to a search engine led one college student to owe the recording industry association of america $15 million. throughout the book lessig illustrates copyright through the lives of real people and uses history, science, and the arts to make this law come to life for the reader. lessig explains that intellectual property differs from real property in the eye of the law. unlike real property, where the property owner has near total control over its uses, the only control offered to authors originally was the control over who could make copies of the work and distribute them. in addition, that right, the "copy right," lasted only a short time. the original length of copyright in the united states was fourteen years, with the right to renew for another fourteen years. so a total of twenty-eight years stood between an author's rights and the public domain, and those rights were limited to publishing copies. others could quote from a work, even derive other works from it (such as turning a novel into a play), all within a law that was designed to promote science and the arts. fast forward to the present day and we have a very different situation. not only has there been a change in the length of time that copyright applies to a work; a major change in copyright law in 1976 extended copyright to works that had not previously been covered. in the earliest u.s. copyright regimes of the late 18th century, only works that were registered with the copyright office were afforded the protection of copyright law, and only about five percent of works produced were so registered. the rest were in the public domain. later, actual registration with the copyright office was unnecessary but the author was required to place a copyright notice on a work (e.g., "© 2004, karen coyle") in order to claim copyright in it.
copyright holders had to renew works in order to make use of the full term of protection, and renewal rates were actually quite low. in 1976, all such requirements were removed, and the law was amended to state that any work in a fixed medium automatically receives copyright protection, and for the full term. that is true even if the author does not want that protection. so although many saw the great exchange of ideas and information on the internet as being a huge commons of knowledge, to be shared and shared alike, all of it has, in fact, always been covered by copyright law: every word out there belongs to someone. that change, combined with a much earlier change that gave a copyright holder control over derivative works, puts creators into a deadlock. they cannot safely build on the work of others without permission (thus lessig's argument that we are becoming a "permission culture"). yet, we have no mechanism (such as registration of works that would result in a database of creators) that would facilitate getting that permission. if you find a work on the internet and it has no named author or no contact information for the author, the law forbids you to reuse the work without permission, but there is nothing that would make getting that permission a manageable task. of course, even if you do know who the rights holder is, permission is not a given. for example, you hear a great song on the radio and want to use parts of that tune in your next rap performance. you would need to approach the major record label that holds the rights and ask permission, which might not be granted. you could go ahead and use the sample and, if challenged, claim "fair use." but being challenged means going to court in a world where a court case could cost you in the six digits, an amount of money that most creators do not have. lessig, of course, spends quite a bit of time in his book on the length of copyright, now life of the author plus seventy years. it was exactly this issue that he and eric eldred took to the supreme court in 2003. lessig argued before the court that if congress can seemingly arbitrarily increase the length of copyright, as it has eleven times since 1962, then there is effectively no limit to the copyright term. yet "for a limited time" was clearly mandated in the u.s. constitution. lessig lost his case. you might expect him to spend his efforts explaining how the supreme court was wrong and he was right, but that is not what he does. right or wrong, they are the supreme court, and his job was to convince them to decide in favor of his client. instead, lessig revises his estimation of what can be accomplished with constitutional arguments and spends a chapter outlining compromises that might, just might, be possible in the future. to the extent that eldred v. ashcroft had an effect on lessig's thinking, and there is evidence that the effect was profound, it will have an effect on all of us because lessig is one of the key actors in this arena. throughout the book, lessig points out the difference between copyright law and the actual market for works. there is a great irony in the fact that copyright law now protects works for a century or more while most books are in print for one year or less. it is this vast storehouse of out-of-print and unexploited works that makes a strong argument for some modification of our copyright law.
he also recognizes that there are different creative cultures in our society, with different views of the purpose of creation. here he cites academic movements like the public library of science as solutions for the sector of society that has a low or nonexistent commercial interest but a need to get its works as widely distributed as possible. for these creators, and for "sharers" everywhere, lessig promotes the creativecommons solution (at www.creativecommons.org), a simple licensing scheme that allows creators to attach a license to their work that lets others know how they can make use of it. in a sense, creativecommons is a way to opt out of the default copyright that is applied to all works. when i first received my copy of free culture, i did two things: i looked up libraries in the index, and i looked up the book online to see what other reviewers had said. online, i found a web site for the book (http://free-culture.org) that pointed to two very interesting sites: one that lists free, downloadable full-text copies of the book in over a dozen different formats; and one that allows you to listen to the chapters being read aloud by volunteers and admirers. (i did listen to a few chapters and generally they are as listenable as most nonfiction audio books. in the end, though, i read the hard copy of the book.) lessig is making a point by offering his work outside the usual confines of copyright law, but in fact the meaning of his gesture is more economic than legal. although he, and cory doctorow before him (down and out in the magic kingdom, tor books, 2003), brokered agreements with their publishers to publish simultaneously in print with free digital copies, few authors and publishers today will choose that option for fear of loss of revenue, not because of their belief in the sanctity of intellectual property. if there were sufficient proof that free online copies of works increased sales of hard copies, this would quickly become the norm, regardless of the state of copyright law. as for libraries, unfortunately, they do not fare well. he dedicates a short chapter to brewster kahle and his wayback machine as his example of the need to archive our culture for future access. i admit that i winced when lessig stated:
but kahle is not the only librarian. the internet archive is not the only archive. but kahle and the internet archive suggest what the future of libraries or archives could be. (114)
lessig also mentions libraries in his arguments about out-of-print and inaccessible works, but in this case he actually gets it wrong:
after it [a book] is out of print, it can be sold in used bookstores without the copyright owner getting anything and stored in libraries, where many get to read the book, also for free. (113)
since we know that lessig is very aware that books are sold and lent even while they are still in print, we have to assume that the elegance of the argument was preferred over precision. but he makes this error more than once in the book, leaving libraries to appear to be a home for leftovers and remaindered works. that is too bad. we know that lessig is aware of libraries; anyone active in the legal profession depends on them. he has spoken at library-related conferences and events. yet he does not see libraries as key players in the battle against overly powerful copyright interests. more to the point, libraries have not captured his imagination, or given him a good story to tell.
so here is a challenge for myself and my fellow librarians: whether it means chatting up lessig after one of his many public performances, becoming active in creativecommons, or stopping by palo alto to take a busy law professor to lunch, we need to make sure that we get on, and stay on, lessig's radar. we need him; he needs us. –karen coyle, digital libraries consultant, http://kcoyle.net

usability studies of faceted browsing: a literature review
jody condit fagan (faganjc@jmu.edu) is content interfaces coordinator, james madison university library, harrisonburg, virginia.

faceted browsing is a common feature of new library catalog interfaces. but to what extent does it improve user performance in searching within today's library catalog systems? this article reviews the literature for user studies involving faceted browsing and user studies of "next-generation" library catalogs that incorporate faceted browsing. both the results and the methods of these studies are analyzed by asking, what do we currently know about faceted browsing? how can we design better studies of faceted browsing in library catalogs? the article proposes methodological considerations for practicing librarians and provides examples of goals, tasks, and measurements for user studies of faceted browsing in library catalogs.

many libraries are now investigating possible new interfaces to their library catalogs. sometimes called "next-generation library catalogs" or "discovery tools," these new interfaces are often separate from existing integrated library systems. they seek to provide an improved experience for library patrons by offering a more modern look and feel, new features, and the potential to retrieve results from other major library systems such as article databases. one interesting feature these interfaces offer is called "faceted browsing." hearst defines facets as "a set of meaningful labels organized in such a way as to reflect the concepts relevant to a domain."1 labarre defines facets as representing "the categories, properties, attributes, characteristics, relations, functions or concepts that are central to the set of documents or entities being organized and which are of particular interest to the user group."2 faceted browsing offers the user relevant subcategories by which they can see an overview of results, then narrow their list. in library catalog interfaces, facets usually include authors, subjects, and formats, but may include any field that can be logically created from the marc record (see figure 1 for an example). using facets to structure information is not new to librarians and information scientists. as early as 1955, the classification research group stated a desire to see faceted classification as the basis for all information retrieval.3 in 1960, ranganathan introduced facet analysis to our profession.4 librarians like metadata because they know its power, and facets can showcase metadata in new interfaces. according to mcguinness, facets perform several functions in an interface:
■■ vocabulary control
■■ site navigation and support
■■ overview provision and expectation setting
■■ browsing support
■■ searching support
■■ disambiguation support5
these functions offer several potential advantages to the user: the functions use category systems that are coherent and complete, they are predictable, they show previews of where to go next, they show how to return to previous states, they suggest logical alternatives, and they help the user avoid empty result sets as searches are narrowed.6 disadvantages include the fact that categories of interest must be known in advance, important trends may not be shown, category structures may need to be built by hand, and automated assignment is only partly successful.7 library catalog records, of course, already supply "categories of interest" and a category structure. information science research has shown benefits to users from faceted search interfaces. but do these benefits hold true for systems as complex as library catalogs? this paper presents an extensive review of both information science and library literature related to faceted browsing.

■■ method
to find articles in the library and information science literature related to faceted browsing, the author searched the association for computing machinery (acm) digital library, scopus, and library and information science and technology abstracts (lista) databases. in scopus and the acm digital library, the most successful searches included the following:
■■ (facet* or cluster*) and (usability or user stud*)
■■ facet* and usability
in lista, the most successful searches included combining product names such as "aquabrowser" with "usability." the search "catalog and usability" was also used. the author also searched google and the next generation catalogs for libraries (ngc4lib) electronic discussion list in an attempt to find unpublished studies. search terms initially included the concept of "clustering"; however, this was quickly shown to be a clearly defined, separate topic. according to hearst, "clustering refers to the grouping of items according to some measure of similarity . . . typically computed using associations and commonalities among features where features are typically words and phrases."8 using library catalog keywords to generate word clouds would be an example of clustering, as opposed to using subject headings to group items. clustering has some advantages according to hearst. it is fully automated, it is easily applied to any text collection, it can reveal unexpected or new trends, and it can clarify or sharpen vague queries. disadvantages to clustering include possible imperfections in the clustering algorithm, similar items not always being grouped into one cluster, a lack of predictability, conflating many dimensions, difficulty labeling groups, and counterintuitive subhierarchies.9 in user studies comparing clustering with facets, pratt, hearst, and fagan showed that users find clustering difficult to interpret and prefer a predictable organization of category hierarchies.10

■■ results
the author grouped the literature into two categories: user studies of faceted browsing and user studies of library catalog interfaces that include faceted browsing as a feature. generally speaking, the information science literature consisted of empirical studies of interfaces created by the researchers. in some cases, the researchers' intent was to create and refine an interface intended for actual use; in others, the researchers created the interface only for the purposes of studying a specific aspect of user behavior. in the library literature, the studies found were generally qualitative usability studies of specific library catalog interface products. libraries had either implemented a new product, or they were thinking about doing so and performed a user study to inform their decision.

[figure 1. faceted results from jmu's vufind implementation]

results: empirical studies of faceted browsing
the following summaries present selected empirical research studies that had significant findings related to faceted browsing or interesting methods for such studies. it is not an exhaustive list. pratt, hearst, and fagan questioned whether faceted results were better than clustering or relevancy-ranked results.11 they studied fifteen breast-cancer patients and families. every subject used three tools: a faceted interface, a tool that clustered the search results, and a tool that ranked the search results according to relevance criteria. the subjects were given three simple queries related to breast cancer (e.g., "what are the ways to prevent breast cancer?"), asked to list answers to these before beginning, and to answer the same queries after using all the tools. in this study, subjects completed two timed tasks. first, subjects found as many answers as possible to the question in four minutes. second, the researchers measured the time subjects took to find answers to two specific questions (e.g., "can diet be used in the prevention of breast cancer?") that related to the original, general query. for the first task, when the subjects used the faceted interface, they found more answers than they did with the other two tools. the mean number of answers found using the faceted interface was 7.80, for the cluster tool it was 4.53, and for the ranking tool it was 5.60. this difference was significant (p < 0.05).12 for the second task, the researchers found no significant difference between the tools when comparing time on task. the researchers gave the subjects a user-satisfaction questionnaire at the end of the study. on thirteen of the fourteen quantitative questions, satisfaction scores for the faceted interface were much higher than they were for either the ranking tool or the cluster tool. this difference was statistically significant (p < 0.05). all fifteen users also affirmed that the faceted interface made sense, was helpful, was useful, and had clear labels, and said they would use the faceted interface again for another search.

yee et al. studied the use of faceted metadata for image searching and browsing using an interface they developed called flamenco.13 they collected data from thirty-two participants who were regular users of the internet, searching for information either every day or a few times a week. their subjects performed four tasks (two structured and two unstructured) on each of two interfaces. an example of an unstructured task from their study was "search for images of interest." an example of a structured task was to gather materials for an art history essay on a topic given by the researchers and to complete four related subtasks. the researchers designed the structured task so they knew exactly how many relevant results were in the system. they also gave a satisfaction survey. more participants were able to retrieve all relevant results with the faceted interface than with the baseline interface. during the structured tasks, participants received empty results with the baseline interface more than three times as often as with the faceted interface.14 the researchers found that participants constructed queries from multiple facets in the unstructured tasks 19 percent of the time and in the structured tasks 45 percent of the time.15 when given a post-test survey, participants identified the faceted interface as easier to use, more flexible, interesting, enjoyable, simple, and easy to browse. they also rated it as slightly more "overwhelming." when asked to choose between the two, twenty-nine participants chose the faceted interface, compared with two who chose the baseline (n = 31).

uddin and janacek asked nineteen users (staff and students at the asian institute of technology) to use a website search engine with both a traditional results list and a faceted results list.22 tasks were as follows: (1) look for scholarship information for a masters program, (2) look for staff recruitment information, and (3) look for research and associated faculty member information within your interested area.23 they found that users were faster when using the faceted system, significantly so for two of the three tasks. success in finding relevant results was higher with the faceted system. in the post-study questionnaire, participants rated the faceted system more highly, including significantly higher ratings for flexibility, interest, understanding of information content, and more search results relevancy. participants rated the most useful features to be the capability to switch from one facet to another, preview the result set, combine facets, and navigate via breadcrumbs.

capra et al. compared three interfaces in use by the bureau of labor statistics website, using a between-subjects study with twenty-eight people and a within-subjects study with twelve people.24 each set of participants performed three kinds of searches: simple lookup, complex lookup, and exploratory. the researchers used an interesting strategy to help control the variables in their study:
because the bls website is a highly specialized corpus devoted to economic data in the united states organized across very specific time periods (e.g., monthly releases of price or employment data), we decided to include the us as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in our study. thus, the simple lookup tasks were constructed around a single economic facet but also included the spatial and temporal facets to provide context for the searchers. the complex lookup tasks involve additional facets including genre (e.g. press release) and/or region.25
capra et al. found that users preferred the familiarity afforded by the traditional website interface (hyperlinks + keyword search) but listed the facets on the two experimental interfaces as their best features. the researchers concluded, "if there is a predominant model of the information space, a well designed hierarchical organization might be preferred."26

zhang and marchionini analyzed results from fifteen undergraduate and graduate students in a usability study of an interface that used facets to categorize results (relation browser++).27 there were three types of tasks:
■■ type 1: simple look-up task (three tasks such as "check if the movie titled the matrix is in the library movie collection").
■■ type 2: data exploration and analysis tasks (six tasks
thirty-one of the thirty-two participants said the faceted interface helped them learn more, and twentyeight of them said it would be more useful for their usual tasks.16 the researchers concluded that even though their faceted interface was much slower than the other, it was strongly preferred by most study participants: “these results indicate that a category-based approach is a successful way to provide access to image collections.”17 in a related usability study on the flamenco interface, english et al. compared two image browsing interfaces in a nineteen-participant study.18 after an initial search, the “matrix view” interface showed a left column with facets, with the images in the result set placed in the main area of the screen. from this intermediary screen, the user could select multiple terms from facets in any order and have the items grouped under any facet. the “singletree” interface listed subcategories of the currently selected term at the top, with query previews underneath. the user could then only drill down to subcategories of the current category, and could not select terms from more than one facet. the researchers found that a majority of participants preferred the “power” and “flexibility” of matrix to the simplicity of singletree. they found it easier to refine and expand searches, shift between searches, and troubleshoot research problems. they did prefer singletree for locating a specific image, but matrix was preferred for browsing and exploring. participants started over only 0.2 percent of the time for the matrix compared to 4.5 percent for singletree.19 yet the faceted interface, matrix, was not “better” at everything. for specific image searching, participants found the correct image only 22.0 percent of the time in matrix compared to 66.0 percent in singletree.20 also, in matrix, some participants drilled down in the wrong hierarchy with wrong assumptions. one interesting finding was that in both interfaces, more participants chose to begin by browsing (12.7 percent) than by searching (5.0 percent).21 usability studies of faceted browsing: a literature review | fagan 61 of the first two studies: the first study comprised one faculty member, five graduate students, and two undergraduate students; the second comprised two faculty members, four graduate students, and two undergraduate students. the third study did not report results related to faceted browsing and is not discussed here. the first study had seven scenarios; the second study had nine. the scenarios were complex: for example, one scenario began, “you want to borrow shakespeare’s play, the tempest, from the library,” but contained the following subtasks as well: 1. find the tempest. 2. find multiple editions of this item. 3. find a recent version. 4. see if at least one of the editions is available in the library. 5. what is the call number of the book? 6. you’d like to print the details of this edition of the book so you can refer to it later. participants found the interface friendly, easy to use, and easy to learn. all the participants reported that faceted browsing was useful as a means of narrowing down the result lists, and they considered this tool one of the differentiating features between primo and their library opac or other interfaces. 
facets were clear, intuitive, and useful to all participants, including opening the “more” section.31 one specific result from the tests was that “online resources” and “available” limiters were moved from a separate location to the right with all other facets.32 in a study of aquabrowser by olson, twelve subjects— all graduate students in the humanities—participated in a comparative test in which they looked for additional sources for their dissertation.33 aquabrowser was created by medialab but is distributed by serials solutions in north america. this study also had three pilot subjects. no relevance judgments were made by the researchers. nine of the twelve subjects found relevant materials by using aquabrowser that they had not found before.34 olson’s subjects understood facets as a refinement tool (narrowing) and had a clear idea of which facets were useful and not useful for them. they gave overwhelmingly positive comments. only two felt the faceted interface was not an improvement. some participants wanted to limit to multiple languages or dates, and a few were confused about the location of facets in multiple places, for example, “music” under both format and topic. a team at yale university, led by bauer, recently conducted two tests on pilot vufind installations: a subject-based presentation of e-books for the cushing/ whitney medical library and a pilot test of vufind using undergraduate students with a sample of 400,000 records from the library system.35 vufind is open-source software developed at villanova university (http://vufind.org). that require users to understand and make sense of the information collection: “in which decade did steven spielberg direct the most movies?”). ■■ type 3: (one free exploration task: “find five favorite videos without any time constraints”). the tasks assigned for the two interfaces were different but comparable. for type 2 tasks, zhang and marchionini found that performance differences between the two interfaces were all statistically significant at the .05 level.28 no participants got wrong answers for any but one of the tasks using the faceted interface. with regard to satisfaction, on the exploratory tasks the researchers found statistically significant differences favoring the faceted interface on all three of the satisfaction questions. participants found the faceted interface not as aesthetically appealing nor as intuitive to use as the basic interface. two participants were confused by the constant changing and updating of the faceted interface. the above studies are examples of empirical investigations of experimental interfaces. hearst recently concluded that facets are a “proven technique for supporting exploration and discovery” and summarized areas for further research in this area, such as applying facets to large “subject-oriented category systems,” facets on mobile interfaces, adding smart features like “autocomplete” to facets, allowing keyword search terms to affect order of facets, and visualizations of facets.29 in the following section, user studies of next-generation library catalog interfaces will be presented. results: library literature understandably, most studies by practicing librarians focus on products their libraries are considering for eventual use. these studies all use real library catalog records, usually the entire catalog’s database. in most cases, these studies were not focused on investigating faceted browsing per se, but on the usability of the overall interface. 
in general, these studies used fewer participants than the information science studies above, followed less rigorous methods, and were not subjected to statistical tests. nevertheless, they provide many insights into the user experience with the extremely complex datasets underneath next-generation library catalog interfaces that feature faceted browsing. in this review article, only results specifically relating to faceted browsing will be presented. sadeh described a series of usability studies performed at the university of minnesota (um), a primo development partner.30 primo is the next-generation library catalog product sold by ex libris. the author also received additional information from the usability services lab at um via e-mail. three studies were conducted in august 2006, january 2007, and october 2007. eight users from various disciplines participated in each 62 information technology and libraries | june 2010 participants. the researchers measured task success, duration, and difficulty, but did not measure user satisfaction. their study consisted of four known-item tasks and six topic-searching tasks. the topic-searching tasks were geared toward the use of facets, for example, “can you show me how would you find the most recently published book about nuclear energy policy in the united states?”45 all five participants using endeca understood the idea of facets, and three used them. students tried to limit their searches at the outset rather than search and then refine results. an interesting finding was that use of the facets did not directly follow the order in which facets were listed. the most heavily used facet was library of congress classification (lcc), followed closely by topic, and then library, format, author, and genre.46 results showed a significantly shorter average task duration for endeca catalog users for most tasks.47 the researchers noted that none of the students understood that the lcc facet represented call-number ranges, but all of the students understood that these facets “could be used to learn about a topic from different aspects—science, medicine, education.”48 the authors could find no published studies relating to the use of facets in some next-generation library catalogs, including encore and worldcat local. although the university of washington did publish results of a worldcat local usability study in a recent issue of library technology reports, results from the second round of testing, which included an investigation of facets, were not yet ready.49 ■■ discussion summary of empirical evidence related to faceted browsing empirical studies in the information science literature support many positive findings related to faceted browsing and build a solid case for including facets in search interfaces: ■■ facets are useful for creating navigation structures.50 ■■ faceted categorization greatly facilitates efficient retrieval in database searching.51 ■■ facets help avoid dead ends.52 ■■ users are faster when using a faceted system.53 ■■ success in finding relevant results is higher with a faceted system.54 ■■ users find more results with a faceted system.55 ■■ users also seem to like facets, although they do not always immediately have a positive reaction. ■■ users prefer search results organized into predictable, multidimensional hierarchies.56 ■■ participants’ satisfaction is higher with a faceted system.57 the team drew test questions from user search logs in their current library system. 
some questions targeted specific problems, such as incomplete spellings and incomplete title information. bauer notes that some problems uncovered in the study may relate to the peculiarities of the yale implementation. the medical library study contained eight participants—a mix of medical and nursing students. facets, reported bauer, “worked well in several instances, although some participants did not think they were noticeable on the right side of the page.”36 the prompt for the faceted task in this study came after the user had done a search: “what if you wanted to look at a particular subset, say ‘xxx’ (determine by looking at the facets).”37 half of the participants used facets, half used “search within” to narrow the topic by adding keywords. sixty-two percent of the participants were successful at this task. the undergraduate study asked five participants faced with a results list, “what would you do now if you only wanted to see material written by john adams?”38 on this task, only one of the five was successful, even though the author’s name was on the screen. bauer noted that in general, “the use of the topic facet to narrow the search was not understood by most participants. . . . even when participants tried to use topic facets the length of the list and extraneous topics rendered them less than useful.”39 the five undergraduates were also asked, “could you find books in this set of results that are about health and illness in the united states population, or control of communicable diseases during the era of the depression?”40 again, only one of the five was successful. bauer notes that “the overly broad search results made this difficult for participants. again, topic facets were difficult to navigate and not particularly useful to this search.”41 bauer’s team noted that when the search was configured to return more hits, “topic facets become a confusingly large set of unrelated items. these imprecise search results, combined with poor topic facet sets, seemed to result in confusion for test participants.”42 participants were not aware that topics represented subsets, although learning occurred because the “narrow” header was helpful to some participants.43 other results found by bauer’s team were that participants were intrigued by facets, navigation tools are needed so that patrons may reorder large sets of topic facets, format and era facets were useful to participants, and call-number facets were not used by anyone. antelman, pace, and lynema studied north carolina state university’s (ncsu) next-generation library catalog, which is driven by software from endeca.44 their study used ten undergraduate students in a between-subjects design where five used the endeca catalog and five used the library’s traditional catalog. the researchers noted that their participants may have been experienced with the library’s old catalog, as log data shows most ncsu users enter one or two terms, which was not true of study usability studies of faceted browsing: a literature review | fagan 63 one product’s faceted system for a library catalog does not substitute for another, the size and scope of local collections may greatly affect results, and cataloging practices and metadata will affect results. still, it is important for practicing librarians to determine if new features such as facets truly improve the user’s experience. 
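before turning to methodological best practices, a minimal sketch of the kind of benchmark comparison the empirical studies above rely on: comparing task-completion times between a faceted and a traditional interface in a between-subjects design. the data are hypothetical, and a two-sample t-test is only one reasonable choice, not the method used in any study cited here; the sketch assumes scipy is available.

```python
# hypothetical task-completion times (seconds) from a between-subjects study:
# one group used the faceted interface, a separate group used the traditional one.
from statistics import mean
from scipy import stats  # assumes scipy is installed

faceted_times = [48, 52, 61, 45, 70, 55, 58, 49, 66, 53]
traditional_times = [75, 80, 69, 92, 71, 88, 77, 83, 95, 72]

t_stat, p_value = stats.ttest_ind(faceted_times, traditional_times)

print(f"mean (faceted):     {mean(faceted_times):.1f} s")
print(f"mean (traditional): {mean(traditional_times):.1f} s")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("difference is significant at the 0.05 level")
```

for a within-subjects design, where the same participants use both interfaces, a paired test (for example stats.ttest_rel) would be the analogous choice.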
methodological best practices after reading numerous empirical research studies (some of which critique their own methods) and library case studies, some suggestions for designing better studies of facets in library catalogs emerged. designing the study ■■ consider reusing protocols from previous studies. this provides not only a tested method but also a possible point of comparison. ■■ define clear goals for each study and focus on specific research questions. it’s tempting to just throw the user into the interface and see what happens, but this makes it difficult, if not impossible, to analyze the results in a useful way. for example, one of zhang and marchionini’s hypotheses specifically describes what rich interaction would look like: “typing in keywords and clicking visual bars to filter results would be used frequently and interchangeably by the users to finish complex search tasks, especially when large numbers of results are returned.”64 ■■ develop the study for one type of user. olson’s focus on graduate students in the dissertation process allowed the researchers to control for variables such as interest of and knowledge about the subject. ■■ pilot test the study with a student worker or colleague to iron out potential wrinkles. ■■ let users explore the system for a short time and possibly complete one highly structured task to help the user become used to the test environment, interface, and facilitator.65 unless you are truly interested in the very first experience users have with a system, the first use of a system is an artificial case. designing tasks ■■ make sure user performance on each task is measurable. will you measure the time spent on a task? if “success” is important, define what that would look like. for example, english et al. defined success for one of their tasks as when “the participant indicated (within the allotted time) that he/she had reached an appropriate set of images/specific image in the collection.”66 ■■ establish benchmarks for comparison. one can test for significant differences between interfaces, one can test for differences between research subjects and an expert user, and one can simply measure against ■■ users are more confident with a faceted system.58 ■■ users may prefer the familiarity afforded by traditional website interface (hyperlinks + keyword search).59 ■■ initial reactions to the faceted interface may be cautious, seeing it as different or unfamiliar.60 users interact with specific characteristics of faceted interfaces, and they go beyond just one click with facets when it is permitted. english et al. found that 7 percent of their participants expanded facets by removing a term, and that facets were used more than “keyword search within”: 27.6 percent versus 9 percent.61 yee et al. found that participants construct queries from multiple facets 19 percent of the time in unstructured tasks; in structured tasks they do so 45 percent of the time.62 the above studies did not use library catalogs; in most cases they used an experimental interface with record sets that were much smaller and less complicated than in a complete library collection. domains included websites, information from one website, image collections, video collections, and a journal article collection. summary of practical user studies related to faceted browsing this review also included studies from practicing librarians at live library implementations. 
these studies generally had smaller numbers of users, were more likely to focus on the entire interface rather than a few features, and chose more widely divergent methods. studies were usually linked to a specific product, and results varied widely between systems and studies. for this reason it is difficult to assemble a bulleted summary as with the previous section. the variety of results from these studies indicate that when faceted browsing is applied to a reallife situation, implementation details can greatly affect user performance and user preference. some, like labarre, are skeptical about whether facets are appropriate for library information. descriptions of library materials, says labarre, include analyses of intellectual content that go beyond the descriptive terms assigned to commercial items such as a laptop: now is the time to question the assumptions that are embedded in these commercial systems that were primarily designed to provide access to concrete items through descriptions in order to enhance profit.63 it is clear that an evaluation of commercial interfaces or experimental interfaces does not substitute for an opac evaluation. yet it is a challenge for libraries to find expertise and resources to conduct user studies. the systems they want to test are large and complex. collaborating with other libraries has its own challenges: an evaluation of 64 information technology and libraries | june 2010 groups of participants, each of which tests a different system. ■❏ a within-subjects design has one group of participants test both systems. it is hoped that if libraries use the suggestions above when designing future experiments, results across studies will be more comparable and useful. designing user studies of faceted browsing after examining both empirical research studies and case studies by practicing librarians, a key difference seems to be the specificity of research questions and designing tasks and measurements to test specific hypotheses. while describing a full user-study protocol for investigating faceted browsing in a library catalog is beyond the scope of this article, reviewing the literature and the study methods it describes provided insights into how hypotheses, tasks, and measurements could be written to provide more reliable and comparable evidence related to faceted browsing in library catalog systems. for example, one research question could surround the format facet: “compared with our current interface, does our new faceted interface improve the user’s ability to find different formats of materials?” hypotheses could include the following: 1. users will be more accurate when identifying the formats of items from their result set when using the faceted interface than when using the traditional interface. 2. users will be able to identify formats of items more quickly with the faceted interface than with the traditional interface. looking at these hypotheses, here is a prompt and some example tasks the participants would be asked to perform: “we will be asking you to find a variety of formats of materials. when we say formats of materials, we mean books, journal articles, videos, etc.” ■■ task 1: please use interface a to search on “interpersonal communication.” look at your results set. please list as many different formats of material as you can. ■■ task 2: how many items of each format are there? ■■ task 3: please use interface b to search on “family communication.” what formats of materials do you see in your results set? 
■■ task 4: how many items of each format are there?” we would choose the topics “interpersonal communication” and “family communication” because our local catalog has many material types for these topics and because these topics would be understood by most of our students. we would choose different topics to expectations or against previous iterations of the same study. for example, “75 percent of users completed the task within five minutes.” zhang and marchionini measured error rates, another possible benchmark.67 ■■ consider looking at your existing opac logs for zeroresults searches or other issues that might inspire interesting questions. ■■ target tasks to avoid distracters. for example, if your catalog has a glut of government documents, consider running the test with a limit set to exclude them unless you are specifically interested in their impact. for example, capra et al. decided to include the united states as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in their study.68 ■■ for some tasks, give the subjects simple queries (e.g., “what are the ways to prevent breast cancer?”) as opposed to asking the subjects to come up with their own topic. this can help control for the potential challenges of formulating one’s own research question on the spot. as librarians know, formulating a good research question is its own challenge. ■■ if you are using any timed tasks, consider how the nature of your tasks could affect the result. for example, pratt, hearst, and fagan noted that the time that it took subjects to read and understand abstracts most heavily influenced the time for them to find an answer.69 english et al. found that the system’s processing time influenced their results.70 ■■ consider the implications of your local implementation carefully when designing your study. at yale, the team chose to point their vufind instance at just 400,000 of their records, drew questions from problems users were having (as shown in log files), and targeted questions to these problems.71 who to study? ■■ try to study a larger set of users. it is better to create a short test with many users than a long test with a few users. nielsen suggests that twenty users is sufficient.72 consider collaborating with another library if necessary. ■■ if you test a small number, such as the typical four to eight users for a usability test, be sure you emphasize that your results are not generalizable. ■■ use subjects who are already interested in the subject domain: for example, pratt, hearst, and fagan used breast cancer patients,73 and olson used graduate students currently writing their dissertations.74 ■■ consider focusing on advanced or scholarly users. la barre suggests that undergraduates may be overstudied.75 ■■ for comparative studies, consider having both between-subjects and within-subjects designs.76 ■❏ a between-subjects design involves creating two usability studies of faceted browsing: a literature review | fagan 65 these experimental studies. previous case-study investigations of library catalog interfaces with facets have proven inconclusive. by choosing more specific research questions, tasks, and measurements for user studies, libraries may be able to design more objective studies and compare results more effectively. references 1. marti a. hearst, “clustering versus faceted categories for information exploration,” communications of the acm 49, no. 4 (2006): 60. 2. 
kathryn la barre, “faceted navigation and browsing features in new opacs: robust support for scholarly information seeking?” knowledge organization 34, no. 2 (2007): 82. 3. vanda broughton, “the need for faceted classification as the basis of all methods of information retrieval,” aslib proceedings 58, no. 1/2 (2006): 49–71. 4. s. r. ranganathan, colon classification basic classification, 6th ed. (new york: asia, 1960). 5. deborah l. mcguinness, “ontologies come of age,” in spinning the semantic web: bringing the world wide web to its full potential, ed. dieter fensel et al. (cambridge, mass.: mit pr., 2003): 179–84. 6. hearst, “clustering versus faceted categories,” 60. 7. ibid., 61. 8. ibid., 59. 9. ibid.. 60. 10. wanda pratt, marti a. hearst, and lawrence m. fagan, “a knowledge-based approach to organizing retrieved documents,” proceedings of the sixteenth national conference on artificial intelligence, july 18–22, 1999, orlando, florida (menlo park, calif.: aaai pr., 1999): 80–85. 11. ibid. 12. ibid., 5. 13. ka-ping yee et al., “faceted metadata for image search and browsing,” 2003, http://flamenco.berkeley.edu/papers/ flamenco-chi03.pdf (accessed oct. 6, 2008). 14. ibid., 6. 15. ibid., 7. 16. ibid. 17. ibid., 8. 18. jennifer english et al., “flexible search and navigation,” 2002, http://flamenco.berkeley.edu/papers/flamenco02.pdf (accessed apr. 22, 2010). 19. ibid., 7. 20. ibid., 6. 21. ibid., 7. 22. mohammed nasir uddin and paul janecek, “performance and usability testing of multidimensional taxonomy in web site search and navigation,” performance measurement and metrics 8, no. 1 (2007): 18–33. 23. ibid., 25. 24. robert capra et al., “effects of structure and interaction style on distinct search tasks,” proceedings of the 7th acm-ieee-cs joint conference on digital libraries (new york: acm, 2007): 442–51. 25. ibid., 446. 26. ibid., 450. help minimize learning effects. to further address this, we would plan to have half our users start first with the traditional interface and half to start first with the faceted interface. this way we can test for differences resulting from learning. the above tasks would allow us to measure several pieces of evidence to support or reject our hypotheses. for tasks 1 and 3, we would measure the number of formats correctly identified by users compared with the number found by an expert searcher. for tasks 2 and 4, we would compare the number of items correctly identified with the total items found in each category by an expert searcher. we could also time the user to determine which interface helped them work more quickly. in addition to measuring the number of formats identified and the number of items identified in each format, we would be able to measure the time it takes users to identify the number of formats and the number of items in each format. to measure user satisfaction, we would ask participants to complete the system usability scale (sus) after each interface and, at the very end of the study, complete a questionnaire comparing the two interfaces. even just selecting the format facet, we would have plenty to investigate. other hypotheses and tasks could be developed for other facet types, such as time period or publication date, or facets related to the responsible parties, such as author or director: hypothesis: users can find more materials written in a certain time period using the faceted interface. 
task: find ten items of any type (books, journals, movies) written in the 1950s that you think would have information about television advertising. hypothesis: users can find movies directed by a specific person more quickly using the faceted interface. task: in the next two minutes, find as many movies as you can that were directed by orson welles. for the first task above, an expert searcher could complete the same task, and their time could be used as a point of comparison. for the second, the total number of movies in the library catalog that were directed by welles is an objective quantity. in both cases, one could compare the user’s performance on the two interfaces. ■■ conclusion reviewing user studies about faceted browsing revealed empirical evidence that faceted browsing improves user performance. yet this evidence does not necessarily point directly to user success in faceted library catalogs, which have much more complex databases than those used in 66 information technology and libraries | june 2010 53. uddin and janecek, “performance and usability testing”; zhang and marchionini, evaluation and evolution; hao chen and susan dumais, bringing order to the web: automatically categorizing search results (new york: acm, 2000): 145–52. 54. uddin and janecek, “performance and usability testing.” 55. ibid.; pratt, hearst, and fagan, “a knowledge-based approach”; hsinchun chen et al., “internet browsing and searching: user evaluations of category map and concept space techniques,” journal of the american society for information science 49, no. 7 (1998): 582–603. 56. vanda broughton, “the need for faceted classification as the basis of all methods of information retrieval,” aslib proceedings 58, no. 1/2 (2006): 49–71; pratt, hearst, and fagan, “a knowledge-based approach,” 80–85.; chen et al., “internet browsing and searching,” 582–603; yee et al., “faceted metadata for image search and browsing”; english et al., “flexible search and navigation using faceted metadata.” 57. uddin and janecek, “performance and usability testing”; zhang and marchionini, evaluation and evolution; hideo joho and joemon m. jose, slicing and dicing the information space using local contexts (new york: acm, 2006): 66–74.; yee et al., “faceted metadata for image search and browsing.” 58. yee et al., “faceted metadata for image search and browsing”; chen and dumais, bringing order to the web. 59. capra et al., “effects of structure and interaction style.” 60. yee et al., “faceted metadata for image search and browsing”; capra et al., “effects of structure and interaction style”; zhang and marchionini, evaluation and evolution. 61. english et al., “flexible search and navigation,” 7. 62. yee et al., “faceted metadata for image search and browsing,” 7. 63. la barre, “faceted navigation and browsing,” 85. 64. zhang and marchionini, evaluation and evolution, 183. 65. english et al., “flexible search and navigation.” 66. ibid., 6. 67. zhang and marchionini, evaluation and evolution. 68. capra et al., “effects of structure and interaction style.” 69. pratt, hearst, and fagan, “a knowledge-based approach.” 70. english et al., “flexible search and navigation.” 71. bauer, “yale university library vufind test—undergraduates.” 72. jakob nielsen, “quantitative studies: how many users to test?” online posting, alertbox, june 26, 2006 http://www.useit .com/alertbox/quantitative_testing.html (accessed apr. 7, 2010). 73. pratt, hearst, and fagan, “a knowledge-based approach.” 74. tod a. 
olson used graduate students currently writing their dissertations. olson, “utility of a faceted catalog for scholarly research,” library hi tech 25, no. 4 (2007): 550–61. 75. la barre, “faceted navigation and browsing.” 76. capra et al., “effects of structure and interaction style.” 27. junliang zhang and gary marchionini, evaluation and evolution of a browse and search interface: relation browser++ (atlanta, ga.: digital government society of north america, 2005): 179–88. 28. ibid., 183. 29. marti a. hearst, “uis for faceted navigation: recent advances and remaining open problems,” 2008, http://people. ischool.berkeley.edu/~hearst/papers/hcir08.pdf (accessed apr. 27, 2010). 30. tamar sadeh, “user experience in the library: a case study,” new library world 109, no. 1/2 (jan. 2008): 7–24. 31. ibid., 22. 32. jerilyn veldof, e-mail from university of minnesota usability services lab, 2008. 33. tod a. olson, “utility of a faceted catalog for scholarly research,” library hi tech 25, no. 4 (2007): 550–61. 34. ibid., 555. 35. kathleen bauer, “yale university library vufind test— undergraduates,” may 20, 2008, http://www.library.yale.edu/ usability/studies/summary_undergraduate.doc (accessed apr. 27, 2010); kathleen bauer and alice peterson-hart, “usability test of vufind as a subject-based display of ebooks,” aug. 21, 2008, http://www.library.yale.edu/usability/studies/summary _medical.doc (accessed apr. 27, 2010). 36. bauer and peterson-hart, “usability test of vufind as a subject-based display of ebooks,” 1. 37. ibid., 2. 38. ibid., 3. 39. ibid. 40. ibid., 4. 41. ibid. 42. ibid., 5. 43. ibid., 8. 44. kristin antelman, andrew k. pace, and emily lynema, “toward a twenty-first century library catalog,” information technology & libraries 25, no. 3 (2006): 128–39. 45. ibid., 139. 46. ibid., 133. 47. ibid., 135. 48. ibid., 136. 49. jennifer l. ward, steve shadle, and pam mofield, “user experience, feedback, and testing,” library technology reports 44, no. 6 (aug. 2008): 22. 50. english et al., “flexible search and navigation.” 51. peter ingwersen and irene wormell, “ranganathan in the perspective of advanced information retrieval,” libri 42 (1992): 184–201; winfried godert, “facet classification in online retrieval,” international classification 18, no. 2 (1991): 98–109.; w. godert, “klassificationssysteme und online-katalog [classification systems and the online catalogue],” zeitschrift für bibliothekswesen und bibliographie 34, no. 3 (1987): 185–95. 52. yee et al., “faceted metadata for image search and browsing”; english et al., “flexible search and navigation.” microsoft word ital_march_gerrity.docx editor’s comments bob gerrity information technology and libraries | march 2013 1 with this issue, information technology and libraries (ital) begins its second year as an open-‐ access, e-‐only publication. there have been a couple of technical hiccups related to the publication of back issues of ital previously only available in print: the publication system we’re using (open journal system) treats the back issues as new content and automatically sends notifications to readers who have signed up to be notified when new content is available. we’re working to correct that glitch, but hope that the benefit of having the full ital archive online will outweigh the inconvenience of the extra e-‐mail notifications. overall though, ital continues to chug along and the wheels aren’t in danger of falling off any time soon. 
thanks go to mary taylor, the lita board, and the lita publications committee for supporting the move to the new model for ital. readership this year appears to be healthy—the total download count for the thirty-three articles published in 2012 was 42,166, with 48,160 abstract views. unfortunately we don't have statistics about online use from previous years to compare with. the overall number of article downloads for 2012, for new and archival content, was 74,924. we continue to add to the online archive: this month the first issues from march 1969 and march 1981 were added. if you haven't taken the opportunity to look, the back issues offer an interesting reminder of the technology challenges our predecessors faced. in this month's issue, ital editorial board member patrick "tod" colegrove reflects on the emergence of the makerspace phenomenon in libraries, providing an overview of the makerspace landscape. lita member danielle becker and lauren yannotta describe the user-centered website redesign process used at the hunter college libraries. kathleen weessies and daniel dotson describe gis lite and provide examples of its use at the michigan state university libraries. vandana singh presents guidelines for adopting an open-source integrated library system, based on findings from interviews with staff at libraries that have adopted open-source systems. danijela boberić krstićev from the university of novi sad describes a software methodology enabling sharing of information between different library systems, using the z39.50 and sru protocols. beginning with the june issue of ital, articles will be published individually as soon as they are ready. ital issues will still close on a quarterly basis, in march, june, september, and december. by publishing articles individually as they are ready, we hope to make ital content more timely and reduce the overall length of time for our peer-review and publication processes. suggestions and feedback are welcome, at the e-mail address below. bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia. lynne weber and peg lawrence authentication and access: accommodating public users in an academic world in cook and shelton's managing public computing, which confirmed the lack of applicable guidelines on academic websites, had more up-to-date information but was not available to the researchers at the time the project was initiated.2 in the course of research, the authors developed the following questions: ■■ how many arl libraries require affiliated users to log into public computer workstations within the library? ■■ how many arl libraries provide the means to authenticate guest users and allow them to log on to the same computers used by affiliates? ■■ how many arl libraries offer open-access computers for guests to use? do these libraries provide both open-access computers and the means for guest user authentication? ■■ how do federal depository library program libraries balance their policy requiring computer authentication with the obligation to provide public access to government information? ■■ do computers provided for guest use (open access or guest login) provide different software or capabilities than those provided to affiliated users? ■■ how many arl libraries have written policies for the use of open-access computers? if a policy exists, what is it? ■■ how many arl libraries have written policies for authenticating guest users?
if a policy exists, what is it? ■■ literature review since the 1950s there has been considerable discussion within library literature about academic libraries serving “external,” “secondary,” or “outside” users. the subject has been approached from the viewpoint of access to the library facility and collections, reference assistance, interlibrary loan (ill) service, borrowing privileges, and (more recently) access to computers and internet privileges, including the use of proprietary databases. deale emphasized the importance of public relations to the academic library.3 while he touched on creating bonds both on and off campus, he described the positive effect of “privilege cards” to community members.4 josey described the variety of services that savannah state college offered to the community.5 he concluded his essay with these words: why cannot these tried methods of lending books to citizens of the community, story hours for children . . . , a library lecture series or other forum, a great books discussion group and the use of the library staff in the fall of 2004, the academic computing center, a division of the information technology services department (its) at minnesota state university, mankato took over responsibility for the computers in the public areas of memorial library. for the first time, affiliated memorial library users were required to authenticate using a campus username and password, a change that effectively eliminated computer access for anyone not part of the university community. this posed a dilemma for the librarians. because of its federal depository status, the library had a responsibility to provide general access to both print and online government publications for the general public. furthermore, the library had a long tradition of providing guest access to most library resources, and there was reluctance to abandon the practice. therefore the librarians worked with its to retain a small group of six computers that did not require authentication and were clearly marked for community use, along with several standup, open-access computers on each floor used primarily for searching the library catalog. the additional need to provide computer access to high school students visiting the library for research and instruction led to more discussions with its and resulted in a means of generating temporary usernames and passwords through a web form. these user accommodations were implemented in the library without creating a written policy governing the use of open-access computers. o ver time, library staff realized that guidelines for guests using the computers were needed because of misuse of the open-access computers. we were charged with the task of drafting these guidelines. in typical librarian fashion, we searched websites, including those of association of research libraries (arl) members for existing computer access policies in academic libraries. we obtained very little information through this search, so we turned to arl publications for assistance. library public access workstation authentication by lori driscoll, was of greater benefit and offered much of the needed information, but it was dated.1 a research result described lynne webber (lnweber@mnsu.edu) is access services librarian and peg lawrence (peg.lawrence@mnsu.edu) is systems librarian, minnesota state university, mankato. 
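the web form mentioned above for generating temporary usernames and passwords is not described in detail in the article; purely as an illustration, here is one way such short-lived guest credentials could be produced. this is a hypothetical sketch using only the python standard library, not the system actually used at minnesota state university, mankato.

```python
# hypothetical sketch: issue a temporary guest account with an expiry time.
import secrets
import string
from datetime import datetime, timedelta

def issue_guest_account(valid_hours=4):
    """return a throwaway username/password pair and its expiry timestamp."""
    username = "guest-" + secrets.token_hex(3)  # e.g. guest-a1b2c3
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(10))
    expires = datetime.now() + timedelta(hours=valid_hours)
    return {"username": username, "password": password, "expires": expires}

account = issue_guest_account()
print(account["username"], account["password"], "valid until", account["expires"])
```

the design point such a mechanism addresses is the one the article raises: guests get workstation access that is accountable and time-limited, without giving them permanent campus credentials.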
authentication and access | weber and lawrence 129 providing service to the unaffiliated, his survey revealed 100 percent of responding libraries offered free in-house collection use for the general public, and many others offered additional services.16 brenda johnson described a one-day program in 1984 sponsored by rutgers university libraries forum titled “a case study in closing the university library to the public.” the participating librarians spent the day familiarizing themselves with the “facts” of the theoretical case and concluded that public access should be restricted but not completely eliminated. a few months later, consideration of closing rutgers’ library to the public became a real debate. although there were strong opposing viewpoints, the recommendation was to retain the open-door policy.17 jansen discussed the division between those who wanted to provide the finest service to primary users and those who viewed the library’s mission as including all who requested assistance. jansen suggested specific ways to balance the needs of affiliates and the public and referred to the dilemma the university of california, berkeley, library that had been closed to unaffiliated users.18 bobp and richey determined that california undergraduate libraries were emphasizing service to primary users at a time when it was no longer practical to offer the same level of service to primary and secondary users. they presented three courses of action: adherence to the status quo, adoption of a policy restricting access, or implementation of tiered service.19 throughout the 1990s, the debate over the public’s right to use academic libraries continued, with increasing focus on computer use in public and private academic libraries. new authorization and authentication requirements increased the control of internal computers, but the question remained of libraries providing access to government information and responding to community members who expected to use the libraries supported by their taxes. morgan, who described himself as one who had spent his career encouraging equal access to information, concluded that it would be necessary to use authentication, authorization, and access control to continue offering information services readily available in the past.20 martin acknowledged that library use was changing as a result of the internet and that the public viewed the academic librarian as one who could deal with the explosion of information and offer service to the public.21 johnson described unaffiliated users as a group who wanted all the privileges of the affiliates; she discussed the obligation of the institution to develop policies managing these guest users.22 still and kassabian considered the dual responsibilities of the academic library to offer internet access to public users and to control internet material received and sent by primary and public users. further, they weighed as consultants be employed toward the building of good relations between town and gown.6 later, however, deale indicated that the generosity common in the 1950s to outsiders was becoming unsustainable.7 deale used beloit college, with an “open door policy” extending more than 100 years, as an example of a school that had found it necessary to refuse out-of-library circulation to minors except through ill by the 1960s.8 also in 1964, waggoner related the increasing difficulty of accommodating public use of the academic library. 
he encouraged a balance of responsibility to the public with the institution’s foremost obligation to the students and faculty.9 in october 1965, the ad hoc committee on community use of academic libraries was formed by the college library section of the association of college and research libraries (acrl). this committee distributed a 13-question survey to 1,100 colleges and universities throughout the united states. the high rate of response (71 percent) was considered noteworthy, and the findings were explored in “community use of academic libraries: a symposium,” published in 1967.10 the concluding article by josey (the symposium’s moderator) summarized the lenient attitudes of academic libraries toward public users revealed through survey and symposium reports. in the same article, josey followed up with his own arguments in favor of the public’s right to use academic libraries because of the state and federal support provided to those institutions.11 similarly, in 1976 tolliver reported the results of a survey of 28 wisconsin libraries (public academic, private academic, and public), which indicated that respondents made a great effort to serve all patrons seeking service.12 tolliver continued in a different vein from josey, however, by reporting the current annual fiscal support for libraries in wisconsin and commenting upon financial stewardship. tolliver concluded by asking, “how effective are our library systems and cooperative affiliations in meeting the information needs of the citizens of wisconsin?”13 much of the literature in the years following focused on serving unaffiliated users at a time when public and academic libraries suffered the strain of overuse and underfunding. the need for prioritization of primary users was discussed. 
in 1979, russell asked, “who are our legitimate clientele?” and countered the argument for publicly supported libraries serving the entire public by saying the public “cannot freely use the university lawn mowers, motor pool vehicles, computer center, or athletic facilities.”14 ten years later, russell, robison, and prather prefaced their report on a survey of policies and services for outside users at 12 consortia institutions by saying, “the issue of external users is of mounting concern to an institution whose income is student credit hour generated.”15 despite russell’s concerns about the strain of 130 information technology and libraries | september 2010 be aware of the issues and of the effects that licensing, networking, and collection development decisions have on access.”35 in “unaffiliated users’ access to academic libraries: a survey,” courtney reported and analyzed data from her own comprehensive survey sent to 814 academic libraries in winter 2001.36 of the 527 libraries responding to the survey, 72 libraries (13.6 percent) required all users to authenticate to use computers within the library, while 56 (12.4 percent) indicated that they planned to require authentication in the next twelve months.37 courtney followed this with data from surveyed libraries that had canceled “most” of their indexes and abstracts (179 libraries, or 33.9 percent) and libraries that had cancelled “most” periodicals (46 libraries or 8.7 percent).38 she concluded that the extent to which the authentication requirement restricted unaffiliated users was not clear, and she asked, “as greater numbers of resources shift to electronic-only formats, is it desirable that they disappear from the view of the community user or the visiting scholar?”39 courtney’s “authentication and library public access computers: a call for discussion” described a follow-up with the academic libraries participating in her 2001 survey who had self-identified as using authentication or planning to employ authentication within the next twelve months. her conclusion was the existence of ambivalence toward authentication among the libraries, since more than half of the respondents provided some sort of public access. she encouraged librarians to carefully consider the library’s commitment to service before entering into blanket license agreements with vendors or agreeing to campus computer restrictions.40 several editions of the arl spec kit series showing trends of authentication and authorization for all users of arl libraries have been an invaluable resource in this investigation. an examination of earlier spec kits indicated that the definitions of “user authentication” and “authorization” have changed over the years. user authentication, by plum and bleiler indicated that 98 percent of surveyed libraries authenticated users in some way, but at that time authentication would have been more precisely defined as authorization or permission to access personal records, such as circulation, e-mail, course registration, and file space. as such, neither authentication nor authorization was related to basic computer access.41 by contrast, it is common for current library users authenticate to have any access to a public workstation. 
driscoll’s library public access workstation authentication sought information on how and why users were authenticated on public-access computers, who was driving the change, how it affected the ability of federal depository libraries to provide public information, and how it affected library services in general.42 but at the time of driscoll’s survey, only 11 percent of surveyed libraries required authentication on all computers and 22 percent required it only on selected terminals. cook and shelton’s managing public computing the reconciliation of material restrictions against “principles of freedom of speech, academic freedom, and the ala’s condemnation of censorship.”23 lynch discussed institutional use of authentication and authorization and the growing difficulty of verifying bona fide users of academic library subscription databases and other electronic resources. he cautioned that future technical design choices must reflect basic library values of free speech, personal confidentiality, and trust between academic institution and publisher.24 barsun specifically examined the webpages of one hundred arl libraries in search of information pertinent to unaffiliated users. she included a historic overview of the changing attitudes of academics toward service to the unaffiliated population and described the difficult balance of college community needs with those of outsiders in 2000 (the survey year).25 barsun observed a consistent lack of information on library websites regarding library guest use of proprietary databases.26 carlson discussed academic librarians’ concerns about “internet-related crimes and hacking” leading to reconsideration of open computer use, and he described the need to compromise patron privacy by requiring authentication.27 in a chapter on the relationship of it security to academic values, oblinger said, “one possible interpretation of intellectual freedom is that individuals have the right to open and unfiltered access to the internet.”28 this statement was followed later with “equal access to information can also be seen as a logical extension of fairness.”29 a short article in library and information update alerted the authors to a uk project investigating improved online access to resources for library visitors not affiliated with the host institution.30 salotti described higher education access to e-resources in visited institutions (haervi) and its development of a toolkit to assist with the complexities of offering electronic resources to guest users.31 salotti summarized existing resources for sharing within the united kingdom and emphasized that “no single solution is likely to suit all universities and colleges, so we hope that the toolkit will offer a number of options.”32 launched by the society of college, national and university libraries (sconul), and universities and colleges information systems association (ucisa), haervi has created a best-practice guide.33 by far the most useful articles for this investigation have been those by nancy courtney. “barbarians at the gates: a half-century of unaffiliated users in academic libraries,” a literature review on the topic of visitors in academic libraries, included a summary of trends in attitude and practice toward visiting users since the 1950s.34 the article concluded with a warning: “the shift from printed to electronic formats . . . 
combined with the integration of library resources with campus computer networks and the internet poses a distinct threat to the public’s access to information even onsite. it is incumbent upon academic librarians to authentication and access | weber and lawrence 131 introductory letter with the invitation to participate and a forward containing definitions of terms used within the survey is in appendix a. in total, 61 (52 percent) of the 117 arl libraries invited to participate in the survey responded. this is comparable with the response rate for similar surveys reported by plum and bleiler (52 of 121, or 43 percent), driscoll (67 of 124, or 54 percent), and cook and shelton (69 of 123, or 56 percent).45 1. what is the name of your academic institution? the names of the 61 responding libraries are listed in appendix b. 2. is your institution public or private? see figure 1. respondents’ explanations of “other” are listed below. ■❏ state-related ■❏ trust instrument of the u.s. people; quasigovernment ■❏ private state-aided ■❏ federal government research library ■❏ both—private foundation, public support 3. are affiliated users required to authenticate in order to access computers in the public area of your library? see figure 2. 4. if you answered “yes” to the previous question, does your library provide the means for guest users to authenticate? see figure 3. respondents’ explanations of “other” are listed below. all described open-access computers. ■❏ “we have a few “open” terminals” ■❏ “4 computers don’t require authentication” ■❏ “some workstations do not require authentication” ■❏ “open-access pcs for guests (limited number and function)” ■❏ “no—but we maintain several open pcs for guests” ■❏ “some workstations do not require login” 5. is your library a federal depository library? see figure 4. this question caused some confusion for the canadian survey respondents because canada has its own depository services program corresponding to the u.s. federal depository program. consequently, 57 of the 61 respondents identified themselves as federal depository (including three canadian libraries), although 5 of the 61 are more accurately members of the canadian depository services program. only two responding libraries were neither a member of the u.s. federal depository program nor of the canadian depository services program. 6. if you answered “yes” to the previous question, and computer authentication is required, what provisions have been made to accommodate use of online government documents by the general public in the library? please check all that touched on every aspect of managing public computing, including public computer use, policy, and security.43 even in 2007, only 25 percent of surveyed libraries required authentication on all computers, but 46 percent required authentication on some computers, showing the trend toward an ever increasing number of libraries requiring public workstation authentication. most of the responding libraries had a computer-use policy, with 48 percent following an institution-wide policy developed by the university or central it department.44 ■■ method we constructed a survey designed to obtain current data about authentication in arl libraries and to provide insight into how guest access is granted at various academic institutions. it should be noted that the object of the survey was access to computers located in the public areas of the library for use by patrons, not access to staff computers. 
we constructed a simple, fourteen-question survey using the zoomerang online tool (http://www .zoomerang.com/). a list of the deans, directors, and chief operating officers from the 123 arl libraries was compiled from an internet search. we eliminated the few library administrators whose addresses could not be readily found and sent the survey to 117 individuals with the request that it be forwarded to the appropriate respondent. the recipients were informed that the goal of the project was “determination of computer authentication and current computer access practices within arl libraries” and that the intention was “to reflect practices at the main or central library” on the respondent’s campus. recipients were further informed that the names of the participating libraries and the responses would be reported in the findings, but that there would be no link between responses given and the name of the participating library. the survey introduction included the name and contact information of the institutional review board administrator for minnesota state university, mankato. potential respondents were advised that the e-mail served as informed consent for the study. the survey was administered over approximately three weeks. we sent reminders three, five, and seven days after the survey was launched to those who had not already responded. ■■ survey questions, responses, and findings we administered the survey, titled “authentication and access: academic computers 2.0,” in late april 2008. following is a copy of the fourteen-question survey with responses, interpretative data, and comments. the 132 information technology and libraries | september 2010 ■❏ “some computers are open access and require no authentication” ■❏ “some workstations do not require login” 7. if your library has open-access computers, how many do you provide? (supply number). see figure 6. a total of 61 institutions responded to this question, and 50 reported open-access computers. the number of open-access computers ranged from 2 to 3,000. as expected, the highest numbers were reported by libraries that did not require authentication for affiliates. the mean number of open-access computers was 161.2, the median was 23, the mode was 30, and the range was 2,998. 8. please indicate which online resources and services are available to authenticated users. please check all that apply. see figure 7. ■❏ online catalog ■❏ government documents ■❏ internet browser apply. see figure 5. ■❏ temporary user id and password ■❏ open access computers (unlimited access) ■❏ open access computers (access limited to government documents) ■❏ other of the 57 libraries that responded “yes” to question 5, 30 required authentication for affiliates. these institutions offered the general public access to online government documents various ways. explanations of “other” are listed below. three of these responses indicate, by survey definition, that open-access computers were provided. ■❏ “catalog-only workstations” ■❏ “4 computers don’t require authentication” ■❏ “generic login and password” ■❏ “librarians login each guest individually” ■❏ “provision made for under-18 guests needing gov doc” ■❏ “staff in gov info also login user for quick use” ■❏ “restricted guest access on all public devices” figure 3. institutions with the means to authenticate guests figure 4. libraries with federal depository and/or canadian depository services status figure 2. institutions requiring authentication figure 1. 
categories of responding institutions authentication and access | weber and lawrence 133 11. does your library have a written policy for use of open access computers in the public area of the library? question 7 indicates that 50 of the 61 responding libraries did offer the public two or more open-access computers. out of the 50, 28 responded that they had a written policy governing the use of computers. conversely, open-access computers were reported at 22 libraries that had no reported written policy. 12. if you answered “yes” to the previous question, please give the link to the policy and/or summarize the policy. twenty-eight libraries gave a url, a url plus a summary explanation, or a summary explanation with no url. 13. does your library have a written policy for authenticating guest users? out of the 32 libraries that required their users to authenticate (see question 3), 23 also had the means to allow their guests to authenticate (see question 4). fifteen of those libraries said they had a policy. 14. if you answered “yes” to the previous question, please give the link to the policy and/or summarize the policy. eleven ■❏ licensed electronic resources ■❏ personal e-mail access ■❏ microsoft office software 9. please indicate which online resources and services are available to authenticated guest users. please check all that apply. see figure 8. ■❏ online catalog ■❏ government documents ■❏ internet browser ■❏ licensed electronic resources ■❏ personal e-mail access ■❏ microsoft office software 10. please indicate which online resources and services are available on open-access computers. please check all that apply. see figure 9. ■❏ online catalog ■❏ government documents ■❏ internet browser ■❏ licensed electronic resources ■❏ personal e-mail access ■❏ microsoft office software figure 5. provisions for the online use of government documents where authentication is required figure 6. number of open-access computers offered figure 7. electronic resources for authenticated affiliated users (n = 32) number of libraries number of librariesnumber of libraries number of libraries figure 8. resources for authenticating guest users (n = 23) 134 information technology and libraries | september 2010 ■■ respondents and authentication figure 10 compares authentication practices of public, private, and other institutions described in response to question 2. responses from public institutions outnumbered those from private institutions, but within each group a similar percentage of libraries required their affiliated users to authenticate. therefore no statistically significant difference was found between authenticating affiliates in public and private institutions. of the 61 respondents, 32 (52 percent) required their affiliated users to authenticate (see question 3) and 23 of the 32 also had the means to authenticate guests (see question 4). the remaining 9 offered open-access computers. fourteen libraries had both the means to authenticate guests and had open-access computers (see questions 4 and 7). 
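the distribution of open-access computers reported under question 7 is heavily skewed: a handful of very large installations pull the mean (161.2) far above the median (23). the following is a minimal python sketch, using an invented list of counts rather than the survey's unpublished per-library figures, showing how such summary statistics are computed and why the mean and median diverge so sharply.

    from statistics import mean, median, mode

    # hypothetical open-access computer counts, invented for illustration;
    # the survey's per-library figures are not published in the article
    counts = [2, 4, 8, 12, 23, 23, 30, 30, 30, 45, 60, 150, 400, 3000]

    print("mean  :", round(mean(counts), 1))   # pulled upward by the largest sites
    print("median:", median(counts))           # middle value, robust to outliers
    print("mode  :", mode(counts))             # most frequently reported count
    print("range :", max(counts) - min(counts))

the same skew explains why the article reports mean, median, mode, and range together rather than relying on the mean alone.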
when we compare the results of the 2007 study by cook and shelton with the results of the current study (completed in 2008), the results are somewhat contradictory (see table 1).46 the differences in survey data seem to indicate that authentication requirements are decreasing; however, the literature review—specifically cook and shelton and the 2003 courtney article—clearly indicate that authentication is on the rise.47 this dichotomy may be explained, in part, by the fact that of the more than 60 arl libraries responding to both surveys, there was an overlap of only 34 libraries. the 30 u.s. federal depository or canadian depository services libraries that required their affiliated users to authenticate (see questions 3 and 5) provided guest access ranging from usernames and passwords, to open-access computers, to computers restricted to libraries gave the url to their policy; 4 summarized their policies. ■■ research questions answered the study resulted in answers to the questions we posed at the outset: ■■ thirty-two (52 percent) of the responding arl libraries required affiliated users to login to public computer workstations in the library. ■■ twenty-three (72 percent) of the 32 arl libraries requiring affiliated users to login to public computers provided the means for guest users to login to public computer workstations in the library. ■■ fifty (82 percent) of 61 responding arl libraries provided open-access computers for guest users; 14 (28 percent) of those 50 libraries provided both open-access computers and the means for guest authentication. ■■ without exception, all u.s. federal depository or canadian depository services libraries that required their users to authenticate offered guest users some form of access to online information. ■■ survey results indicated some differences between software provided to various users on differently accessed computers. office software was less frequently provided on open-access computers. ■■ twenty-eight responding arl libraries had written policies relating to the use of open-access computers. ■■ fifteen responding arl libraries had written policies relating to the authorization of guests. figure 9. electronic resources on open access computers (n = 50) figure 10. comparison of library type and authentication requirement number of libraries authentication and access | weber and lawrence 135 ■■ one library had guidelines for use posted next to the workstations but did not give specifics. ■■ fourteen of those requiring their users to authenticate had both open-access computers and guest authentication to offer to visitors of their libraries. other policy information was obtained by an examination of the 28 websites listed by respondents: ■■ ten of the sites specifically stated that the open-access computers were for academic use only. ■■ five of the sites specified time limits for use of openaccess computers, ranging from 30 to 90 minutes. ■■ four stated that time limits would be enforced when others were waiting to use computers. ■■ one library used a sign-in sheet to monitor time limits. ■■ one library mentioned a reservation system to monitor time limits. ■■ two libraries prohibited online gambling. ■■ six libraries prohibited viewing sexually explicit materials. ■■ guest-authentication policies of the 23 libraries that had the means to authenticate their guests, 15 had a policy for guests obtaining a username and password to authenticate, and 6 outlined their requirements of showing identification and issuing access. 
the other 9 had open-access computers that guests might use. the following are some of the varied approaches to guest authentication: ■■ duration of the access (when mentioned) ranged from 30 days to 12 months. ■■ one library had a form of sponsored access where current faculty or staff could grant a temporary username and password to a visitor. ■■ one library had an online vouching system that allowed the visitor to issue his or her own username and password online. ■■ one library allowed guests to register themselves by swiping an id or credit card. ■■ one library had open-access computers for local resources and only required authentication to leave the library domain. ■■ one library had the librarians log the users in as guests. ■■ one library described the privacy protection of collected personal information. ■■ no library mentioned charging a fee for allowing computer access. government documents, to librarians logging in for guests (see question 6). numbers of open-access computers ranged widely from 2 to more than 3,000 (see question 7). eleven (19 percent) of the responding u.s. federal depository or canadian depository services libraries that did not provide open-access computers issued a temporary id (nine libraries), provided open access limited to government documents (one library), or required librarian login for each guest (one library). all libraries with u.s. federal depository or canadian depository services status provided a means of public access to information to fulfill their obligation to offer government documents to guests. figure 11 shows a comparison of resources available to authenticated users and authenticated guests and offered on open-access computers. as might be expected, almost all institutions provided access to online catalogs, government documents, and internet browsers. fewer allowed access to licensed electronic resources and e-mail. access to office software showed the most dramatic drop in availability, especially on open-access computers. ■■ open-access computer policies as mentioned earlier, 28 libraries had written policies for their open-access computers (see question 11), and 28 libraries gave a url, a url plus a summary explanation, or a summary explanation with no url (see question 12). in most instances, the library policy included their campus’s acceptable-use policy. seven libraries cited their campus’s acceptable-use policy and nothing else. nearly all libraries applied the same acceptable-use policy to all users on all computers and made no distinction between policies for use of open-access computers or computers requiring authentication. following are some of the varied aspects of summarized policies pertaining to open-access computers: ■■ eight libraries stated that the computers were for academic use and that users might be asked to give up their workstation if others were waiting.

table 1. comparison of findings from cook and shelton (2007) and the current survey (2008)
authentication requirements    2007 (n = 69)    2008 (n = 61)
some required                  28 (46%)         23 (38%)
required for all               15 (25%)          9 (15%)
not required                   18 (30%)         29 (48%)

■■ further study although the survey answered many of our questions, other questions arose. while the number of libraries requiring affiliated users to log on to their public computers is increasing, this study does not explain why this is the case.
reasons could include reactions to the september 11 disaster, the usa patriot act, general security concerns, or the convenience of the personalized desktop and services for each authenticated user. perhaps a future investigation could focus on reasons for more frequent requirement of authentication. other subjects that arose in the examination of institutional policies were guest fees for services, age limits for younger users, computer time limits for guests, and collaboration between academic and public libraries. ■■ policy developed as a result of the survey findings as a result of what was learned in the survey, we drafted guidelines governing the use of open-access computers by visitors and other non-university users. the guidelines can be found at http://lib.mnsu.edu/about/libvisitors .html#access. these guidelines inform guests that openaccess computers are available to support their research, study, and professional activities. the computers also are governed by the campus policy and the state university system acceptable-use policy. guideline provisions enable staff to ask users to relinquish a computer when others are waiting or if the computer is not being used for academic purposes. while this library has the ability to generate temporary usernames and passwords, and does so for local schools coming to the library for research, no guidelines have yet been put in place for this function. figure 11. online resources available to authenticated affiliated users, guest users, open-access users authentication and access | weber and lawrence 137 these practices depend on institutional missions and goals and are limited by reasonable considerations. in the past, accommodation at some level was generally offered to the community, but the complications of affiliate authentication, guest registration, and vendor-license restrictions may effectively discourage or prevent outside users from accessing principal resources. on the other hand, open-access computers facilitate access to electronic resources. those librarians who wish to provide the same level of commitment to guest users as in the past as well as protect the rights of all should advocate to campus policy-makers at every level to allow appropriate guest access to computers to fulfill the library’s mission. in this way, the needs and rights of guest users can be balanced with the responsibilities of using campus computers. in addition, librarians should consider ensuring that the licenses of all electronic resources accommodate walk-in users and developing guidelines to prevent incorporation of electronic materials that restrict such use. this is essential if the library tradition of freedom of access to information is to continue. finally, in regard to external or guest users, academic librarians are pulled in two directions; they are torn between serving primary users and fulfilling the principles of intellectual freedom and free, universal access to information along with their obligations as federal depository libraries. at the same time, academic librarians frequently struggle with the goals of the campus administration responsible for providing secure, reliable networks, sometimes at the expense of the needs of the outside community. the data gathered in this study, indicating that 82 percent of responding libraries continue to provide at least some open-access computers, is encouraging news for guest users. 
balancing public access and privacy with institutional security, while a current concern, may be resolved in the way of so many earlier preoccupations of the electronic age. given the pervasiveness of the problem, however, fair and equitable treatment of all library users may continue to be a central concern for academic libraries for years to come. references 1. lori driscoll, library public access workstation authentication, spec kit 277 (washington, d.c.: association of research libraries, 2003). 2. martin cook and mark shelton, managing public computing, spec kit 302 (washington, d.c.: association of research libraries, 2007): 16. 3. h. vail deale, “public relations of academic libraries,” library trends 7 (oct. 1958): 269–77. 4. ibid., 275. 5. e. j. josey, “the college library and the community,” faculty research edition, savannah state college bulletin (dec. 1962): 61–66. ■■ conclusions while we were able to gather more than 50 years of literature pertaining to unaffiliated users in academic libraries, it soon became apparent that the scope of consideration changed radically through the years. in the early years, there was discussion about the obligation to provide service and access for the community balanced with the challenge to serve two clienteles. despite lengthy debate, there was little exception to offering the community some level of service within academic libraries. early preoccupation with physical access, material loans, ill, basic reference, and other services later became a discussion of the right to use computers, electronic resources, and other services without imposing undue difficulty to the guest. current discussions related to guest users reflect obvious changes in public computer administration over the years. authentication presently is used at a more fundamental level than in earlier years. in many libraries, users must be authorized to use the computer in any way whatsoever. as more and more institutions require authentication for their primary users, accommodation must be made if guests are to continue being served. in addition, as courtney’s 2003 research indicates, an ever increasing number of electronic databases, indexes, and journals replace print resources in library collections. this multiplies the roadblocks for guest users and exacerbates the issue.48 unless special provisions are made for computer access, community users are left without access to a major part of the library’s collections. because 104 of the 123 arl libraries (85 percent) are federal depository or canadian depository services libraries, the researchers hypothesized that most libraries responding to the survey would offer open-access computers for the use of nonaffiliated patrons. this study has shown that federal depository libraries have remained true to their mission and obligation of providing public access to government-generated documents. every federal depository respondent indicated that some means was in place to continue providing visitor and guest access to the majority of their electronic resources— whether through open-access computers, temporary or guest logins, or even librarians logging on for users. while access to government resources is required for the libraries housing government-document collections, libraries can use considerably more discretion when considering what other resources guest patrons may use. 
despite the commitment of libraries to the dissemination of government documents, the increasing use of authentication may ultimately diminish the libraries’ ability and desire to accommodate the information needs of the public. this survey has provided insight into the various ways academic libraries serve guest users. not all academic libraries provide public access to all library resources. 138 information technology and libraries | september 2010 identify yourself,” chronicle of higher education 50, no. 42 (june 25, 2004): a39, http://search.ebscohost.com/login.aspx?direct =true&db=aph&an=13670316&site=ehost-live (accessed mar. 2, 2009). 28. diana oblinger, “it security and academic values,” in luker and petersen, computer & network security in higher education, 4, http://net.educause.edu/ir/library/pdf/pub7008e .pdf (accessed july 14, 2008). 29. ibid., 5. 30. “access for non-affiliated users,” library & information update 7, no. 4 (2008): 10. 31. paul salotti, “introduction to haervi-he access to e-resources in visited institutions,” sconul focus no. 39 (dec. 2006): 22–23, http://www.sconul.ac.uk/publications/ newsletter/39/8.pdf (accessed july 14, 2008). 32. ibid., 23. 33. universities and colleges information systems association (ucisa), haervi: he access to e-resources in visited institutions, (oxford: ucisa, 2007), http://www.ucisa.ac.uk/ publications/~/media/files/members/activities/haervi/ haerviguide%20pdf (accessed july 14, 2008). 34. nancy courtney, “barbarians at the gates: a half-century of unaffiliated users in academic libraries,” journal of academic librarianship 27, no. 6 (nov. 2001): 473–78, http://search.ebsco host.com/login.aspx?direct=true&db=aph&an=5602739&site= ehost-live (accessed july 14, 2008). 35. ibid., 478. 36. nancy courtney, “unaffiliated users’ access to academic libraries: a survey,” journal of academic librarianship 29, no. 1 (jan. 2003): 3–7, http://search.ebscohost.com/login.aspx?dire ct=true&db=aph&an=9406155&site=ehost-live (accessed july 14, 2008). 37. ibid., 5. 38. ibid., 6. 39. ibid., 7. 40. nancy courtney, “authentication and library public access computers: a call for discussion,” college & research libraries news 65, no. 5 (may 2004): 269–70, 277, www.ala .org/ala/mgrps/divs/acrl/publications/crlnews/2004/may/ authentication.cfm (accessed july 14, 2008). 41. terry plum and richard bleiler, user authentication, spec kit 267 (washington, d.c.: association of research libraries, 2001): 9. 42. lori driscoll, library public access workstation authentication, spec kit 277 (washington, d.c.: association of research libraries, 2003): 11. 43. cook and shelton, managing public computing. 44. ibid., 15. 45. plum and bleiler, user authentication, 9; driscoll, library public access workstation authentication, 11; cook and shelton, managing public computing, 11. 46. cook and shelton, managing public computing, 15. 47. ibid.; courtney, unaffiliated users, 5–7. 48. courtney, unaffiliated users, 6–7. 6. ibid., 66. 7. h. vail deale, “campus vs. community,” library journal 89 (apr. 15, 1964): 1695–97. 8. ibid., 1696. 9. john waggoner, “the role of the private university library,” north carolina libraries 22 (winter 1964): 55–57. 10. e. j. josey, “community use of academic libraries: a symposium,” college & research libraries 28, no. 3 (may 1967): 184–85. 11. e. j. josey, “implications for college libraries,” in “community use of academic libraries,” 198–202. 12. don l. tolliver, “citizens may use any tax-supported library?” wisconsin library bulletin (nov./dec. 
1976): 253. 13. ibid., 254. 14. ralph e. russell, “services for whom: a search for identity,” tennessee librarian: quarterly journal of the tennessee library association 31, no. 4 (fall 1979): 37, 39. 15. ralph e. russell, carolyn l. robison, and james e. prather, “external user access to academic libraries,” the southeastern librarian 39 (winter 1989): 135. 16. ibid., 136. 17. brenda l. johnson, “a case study in closing the university library to the public,” college & research library news 45, no. 8 (sept. 1984): 404–7. 18. lloyd m. jansen, “welcome or not, here they come: unaffiliated users of academic libraries,” reference services review 21, no. 1 (spring 1993): 7–14. 19. mary ellen bobp and debora richey, “serving secondary users: can it continue?” college & undergraduate libraries 1, no. 2 (1994): 1–15. 20. eric lease morgan, “access control in libraries,” computers in libraries 18, no. 3 (mar. 1, 1998): 38–40, http://search .ebscohost.com/login.aspx?direct=true&db=aph&an=306709& site=ehost-live (accessed aug. 1, 2008). 21. susan k. martin, “a new kind of audience,” journal of academic librarianship 24, no. 6 (nov. 1998): 469, library, information science & technology abstracts, http://search.ebsco host.com/login.aspx?direct=true&db=aph&an=1521445&site= ehost-live (accessed aug. 8, 2008). 22. peggy johnson, “serving unaffiliated users in publicly funded academic libraries,” technicalities 18, no. 1 (jan. 1998): 8–11. 23. julie still and vibiana kassabian, “the mole’s dilemma: ethical aspects of public internet access in academic libraries,” internet reference services quarterly 4, no. 3 (1999): 9. 24. clifford lynch, “authentication and trust in a networked world,” educom review 34, no. 4 (jul./aug. 1999), http://search .ebscohost.com/login.aspx?direct=true&db=aph&an=2041418 &site=ehost-live (accessed july 16, 2008). 25. rita barsun, “library web pages and policies toward ‘outsiders’: is the information there?” public services quarterly 1, no. 4 (2003): 11–27. 26. ibid., 24. 27. scott carlson, “to use that library computer, please authentication and access | weber and lawrence 139 appendix a. the survey introduction, invitation to participate, and forward dear arl member library, as part of a professional research project, we are attempting to determine computer authentication and current computer access practices within arl libraries. we have developed a very brief survey to obtain this information which we ask one representative from your institution to complete before april 25, 2008. the survey is intended to reflect practices at the main or central library on your campus. names of libraries responding to the survey may be listed but no identifying information will be linked to your responses in the analysis or publication of results. if you have any questions about your rights as a research participant, please contact anne blackhurst, minnesota state university, mankato irb administrator. anne blackhurst, irb administrator minnesota state university, mankato college of graduate studies & research 115 alumni foundation mankato, mn 56001 (507)389-2321 anne.blackhurst@mnsu.edu you may preview the survey by scrolling to the text below this message. if, after previewing you believe it should be handled by another member of your library team, please forward this message appropriately. 
alternatively, you may print the survey, answer it manually and mail it to: systems/ access services survey library services minnesota state university, mankato ml 3097—po box 8419 mankato, mn 56001-8419 (usa) we ask you or your representative to take 5 minutes to answer 14 questions about computer authentication practices in your main library. participation is voluntary, but follow-up reminders will be sent. this e-mail serves as your informed consent for this study. your participation in this study includes the completion of an online survey. your name and identity will not be linked in any way to the research reports. clicking the link to take the survey shows that you understand you are participating in the project and you give consent to our group to use the information you provide. you have the right to refuse to complete the survey and can discontinue it at any time. to take part in the survey, please click the link at the bottom of this e-mail. thank you in advance for your contribution to our project. if you have questions, please direct your inquiries to the contacts given below. thank you for responding to our invitation to participate in the survey. this survey is intended to determine current academic library practices for computer authentication and open access. your participation is greatly appreciated. below are the definitions of terms used within this survey: ■■ “authentication”: a username and password are required to verify the identity and status of the user in order to log on to computer workstations in the library. ■■ “affiliated user”: a library user who is eligible for campus privileges. ■■ “non-affiliated user”: a library user who is not a member of the institutional community (an alumnus may be a nonaffiliated user). this may be used interchangeably with “guest user.” ■■ “guest user”: visitor, walk-in user, nonaffiliated user. ■■ “open access computer”: computer workstation that does not require authentication by user. 140 information technology and libraries | september 2010 appendix b. responding institutions 1. university at albany state university of new york 2. university of alabama 3. university of alberta 4. university of arizona 5. arizona state university 6. boston college 7. university of british columbia 8. university at buffalo, state university of ny 9. case western reserve university 10. university of california berkeley 11. university of california, davis 12. university of california, irvine 13. university of chicago 14. university of colorado at boulder 15. university of connecticut 16. columbia university 17. dartmouth college 18. university of delaware 19. university of florida 20. florida state university 21. university of georgia 22. georgia tech 23. university of guelph 24. howard university 25. university of illinois at urbana-champaign 26. indiana university bloomington 27. iowa state university 28. johns hopkins university 29. university of kansas 30. university of louisville 31. louisiana state university 32. mcgill university 33. university of maryland 34. university of massachusetts amherst 35. university of michigan 36. michigan state university 37. university of minnesota 38. university of missouri 39. massachusetts institute of technology 40. national agricultural library 41. university of nebraska-lincoln 42. new york public library 43. northwestern university 44. ohio state university 45. oklahoma state university 46. university of oregon 47. university of pennsylvania 48. university of pittsburgh 49. purdue university 50. 
rice university 51. smithsonian institution 52. university of southern california 53. southern illinois university carbondale 54. syracuse university 55. temple university 56. university of tennessee 57. texas a&m university 58. texas tech university 59. tulane university 60. university of toronto 61. vanderbilt university

editorial: how do you know whence they will come? dan marmion, information technology and libraries 19, no. 1 (mar. 2000).
as i write this, i am putting my affairs in order at western michigan university, in preparation for a move to a new position at the university of notre dame libraries beginning in april. at each university my responsibilities include overseeing both the online catalog and the libraries' web presence. i mention this only because i find it interesting, and indicative of an issue with which the library profession in general is grappling, that librarians in both institutions are engaged in discussions regarding the relationship between the two. in talking to librarians at those places and others, from some i hear sentiment for making one or the other the "primary" access point. thus i've heard arguments that "the online catalog represents our collection, so we should use it as our main access mechanism." other librarians state that "the online catalog is fine for searching for books in our collection, but there is so much more to find and so many more options for finding it, that we should use our web pages to link everything together." my hunch is that probably we can all agree that there are things that an online catalog can do better than a web site, and things that a web site can do better than the online catalog. as far as that goes, have we ever had a primary access point (thanks to karen coyle for this thought)? but that's not what i want to talk about today. the debate over a primary access point contains an invalid implicit assumption and asks the wrong question. the implicit assumption is that we can and should control how our patrons come into our systems. the question we should be asking ourselves is not "what is our primary access method?" but rather "how can we ensure that our users, local and remote, will find an avenue that enables them to meet their informational needs?" since at this time i'm more familiar with wmu than notre dame, i'll draw some examples from the former. we have "subject guides to resources" on our web site. these consist of pages put together by subject specialists that point to recommended sources, both print and electronic, local and remote, on given subjects. students can use them to begin researching topics in a large number of subject areas. the catch is that the students have to be browsing around the web site. if they happen to start out in the online catalog they will never encounter these gateways, because the only reference to them is on the web site. on the other hand, a student who stays strictly with the web site is quite possibly going to miss a valuable resource in our library if he/she doesn't consult the online catalog, because we obviously can't list everything we own on the web site. (also, obviously, the web site doesn't provide the patron with status information.) this is why we have to ask ourselves the correct question mentioned above. what is the solution?
unfortunately i'm not any smarter than everyone else, so i don't have the answer (although i do know some folks who can help us with it: check out www.lita.org/committe/toptech/mainpage.htm). my guess is that we'll have to work it out as a profession, possibly in collaboration with our online system vendors, and that the solution will be neither quick nor simple nor easy. there are some ad hoc moves we can make, of course, such as putting links to the gateways into the catalog and stressing on our web pages that the patron really needs to do a catalog search. the bottom line is that we have a dilemma: we can't control how people come into our electronic systems, so we can't have a "primary access point." if we try, we do harm to those who, for whatever reason, reach us via some other avenue. we need to make sure that we provide equal opportunity for all. dan marmion (dmarmion@nd.edu) is associate director of information systems and access at notre dame university, notre dame, indiana.

technical communications

isad/solinet to sponsor institute
"networks and networking ii; the present and potential" is the theme of an isad institute to be held at the braniff place hotel on february 27-28, 1975, in new orleans. the sponsors are the information science and automation division of ala and the southeastern library network (solinet). this second institute on networking will be an extension of the previous one held in new orleans a year ago. the ground covered in that previous institute will be the point of departure for "networks ii." the purpose of the previous institute was to review the options available in networking, to provide a framework for identifying problems, and to suggest evaluation strategies to aid in choosing alternative systems. while the topics covered in the previous institute will be briefly reviewed in this one, some speakers will take different approaches to the subject of networking, while other speakers will discuss totally new aspects.
in addition to the papers given and the resultant questions and answers from the floor, a period of round table discussions will be held during which the speakers can be questioned on a person-to-person basis. a new feature to isad institutes now being planned will be the presence of vendors' exhibits. arrangements are being made with the many vendors and manufacturers whose services are applicable to networking to exhibit their products and systems. it is hoped that many of them will be interested in responding to this opportunity. the program will include:
"a systems approach to selection of alternatives"-resource sharing-components-communications options-planning strategy. joseph a. rosenthal, university of california, berkeley.
"state of the nation"-review of current developments and an evaluation. brett butler, butler associates.
"the library of congress, marc, and future developments." henriette d. avram, library of congress.
"data bases, standards and data conversions"-existing data bases-characteristics-standardization-problems. john f. knapp, richard abel & co.
"user products"-possibilities for product creation-the role of user products. maurice freedman, new york public library.
"on-line technology"-hardware and software considerations-library requirements-standards-cost considerations of alternatives. philip long, state university of new york, albany.
"publishers' view of networks"-copyright-effect on publishers-effect on authorship-impact on jobbers-facsimile transmission. carol nemeyer, association of american publishers.
"national library of canada"-current and anticipated developments-cooperative plans in canada-international cooperation. rodney duchesne, national library of canada.
"administrative, legal, financial, organizational and political considerations"-actual and potential problems-organizational options-financial commitment-governance. fred kilgour, oclc.
registration will be $75.00 to members of ala and staff members of solinet institutions, $90.00 to nonmembers, and $10.00 to library school students. for hotel reservation information and registration blanks, contact donald p. hammer, isad, american library association, 50 e. huron st., chicago, il 60611; 312-944-6780.

regional projects and activities

indiana cooperative library services authority
the first official meeting of the board of directors of the indiana cooperative library services authority (incolsa) was held june 4, 1974, at the indiana state library in indianapolis. a direct outgrowth of the cooperative bibliographic center for indiana libraries (cobicil) feasibility study project sponsored by the indiana state library and directed by mrs. barbara evans markuson, incolsa has been organized as an independent not-for-profit organization "to encourage the development and improvement of all types of library service." to date, contracts have been signed by sixty-one public, thirteen academic, fourteen schools, and five special libraries, a total of ninety-three libraries. incolsa is being funded initially by a three-year establishment grant from the u.s. office of education, library services and construction act (lsca) title i funds. officers are: president: harold baker, head of library systems development, indiana state university; vice-president: dr.
michael buckland, assistant director for technical services, purdue university libraries; secretary: mary hartzler, head of catalog division, indiana state library; treasurer: mary bishop, director of the crawfordsville book processing center; three directors-at-large: phil hamilton, director of the kokomo public library; edward a. howard, director of the evansville-vanderburgh county public library; and sena kautz, director of media services, duneland school corporation.

stanford's ballots on-line files publicly available through spires
september 16, 1974. the stanford university libraries automated technical processing system, ballots (bibliographic automation of large library operations using a timesharing system), has been in operation for twenty-two months and supports the acquisition and cataloging of nearly 90 percent of all materials processed. important components of the ballots operations are several on-line files accessible through an unusually powerful set of indexes. currently available are: a file of library of congress marc data starting from january 1, 1972 (with a gap from may to august 1972); an in-process file of individual items being purchased by stanford; an on-line catalog (the catalog data file) of all items cataloged through the system, whether copy was derived from library of congress marc data, was input from non-marc cataloging copy, or resulted from stanford's own original cataloging efforts; and a file of see, see also, and explanatory references (the reference file) to the catalog data file. in addition, during september and october 1974, the 85,000 bibliographic and holdings records (already in machine-readable form on magnetic tape) representing the entire j. henry meyer memorial undergraduate library were converted to on-line meyer catalog data and meyer reference files in ballots. these files are publicly available through spires (stanford public information retrieval system) to any person with a terminal that can dial up the stanford center for information processing's academic computer services computer (an ibm 360 model 67) and who has a valid computer account. the marc file can be searched through the following index points: lc card number; personal name; corporate/conference name; title. the in-process, catalog data, and reference files for stanford and for meyer can also be searched as spires public subfiles through the following index points: ballots unique record identification number; personal name; corporate/conference name; title; subject heading (catalog data and reference file records only); call number (catalog data and reference file records only); lc card number. the title and corporate/conference name indexes are word indexes; this means that each word is indexed individually. search requests may draw on more than one index at a time by using the logical operators "and," "or," and "and not" to combine index values sought. if you plan to use spires to search these files, or if you would like more information, a publication called guide to ballots files may be ordered by writing to: editor, library computing services, s.c.i.p.-willow, stanford university, stanford, ca 94305. this document contains complete information about the ballots files and data elements, how to open an account number, and how to use spires to search ballots files. a list of ballots publications and prices is also available on request. as additional libraries create on-line files using ballots in a network environment, these files will also be available.
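the spires description above notes that a search may draw on more than one index point at a time, combined with the logical operators "and," "or," and "and not." the following is a minimal python sketch of the set logic such a combination implies; the index names, values, and record numbers are invented for illustration and are not actual ballots records or spires query syntax.

    # hypothetical inverted indexes mapping index values to record numbers;
    # invented stand-ins, not actual ballots/spires data or syntax
    indexes = {
        "personal name": {"avram, h.": {103}, "buckland, m.": {101, 102}},
        "title word": {"networks": {101, 104}, "cataloging": {102, 103}},
        "lc card number": {"74-12345": {104}},
    }

    def lookup(index_name, value):
        # return the set of record numbers posted under one index value
        return indexes[index_name].get(value, set())

    # "or" is set union, "and" is intersection, "and not" is difference
    hits = (lookup("title word", "networks") | lookup("title word", "cataloging")) \
        - lookup("personal name", "avram, h.")
    print(sorted(hits))  # records matching either title word but not that author

word indexes such as the title index fit this model naturally, since each word of a title is posted separately and can be combined with values drawn from any other index.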
these additions will be announced in jola technical communications.

data base news

interchange of aip and ei data bases
a national science foundation grant (gn-42062) for $128,700 has been awarded to the american institute of physics (aip), in cooperation with engineering index (ei), for a project entitled "interchange of data bases." the grant became effective on may 1, 1974, for a period of fifteen months. the project is intended to develop methods by which ei and aip can reduce their input costs by eliminating duplication of intellectual effort and processing. through sharing of the resources of the two organizations and an interchange of their respective data bases, aip and ei expect to improve the utilization of these computer-readable data bases. the basic requirement for the development of the interchange capability for computer-readable data bases is the establishment of a compatible set of data elements. each organization has unique data elements in its data base. it will therefore be necessary to determine which of the data elements are absolutely essential to each organization's services, which elements can be modified, and what other elements must be added. after the list of data elements has been established, it will be possible to write the specifications and programs for format conversions from aip to ei tape format and vice versa. simultaneously, there will be the development of language conversion facilities between ei's indexing vocabulary and aip's physics and astronomy classification scheme (pacs). it is also planned to investigate the possibility of establishing a computer program which can convert aip's indexing to ei's terms and vice versa. with the accomplishment of the above tasks, it will be possible to create new services and repackage existing services to satisfy the information demands in areas of mutual interest to engineers and physicists, such as acoustics and optics.

eric data base users conference
the educational resources information center (eric) held an eric data base users conference in conjunction with the 37th annual meeting of the american society for information science (asis) in atlanta, georgia, october 13-17, 1974. the eric data base users conference provided a forum for present and potential eric users to discuss common problems and concerns as well as interact with other components of the eric network: central eric, the eric processing and reference facility, eric clearinghouse personnel, and information dissemination centers. although attendees have in the past been primarily oriented toward machine use of the eric files, all patterns of usage were represented at this conference, from manual users of printed indexes to operators of national on-line retrieval systems. a number of invited papers were presented dealing with subjects such as:
• the current state and future directions of educational information dissemination. sam rosenfeld (nie), lee burchinal (nsf).
• what services, systems, and data bases are available? marvin gechman (information general), harvey marron (nie).
• the roles of libraries and industry, respectively, in disseminating educational information. richard de gennaro (university of pennsylvania), paul zurkowski (information industry association).
several organizations (national library of canada, university of georgia, wisconsin state department of education) were invited to participate in "show and tell" sessions to describe in detail how they are using the eric system and data base. a status report covering eric on-line services for educators was presented by dr. carlos cuadra (system development corporation) and dr. roger summit (lockheed). interactive discussion groups covered a number of subjects including:
• computer techniques-programming methods, use of utilities, file maintenance, search system selection, installation, and operation.
• serving the end user of educational information.
• introduction to the eric system-what tools, systems, and services are available and how are they used?
• beginning and advanced sessions on computer searching the eric files.
online terminals were used to demonstrate and explain use of machine capabilities.

commercial services and developments

scope data inc. ala train compatible terminal printers
scope data inc. currently is offering a high-speed, nonimpact terminal printer for use in various interactive printing applications. capability can be included in the series 200 printer as an extra-cost feature to print the eight-bit ascii character set or the ala character set with 176 characters. for further information contact alan g. smith, director of marketing, scope data inc., 3728 silver star rd., orlando, fl 32808.

institute for scientific information puts life sciences data base on-line through system development corporation
the institute for scientific information (isi) has announced that it will collaborate with system development corporation (sdc) to provide on-line, interactive, computer searches of the life sciences journal literature. scheduled to be fully operational by july 1, 1974, the isi-sdc service is called scisearch® and is designed to give quick, easy, and economical access to a large life sciences literature file. stressing ease of access, the sdc retrieval program, orbit, permits subscribers to conduct extremely rapid literature searches through two-way communications terminals located in their own facilities. after examining the preliminary results of their inquiries, searchers are able to further refine their questions to make them broader or narrower. this dialog between the searcher and the computer (located in sdc's headquarters in santa monica, california) is conducted with simple english-language statements. because this system is tied in to a nationwide communications network, most subscribers will be able to link their terminals to the computer through the equivalent of a local phone call. covering every editorial item from about 1,100 of the world's most important life sciences journals, the service will initially offer a searchable file of over 400,000 items published between april 1972 and the present. each month approximately 16,000 new items will be added until the average size of the file totals about one-half million items and represents two-and-one-half years of coverage. to assure subscribers maximum retrieval effectiveness when dealing with this massive amount of information, the data base can be searched in several ways. included are searches by keywords, word stems, word phrases, authors, and organizations. one of the search techniques utilized-citation searching-is an exclusive feature of the isi data base.
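citation searching, noted above as an exclusive feature of the isi file, amounts to keeping an index from each cited reference to the items that cite it. a minimal python sketch, with invented accession numbers and references rather than isi's actual file layout, illustrates the idea.

    from collections import defaultdict

    # hypothetical source items and the references each one cites;
    # invented for illustration, not isi's actual record structure
    items = {
        "74-000101": ["smith 1969", "jones 1971"],
        "74-000102": ["jones 1971"],
        "74-000103": ["smith 1969", "doe 1973"],
    }

    # build the citation index: cited reference -> accession numbers citing it
    citation_index = defaultdict(set)
    for accession, cited_refs in items.items():
        for ref in cited_refs:
            citation_index[ref].add(accession)

    # a citation search: which items in the file cite jones 1971?
    print(sorted(citation_index["jones 1971"]))  # ['74-000101', '74-000102']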
for every item retrieved through a search, subscribers can receive a complete bibliographic description that includes all authors, journal citation, full title, a language indicator, a code for the type of item (article, note, review, etc.), an isi accession number, and all the cited references contained in the retrieved article. the accession number is used to order full-text copies of relevant items through isi's original article tear sheet service (oats®). this ability to provide copies of every item in the data base distinguishes the isi service from many others.

current library of congress catalog on-line for reference searches
information dynamics corporation (idc) has agreed to collaborate with system development corporation (sdc) to provide reference librarians, researchers, and scholars with on-line interactive computer searches of all library materials being cataloged by the library of congress. scheduled to be fully operational as of october 1, 1974, the sdc-idc service is called sdc-idc/libcon and is designed to give quick, easy, and economical access to a large portion of the world's scholarly library materials. as in the isi service described above, the data base can be searched in several ways. included are compound logic searches by keywords, word stems, word phrases, authors, organizations, and subject headings for most english materials. one of the search techniques utilized-string searching-is an exclusive feature of sdc's orbit system. keyword searching of cataloged items including all foreign materials processed by the library of congress is an exclusive feature of the idc data base not currently available in other online marc files. for individual items retrieved through a search, subscribers can receive a bibliographic description that includes authors, full title, an idc accession number, the lc classification number, and publisher information.

standards

the isad committee on technical standards for library automation invites your participation in the standards game
editor's note: the tesla reactor ballot will be provided in forthcoming issues. to use, photocopy the ballot form, fill out, and mail to: john c. kountz, associate for library automation, office of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036.

the procedure
this procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ala) standards proposals to provide recommendations to ala's representatives to existing, recognized standards organizations. to enter the procedure for an initiative standards proposal you must complete an "initiative standards proposal" using the outline which follows:

initiative standard proposal outline
the following outline is designed to facilitate review by both the committee and the membership of initiative standards proposals and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable).
note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced on 8½ x 11 inch white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number. i. title of initiative standard proposal (title). ii. initiator information (forward). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process. iv. purpose. state the purpose of the standard proposal (scope and qualifications). v. description. briefly describe the standard proposal (specification of the standard). vi. relationship to other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks). vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification). viii. specifications (optional). specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required in addition to text exposition where applicable (specifications of the standard). kindly note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review. in addition, the outline permits the presentation of background and descriptive information which, while important during any evaluation, is a prerequisite to the development of a standard. the tesla reactor ballot itself asks for an identification number for the standing requirement; reactor information (name, title, organization, address, city, state, zip, and telephone with area code, number, and extension); a for/against vote on the need for this standard; a for/against vote on the specification as presented in the requirement; whether the reactor can participate in the development of the standard (yes/no); and the reason for the position, using the format of the proposal, with additional pages if required. the reactor ballot is to be used by members to voice their recommendations relative to initiative standards proposals. the reactor ballot permits both "for" and "against" votes to be explained, permitting the capture of additional information which is necessary to document and communicate formal standards proposals to standards organizations outside of the american library association. as you, the members, use the outline to present your standards proposals, tesla will publish them in jola-tc and solicit membership reaction via the reactor ballot. throughout the process tesla will insure that standards proposals are drawn to the attention of the applicable american library association division or committee. thus, internal review usually will proceed concurrently with membership review.
from the review and the reactor ballot tesla will prepare a "majority recommendation" and a "minority report" on each standards proposal. the majority recommendation and minority report so developed will then be transmitted to the originator, and to the official american library association representative on the appropriate standards organization where it should prove a source of guidance as official votes are cast. in addition, the status of each standards proposal will be reported by tesla in jola-tc via the standards scoreboard. the committee (tesla) itself will be nonpartisan with regard to the proposals handled by it. however, the committee does reserve the right to reject proposals which after review are not found to relate to library automation. input to the editor: we have been asked by the members of the ala interdivisional committee on representation in machine readable form of bibliographic information, (marbi) to respond to your editorial in the june 1974 issue of the journal of library automation. this editorial dealt with the council of library resources' [sic] involvement in a wide range of projects, ranging from the sponsorship of a group which is attempting to develop a subset of marc for use in inter-library exchange technical communications 321 of bibliographic data ( cembi), to management of a project which has as its goal the creation of a national serials data base, (conser), and, more recently, to the convening of a conference of library and a&i organizations to discuss the outlook for comprehensive national bibliographic control. you raised several legitimate questions: 1) has sufficient publicity been given to these activities of the council so that all, not just a few, libraries are aware of what is happening and have an opportunity to exert an influence on developments? and, 2) is the council bypassing existing channels of operation and communication? you also suggest that proposals from groups such as cembi be channeled through an official ala committee such as marbi for intensive review and evaluation. it should be pointed out that marbi is not charged with the development of standards. it acts to monitor and review proposals affecting the format and content of machine readable bibliographic data, where that data has implications for national or international use. this applies to proposals emanating from cembi and conser as well as from other concerned groups. all indications to date are that the council is fully aware of marbi's role and will not bypass marbi. a number of members of marbi are also members of cembi and marbi is represented on the conser project. also reassuring is the fact that, unless we allow lc to fall by the wayside in its role as the primary creator and distributor of machine readable data, any standards for format or content developed by a council-sponsored group will eventually be reflected in the marc records distributed by lc. the library of congress has issued a statement, published in the june 1974 issue of jola, to the effect that it will not implement any changes in the marc distribution system which are not acceptable to marbi. marbi and lc have worked out a procedure whereby all proposed changes to marc are submitted to marbi. they are then published in ]ola and distributed to mem322 journal of library automation vol. 7/4 december 1974 hers of the marc users discussion group for comments. comments are collected and evaluated by marbi and a report submitted to lc, with its recommendations. 
the marbi review process does not guarantee perfection and there is no assurance that everyone will be satisfied. compromise and expediency are the name of the game in this extremely complicated and uncharted area of standards for machine readable bibliographic data. however, the council has undoubtedly learned from the isbd(m) experience that it cannot make decisions which affect libraries without the greatest possible involvement of librarians. it is the feeling of the marbi committee members that the council intends to work with marbi in future projects which fall into marbi's area of concern. velma veneziano marbi past chairperson ruth tighe chairperson editor's note: it is gratifying to note that marbi's response reflects the opinions expressed in the june 1974 editorial. the library community will doubtless be pleased to learn of clr's intention to work closely with marbi.-skm to the editor: as briefly discussed with you, your editorial in the june 1974 issue of jola is both admirable and disturbing (to me, at least). the problem of national leadership in the area of library automation is a critical problem indeed. being in the "boondocks" and far removed from the scene of action, i can only express to you my perception as events and activities filter through to me. i can remember as far back as 1957 when adi had a series of meetings in washington, d.c., trying to establish a national program for bibliographic automation. i have been through eighteen years of meetings, committees, conferences, etc., concerned with trying to develop a national plan for bibliographic automation and information storage and retrieval systems. i have worked with nsf, usoe, department of commerce, u.s. patent office, engineering and technical societies, dod agencies-the entire spectrum. i spent a good many years working in adi and asis, sla, and most recently ala. at no time were we able to make significant progress towards a national system. even the great airlie house conference did not produce any significant changes in the fragmented, competitive "non-system." it has only been in the recent past since clr has taken an aggressive posture that i am able to see the beginning of orderly development of a national automated bibliographic system. i certainly agree that any topic as critical as those being discussed by cembi should be in the public domain, but i also believe that the progress made by cembi would not have been possible without clr taking the initiative in getting these key agencies together. thank goodness someone quit talking and started doing something at the national level! i sincerely believe that in the absence of a national library and with the current lack of legally derived authority in this arena, clr provides a genuine service to the total library community in establishing cembi. hopefully, your very excellent article (in the same issue of jola) on "standards for library automation ..." will help to put the entire issue of bibliographic record standards into perspective. as a former chemist and corrosion engineer, i am fully aware of the absolute necessity for technical standards. i am also fully aware of the necessity of developing technical standards through the process you outlined in your article. hopefully, clr action with cembi will expedite this laborious process and help to push our profession forward into the twentieth century.
since we ourselves have not been able to do it through all these years, i am personally grateful that some group such as clr took the initiative and forced us to do what we should have done years ago. maryann duggan slice office director editor's note: positive action and progressive movement are, of course, desirable and are often lacking in large organizations. however, positive action without communication of this action to the affected population can only be detrimental. on issues of the complexity of those addressed by cembi and conser, review by the library community is always useful, even though action may be temporarily delayed.-skm to the editor: on page 233 of the september issue of jola there is a report from the information industry association's micropublishing committee chairman (henry powell). he states that ". . . the committee spelled out several areas of concern to micropublishers which will be the subject of committee action. . . ." one of the concerns of the committee is that a z39 standards committee has recommended "standards covering what micropublishers can say about their products." (emphasis mine.) as chairman of the z39 standards subcommittee which is developing the advertising standard referred to, i wish to point out that there is no intention on the part of the subcommittee to tell micropublishers what they can say nor what they may say about their products. the subcommittee, which is composed of representatives from three micropublishing concerns, two librarians, and myself, has from the beginning taken the view that the purpose of the standard would be to provide guidance for micropublishers and librarians alike. we are most anxious that no one feel that the subcommittee has any intention of attempting to use the standards mechanism to tell any micropublisher how he must design his advertisements. in addition it should be noted that no ansi standard is compulsory. carl m. spaulding program officer council on library resources editorial board thoughts: library analytics and patron privacy ken varnum information technology and libraries | december 2015 two significant trends are sweeping across the library landscape: assessment (and the corresponding collection and analysis of data) and privacy (of records of user interactions with our services). libraries, perhaps more than any other public service organization, are strongly motivated to assess their offerings with dual aims. the first might be thought of as an altruistic goal: understanding the needs of their particular clientele and improving library services to meet those needs. the second is perhaps more existential: helping justify the value libraries create to whatever sources of funding are necessary to impress. both are valid and important. it is hard to argue that improving services, focusing on actual needs, and maintaining funding are in any way improper goals. however, this desire is often seen as being in conflict with exploring too deeply the actions or needs of individual constituents, despite librarians' historical and deeply held belief that each constituent's precise information needs should be explored and provided for through personalized, tailored services. solid assessment cannot happen without solid data. libraries have historically relied on qualitative surveys of their users, asking users to evaluate the quality of the services they receive.
being able to know more details and ask directed questions of individuals who used services is possible in the traditional library setting through invitations to complete surveys after individual interactions such as a reference or circulation desk interaction, library program, visit to a physical location, or even a community-wide survey invitation. focus groups can be assembled as well, of course, once a library has identified a real-world group to study. however, those samples are more often convenience samples or—unless a library is able to successfully contact and receive responses from across the entire community—somewhat self-selected. assessment that leads to new or improved services relies much more heavily on broad-based understanding of the users of a system. libraries have been able to do limited quantitative studies of library usage—at its simplest, counting how many of this were checked out, how many of that was accessed, and how many users were involved. these metrics are useful, but also limited, particularly at the scale of a single library. knowing that a pool of resources is heavily used is helpful; even knowing that a suite of resources is frequently used collectively is beneficial. however, tying use of resources to specific information needs or information seekers, whether this is defined as individuals or ad hoc collections of users based on situational factors such as academic level, course enrollments, etc., requires a different kind of evidence. these more specific groupings rely on granular data that for many libraries—especially academic ones—are increasingly electronic. we are at a point in time when we have the potential to leverage wide swathes of user data. and this is where the second trend, privacy, comes to bear. protecting user privacy has been a guiding principle of librarianship in the united states (in particular) since the 1970s, as a strong reaction to u.s. government (through the fbi) requests to provide access to circulation logs for individuals under suspicion of espionage. this was in the early days of library automation, when large libraries with automated ils systems could prevent future disclosure through the straightforward strategy of purging transaction records as soon as the item was returned. this practice became standard operating procedure in libraries, and expanded into new information service domains as they evolved over the following forty years. with good intentions, libraries have ensured that they maintain no long-term history for most online services. as a profession, we have begun to realize that the straightforward (and arguably simplistic) approaches we have relied on for so long may no longer be appropriate or helpful. over the past year, these conversations found focus through a project coordinated by the national information standards organization thanks to a grant from the andrew w. mellon foundation.1 the range of issues discussed here was far-reaching and touched on virtually every aspect of privacy and assessment imaginable. ken varnum (varnum@umich.edu), a member of the ital editorial board, is senior program manager for discovery, delivery, and learning analytics at the university of michigan library, ann arbor, michigan.
the resulting draft document, consensus framework to support patron privacy in digital library and information systems,2 outlines 12 principles that libraries (and the information service vendors they partner with) should follow as they establish "practices and procedures to protect the digital privacy of the library user." this new consensus framework sets a series of guidelines for us to consider as we begin to move into this uncharted (for libraries) territory. if we are to record and make use of our users' online (and offline, for that matter) footprints to improve services, improve the user experience, and justify our value, this document gives us an outline of the issues to consider. it is time (and probably long past time) that we make conscious decisions about how we assess our online resources, in particular, and do so with a deeper knowledge of both the resources used and the people using them. at the exact moment in our technological history when we find ourselves able to provide automated services at scale to our users through the internet and simultaneously record and analyze the intricate details of those transactions, we need to think clearly about what questions we have, what data we need to answer them, and be explicit about how those data points are treated. it is important that we start this process now and change our blunt practices into more strategic data collection and analysis. where 40 years ago we opted to bluntly enforce user privacy by deleting the data, we should now take a more nuanced approach and store and analyze data in the service of improved services and tools for our user communities. we have the opportunity, through technology and a more nuanced understanding of privacy, to conduct a protracted reference interview with our virtual users over multiple interactions… and thereby improve our services. references 1. http://www.niso.org/topics/tl/patron_privacy/ 2. http://www.niso.org/apps/group_public/download.php/15863/niso%20consensus%20principles%20users%20digital%20privacy.pdf a computer output microfilm serials list for patron use william saffady: wayne state university, detroit, michigan. library literature generally assumes that com is better suited to staff rather than patron use applications. this paper describes a com serials holdings list intended for patron use. the application and conversion from paper to com are described. emphasis is placed on the selection of an appropriate microformat and easily operable viewing equipment as conditions of success for patron use. as a marriage of dynamic information-handling technologies, computer output microfilm (com) is a systems tool of potentially great significance to librarians. several libraries have reported successful com applications initiated within the last few years.
the two most recent-fischer's description of four com-generated reports used by the los angeles public libraries and bolef's account of a com book catalog at the washington university school of medicine library-stress the time, space, and cost savings so frequently reported in analyses of the advantages of com.1, 2 this article describes the substitution of microfilm for paper as the computer output medium in one of the most common library automation applications, a serials holdings list intended for use by library patrons. it is interesting that, at a time when librarians are insisting on the importance of patron acceptance of technological innovation, the recent literature reports com applications intended solely for staff use. bolef, in fact, lists staff rather than patron use among the characteristics of potentially successful library com applications. the report that follows suggests, however, that careful attention to the selection of an appropriate microformat and viewing equipment can successfully extend the effectiveness of com to include patron-use library automation applications. the application the union list of serials in the wayne state university libraries is a computer-generated alphabetical listing, by title, of serials held by the wayne state university library system and some biomedical libraries in the detroit metropolitan area. sullivan describes it as "informative in purpose and conventional in method."3 as with many similar applications, serials holdings were automated in order to unify and disseminate hitherto separate, local records. the list is primarily a location device, giving for each title the location within the library system and information on the holdings at each location. it is updated monthly, the july 1974 issue totalling 1,431 pages. in paper form, twenty copies produced on an ibm 1403 line printer using four-ply carbon-interleaved forms were distributed for use throughout the library system. the list shares some of the characteristics that have marked other successful com applications.4 it consists of many pages and has a sizeable distribution. quick retrieval of information is essential. use is for reference rather than reading. there is no need to annotate the list and no need for paper copies, although the latter requirement would not rule out the use of com for this particular application. patrons simply consult the list to determine whether the library's holdings include a particular serial and then proceed to the indicated location. it is interesting that serials holdings lists, long recognized as an excellent introductory library automation application, should also prove an excellent first application for com. complexities of format and viewing equipment selection aside, the conversion of output from paper to microfilm presented no problems. since the wayne state university computing and data processing center does not have com capability, the university libraries, after careful consideration of several vendors, contracted with the mark larwood company, a microfilm service bureau equipped with a gould beta com 700l recorder. the beta com is a crt-type com recorder with an uppercase and lowercase character set, forms-overlay capability, proportional spacing, underlining, superscripts, subscripts, italics, and a universal camera capable of producing 16, 35, 70, and 105mm microformats at several reduction ratios.
a decisive factor in the selection of this particular vendor was the beta com's dedicated pdp-8/l minicomputer that enables the com recorder to accept an ibm 1403 print tape, thereby greatly simplifying conversion and eliminating the expense of reprogramming. microformat selection as ballou notes, discussions of com have tended to concentrate more on the computer than on micrographics, but for a patron-use com application the selection of an appropriate microformat is of the greatest importance.5 however, there has been an unfortunate emphasis placed, both in the literature of micrographics and by vendors, on microfiche, the format now dominating the industry, especially in com applications. such emphasis ignores the fundamental rule of systems design, that form follows function. each of the microformats has strengths and weaknesses that must be analyzed with reference to the application at hand. for a patron-use, com-generated serials holdings list, ease of use with a minimum of patron film handling is a paramount consideration. microfiche is clearly unsuitable for a list of over 1,400 pages. even at 42x reduction, the patron would be forced to choose from among seven fiches, each containing 208 pages. the difficulties of handling and loading, combined with library staff involvement in a program of user instruction, make fiche an unattractive choice. instead, the relatively large size of the holdings list suggests that one of the 16mm roll formats offers the best prospects of containing present size and future growth within a single microform. the disadvantages of the conventional 16mm open spool-the necessity of threading film onto a take-up reel before viewing-can be minimized by using a magazine-type film housing. the popular cartridge format eliminates much film handling, but cartridge readers are very expensive, necessitating a considerable investment where many readers are required. even with the cartridge, it is still possible for a patron to unwind the film from the take-up reel, necessitating rethreading before viewing. fortunately, microfilm cassettes overcome this difficulty. unlike the cartridge format, 16mm cassettes feature self-contained supply and take-up reels. the film cannot be completely unwound from the take-up reel and the cassette can be removed from the viewer at any time without rewinding. patron film handling is virtually eliminated. the cassette format has proven very popular with british libraries, where it has been used with satisfactory results in com applications.6 viewing equipment success in format choice is contingent on the selection of appropriate viewing equipment. as larkworthy and brown point out, the best viewer for patron-use com applications is one that can easily be operated by the least mechanically inclined person.7 fortunately, cassette viewers, while limited in number, tend to be very easy to operate. the viewer chosen for use with the union list of serials, the memorex 1644 autoviewer, features a simple control panel, fixed 24x reduction, easily operated focus and scan knobs, motorized film drive for high-speed searching, and a manual hand control for more precise image positioning. the screen measures eleven by fourteen inches in size, with sufficient brightness for comfortable ambient light viewing. other cassette viewers examined, however satisfactory they might be in other respects, failed to meet the peculiar requirements of this particular application.
discussion since its introduction in april 1974, the com-generated union list of serials in the wayne state university libraries has enjoyed a satisfactory reception. patrons have learned to consult the com list with little difficulty. the selection of an appropriate microformat and easily operated viewing equipment have kept staff involvement in patron instruction to a minimum. there appears to be no reason for limiting potential library com applications to those used primarily or solely by staff members. given the severity of the current paper shortage, the consequent rise in paper prices, and serious questions about the availability of paper at any price, com merits serious consideration as an alternative output medium for the widest range of library automation applications. references 1. mary l. fischer, "the use of com at the los angeles public library," the journal of micrographics 6:205-10 (may 1973). 2. doris bolef, "computer-output microfilm," special libraries 65:169-75 (april 1974). 3. howard a. sullivan, "metropolitan detroit's network: wayne state university library's serials automation project," medical library association bulletin 56:269-71 (july 1968). 4. see, for example, auerbach on computer output microfilm (princeton: auerbach publishers, 1972), p.1-10. 5. hubbard w. ballou, "microform technology," in carlos cuadra, ed., annual review of information science and technology, v.8 (washington, d.c.: american society for information science, 1973), p.139. 6. d. r. g. buckle and thomas french, "the application of microform to manual and machine readable catalogues," program 6:187-203 (july 1972). 7. graham larkworthy and cyril brown, "library catalogs on microfilm," library association record 73:231-32 (dec. 1971). article a 21st century technical infrastructure for digital preservation nathan tallman information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.13355 nathan tallman (ntt7@psu.edu) is digital preservation librarian, pennsylvania state university. © 2021. abstract digital preservation systems and practices are rooted in research and development efforts from the late 1990s and early 2000s when the cultural heritage sector started to tackle these challenges in isolation. since then, the commercial sector has sought to solve similar challenges, using different technical strategies such as software defined storage and function-as-a-service. while commercial sector solutions are not necessarily created with long-term preservation in mind, they are well aligned with the digital preservation use case. the cultural heritage sector can benefit from adapting these modern approaches to increase sustainability and leverage technological advancements widely in use across fortune 500 companies. introduction most digital preservation systems and practices are rooted in research and development efforts from the late 1990s and early 2000s when the cultural heritage sector started to tackle these challenges in isolation. since then, the commercial sector has sought to solve similar challenges, using different technical strategies. while commercial sector solutions are not necessarily created with long-term preservation in mind, they are well aligned with the digital preservation use case because of similar features.
the cultural heritage sector can benefit from adapting these modern approaches to increase sustainability and leverage technological advancements widely in use across fortune 500 companies. in order to understand the benefits, this article will examine the principles of sustainability and how they apply to digital preservation. typical preservation activities that use technology will be described, followed by how these activities occur in a 20th-century technical infrastructure model. after a discussion on advancements in the it industry since the conceptualization of the 20thcentury model, a theoretical 21st-century model is presented that attempts to show how the cultural heritage sector can employ industry advancements and the beneficial impact on sustainability. galleries, libraries, archives, and museums cannot afford to ignore the sustainability of managing and preserving digital content and neither can distributed digital preservation or commercial service providers.1 budgets lag behind economic inflation while the cost of and amount of materials to purchase rises, coupled with the need to hire more employees to do this work. if digital preservation programs are going to scale up to enterprise levels and operate in perpetuity, it is imperative to update technical approaches, adopt industry advancements, and embrace cloud technology. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 2 sustainability for digital preservation programs to succeed, they must be sustainable per the triple bottom line or they risk subverting their mission. the triple bottom line definition of sustainability identifies three pillars: people (labor), planet (environmental), and profit (economic).2 while there are typically few people with digital preservation in their job title within an organization, it’s a collaborative domain with roles and responsibilities distributed throughout organizations, reflecting the digital object lifecycle. it’s important that the underlying technical infrastructure can easily be supported and is not so complicated that it is hard to recruit systems administration staff. digital preservation consumes many technical resources and data centers have a substantial environmental impact. as ben goldman points out in “it’s not easy being green(e),” data centers consume an immense amount of power and require extravagant cooling systems that use precious fresh water resources.3 because there is no point in preserving digital content if there will be no future generation of users, responsible digital preservation programs will seek to reduce carbon outputs and the number of rare-earth elements in our technical infrastructure.4 while cultural heritage organizations rarely seek to make a profit, economic sustainability is vital to organizational health and costs for digital preservation must be controlled. modern technological infrastructures discussed here will help to increase sustainability by using widespread technologies and strategies for which support can be easily obtained, by reducing energy consumption, by minimizing reliance on hardware using rare-earth elements, and by leveraging advances in infrastructure components such as storage to perform digital preservation activities. basic digital preservation activities this paper will examine technical preservation activities and the author acknowledges that basic digital preservation activities are likely to include risk management and other non-technical concepts. 
while there is no formal, agreed-upon definition of what constitutes a set of basic digital preservation activities, bit-level digital preservation is a common baseline. bit-level digital preservation seeks to preserve the digital object as it was received, ensuring that you can get out an exact copy of what you put in, no matter how long ago the ingest occurred; however, with no guarantees as to the renderability of said digital object. two basic digital preservation activities are key to this strategy: fixity and replication. fixity fixity checking, or the “practice of algorithmically reviewing digital content to ensure that it has not changed over time,” is a foundational digital preservation strategy for verifying integrity that aligns with rosenthal et al.’s “audit” strategy.5 fixity is how preservationists demonstrate mathematically that the content has not changed since it was received. not all fixity is the same, however; fixity can be broken up into three types: transactional fixity, authentication fixity, and fixity-at-rest.6 transactional fixity transactional fixity is checked after some sort of digital preservation event7, such as ingest or replication. depending on the event, it’s desirable to use a non-cryptographic algorithm, such as crc32 or md5, when files move within a trusted system. when it’s only necessary to prove that a file hasn’t immediately changed, such as copying between filesystems, cryptographic algorithms are unnecessarily complex and are too expensive, in terms of compute consumption. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 3 authentication fixity authentication fixity proves that a file hasn’t changed over a long period of time, particularly since ingest. although one could use a chain of transactional fixity checks to cumulatively prove there has been no change, it’s often desirable to conduct one fixity check that can be independently verified. unbroken cryptographic algorithms, such as one from the sha-2 and sha-3 families, are well suited to this use case and worth the complexity and compute expense, particularly since this type of fixity check doesn’t have to be run as often. fixity-at-rest fixity-at-rest is when fixity is monitored while content is stored on disk. while some organizations may choose to only conduct fixity checks when files move or migrate, this strategy can miss bit loss due to media degradation, software or human error, or malfeasance that is only discovered when the file is retrieved.8 a common approach for monitoring fixity-at-rest is to systematically conduct fixity checks on all or a sample of files at regular intervals. these types of fixity checks may or may not use cryptographic algorithms, depending on their availability.9 replication replication is another cornerstone of achieving bit-level digital preservation. the national digital stewardship alliance’s 2019 levels of digital preservation, a popular community standard, recommends maintaining at least two copies in separate locations, while noting three copies in geographic locations with different disaster threats is stronger.10 all of these copies must be compared to ensure fixity is maintained. an important concept to consider when thinking about replication is the independence of each copy. 
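to make these two basic activities concrete, the sketch below (python; the file paths are hypothetical and local directories stand in for real storage locations) records a sha-256 authentication checksum at ingest, copies the original to several replica locations, and confirms each copy with a cheaper md5 transactional check. it deliberately copies from the original rather than from another replica, anticipating the independence principle discussed next; it is an illustration of the concepts above, not a description of any particular system.

```python
import hashlib
import shutil
from pathlib import Path

def checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """stream a file through the named hashlib algorithm and return the hex digest."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def ingest_and_replicate(original: Path, replica_dirs: list[Path]) -> dict:
    """record an authentication checksum, then copy the original to each replica
    location independently and confirm each copy with a transactional check."""
    record = {"authentication_sha256": checksum(original, "sha256"), "replicas": []}
    source_md5 = checksum(original, "md5")
    for target_dir in replica_dirs:
        target_dir.mkdir(parents=True, exist_ok=True)
        copy = target_dir / original.name
        shutil.copy2(original, copy)  # always copy from the original, never from another replica
        if checksum(copy, "md5") != source_md5:
            raise RuntimeError(f"transactional fixity failed for {copy}")
        record["replicas"].append(str(copy))
    return record

if __name__ == "__main__":
    # hypothetical paths for illustration only
    print(ingest_and_replicate(Path("masters/report.pdf"),
                               [Path("/mnt/site-a"), Path("/mnt/site-b"), Path("/mnt/site-c")]))
```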
according to schaefer et al.’s user guide for the preservation storage criteria, “the copies should exist independently of one another in order to mitigate the risk of having one event or incident which can destroy enough copies to cause loss of data.”11 in other words, a replica should not depend on another replica, but instead depend on the original file. advanced digital preservation activities when considering more robust digital preservation strategies beyond bit-level preservation, additional activities must be considered to ensure that the information contained within digital files can be understood. implementation of these activities may vary by digital object, depending on the digital preservation goal and appraised value of the content. this paper only describes a handful of the many advanced digital preservation activities as illustrative examples; the ideas in this paper could be applied to most advanced activities. metadata extraction digital files often contain various types of embedded metadata that can be used to help describe both its intellectual content and technical characteristics. this metadata can be extracted and used to populate basic descriptive metadata fields, such as title or author. extracted technical metadata is useful for broader preservation planning, but also for validating technical characteristics in derivative files. for example, if generating an access file for digitized motion picture film, it’s necessary to know the color encoding, aspect ratio, and frame rate. if these details are ignored, the access derivative may appear significantly different than the original file and give a false impression to users. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 4 file format conversions file format conversions help to ensure the renderability of digital content. there are two types of file format conversions to consider: normalization and migration. normalization generally refers to proactively converting file formats upon ingest to retain informational content, e.g., converting a wordperfect document to plain text or pdf when only the informational content is desired. migration may occur at any time: upon ingest, upon access, or any time while an object is in storage. migration occurs when file formats are converted to a newer version of the same format, e.g., microsoft access 2003 (mdb) to microsoft access 2016 (accdb) or to a more stable and open format that retains features, e.g., microsoft access 2016 (accdb) to sqlite. versioning versioning, or the retention of past states of a digital object with the ability to restore previous states, is complex to implement and not always necessary. an organization might choose to apply versioning to subsets of digital content, such as within an institutional repository, but not for born-analog, or digitized material. additionally, an organization may choose to version metadata only, ignoring changes to the bitstream, such as for born-analog digital objects. figure 1. the infrastructure architecture for a typical 20th-century stack. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 5 the 20th century technical infrastructure the technical infrastructure that enables digital preservation can come in many forms. while technology has advanced over the past thirty years, the cultural heritage sector, particularly where digital preservation is concerned, has been slow to adapt. 
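the technical characteristics mentioned above for digitized motion picture film (color encoding, aspect ratio, frame rate) can usually be read straight out of the file. a minimal sketch, assuming the ffprobe utility from the ffmpeg project is installed and using a hypothetical file name:

```python
import json
import subprocess

def video_characteristics(path: str) -> dict:
    """ask ffprobe for stream-level technical metadata and keep the fields
    needed to generate a faithful access derivative."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json", "-show_streams", path],
        capture_output=True, check=True, text=True,
    )
    streams = json.loads(result.stdout)["streams"]
    video = next(s for s in streams if s.get("codec_type") == "video")
    return {
        "pixel_format": video.get("pix_fmt"),               # color encoding
        "aspect_ratio": video.get("display_aspect_ratio"),  # e.g. "4:3"
        "frame_rate": video.get("r_frame_rate"),            # e.g. "24000/1001"
    }

if __name__ == "__main__":
    print(video_characteristics("scans/film_reel_0001.mov"))  # hypothetical digitized film file
```

a derivative-generation service in any of the stacks described below could consume a record like this to avoid giving users a false impression of the original.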
below are descriptions of three common components of a typical server stack (technical infrastructure), though the author acknowledges that some organizations have already moved past this model. figure 1 is a diagram of the typical 20th-century stack. storage storage, at the core of digital preservation, has benefitted from rapid technological advancement since computers first started storing information on punch cards and magnetic media. twentiethcentury servers often use three main types of storage: file, block, and object. file storage file storage is what most people are familiar with. a filesystem interfaces with the underlying storage technology (block or object) and physical media (hard disk drives, solid state drives, tapebased media, or optical media) to present users with a hierarchy of directories and subdirectories to store data. this data can easily be accessed by users or applications using file paths, while the filesystem negotiates the actual bit-locations on the physical media. the choice of filesystem can impact data integrity (fixity), although choice may be limited by operating system. in the 20th century, journaling filesystems offered the most data protection as the filesystem keeps track of all changes; in the event of a disk failure, it’s possible to recover more data if a journaling filesystem is used. block storage block storage uses blocks of memory on physical media (disk, tape, etc.) that are managed through a filesystem to present volumes of storage to the server. all interactions between server and storage are handled by the filesystem via file paths, though the data is stored on scattered blocks on the media. block storage directly attached to a server is often the most performant option, the data does not travel outside the server. network attached storage, in which an external file system is mounted to the server as if it were locally attached block storage, requires data to travel through cables and networks before it gets to the server, which decreases performance. object storage object storage, which still uses tape and disk media, is an abstraction on top of a filesystem. instead of using a filesystem to interact directly with storage media, the storage media is managed by software. the software pools storage media and interactions happen through an api, with files being organized into “buckets” instead of using a filesystem with paths. object storage is webnative and the basis for commercial cloud storage. software-defined storage, which is discussed in more detail later in this article, also allows users to create block storage volumes that can be directly mounted to virtual servers as part of a filesystem or to create network shares that present the underlying storage to users via a filesystem.12 both block and object storage can be used for high-performance storage, hot storage (online), cold storage (nearline), and offline storage. generally, tape and slower performing hard disks are used for offline and nearline storage; faster performing hard disks are used for online storage. 
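stepping back to the object storage model described above, the sketch below shows what interacting with an object store typically looks like through an s3-compatible api, here using the boto3 library; the endpoint, bucket, key, and metadata are hypothetical, and credentials are assumed to come from the environment.

```python
import boto3

# hypothetical s3-compatible endpoint (a public cloud region or an on-premises object store)
s3 = boto3.client("s3", endpoint_url="https://objects.example.edu")

bucket, key = "preservation-masters", "collection-42/report.pdf"

# store the file as an object in a bucket: no filesystem path, just bucket + key
with open("masters/report.pdf", "rb") as body:
    s3.put_object(Bucket=bucket, Key=key, Body=body,
                  Metadata={"ingest-sha256": "sha256-digest-recorded-at-ingest"})

# retrieve it later through the same web-native api
obj = s3.get_object(Bucket=bucket, Key=key)
print(obj["ContentLength"], obj["Metadata"])
```

the physical media underneath (hard disk, solid state, or tape) is invisible at this interface, which is part of what makes object storage attractive for replication and automation.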
solid-state drives (ssds) using non-volatile memory express (nvme) protocols are best suited for high-performance storage.13 in the 2019 storage infrastructure survey by the national digital stewardship alliance, 60% of those aware of their organizational infrastructure reported a reliance on hardware-based filesystems (file and block storage) while about 15% used software-based filesystems (object storage), with 14% reporting a hybrid approach.14 this indicates that the cultural heritage sector continues to rely more on file and block storage and is not yet fully embracing object storage. the survey did not probe into why this might be. servers: physical and virtual twentieth-century technical infrastructures relied primarily upon physical servers. physical servers, also called bare metal, dominated the server landscape up through roughly 2005. virtual servers arrived on the scene after "vmware introduced a new kind of virtualization technology which … [ran] on the x86 system" in 1999.15 server virtualization facilitated a fresh wave of innovation by making it easier and more inexpensive to create, manage, and destroy servers as necessary. dedicating physical servers to one or a limited number of applications requires more resources and expends a higher carbon cost; virtual servers can be highly configured for their precise needs and this configuration can be changed using software, rather than changing parts on a physical server, resulting in less waste. cultural heritage organizations have been slow to fully adopt virtual servers. the 2019 ndsa storage infrastructure survey reports that 81% of respondents continue to rely on physical servers with 63% of respondents using virtual servers. fewer than 10% reported using containers, an even more efficient virtualization technology.16 containers are an evolution of virtual servers that act like highly optimized, self-contained servers doing a specific activity.17 applications and microservices in the 20th century, applications often required dedicated servers. business logic was handled by applications or microservices that ran on top of the server and storage, the highest level in the stack. there are advantages to handling the business logic at this high level: it's completely in the control of the developer and can be finely tuned to the application's needs. unfortunately, this is also an expensive place to handle all business logic, as the application needs to be maintained over time and there's overhead involved in working at this level of the stack. microservices, in this server model, are generally specific commands that can be invoked as needed. while called microservices because they can be applied individually, they still run in this expensive part of the stack and have the same downsides as applications. in digital preservation systems using this type of architecture, basic and advanced digital preservation activities occur within this application layer. fixity can be a costly activity.
garnett, winter, and simpson, in their paper "checksums on modern filesystems, or: on the virtuous consumption of cpu cycles," point out that "calculating full checksums in software is not efficient" and "increases wear and tear on the disks themselves, actually accelerating degradation."18 fixity, when done this way, is a linear process that requires every file to be read from disk so a checksum can be calculated; when performing fixity over large amounts of content, this is very inefficient and time consuming. preservation activities in the 20th-century stack in this model of infrastructure, many cultural heritage institutions are relying on practices created when the field of digital preservation was emerging. basic activities basic preservation activities take a generalized approach and mostly occur in the costly application and microservices layer. this follows the general approach of application development from the commercial sector in the 20th century. fixity although there are differences in frequency, most organizations do not currently make distinctions between transactional fixity, authentication fixity, or fixity-at-rest. common current practices use the same method (md5, sha-256, sha-512) for all fixity checks.19 this inefficient approach takes place in the application and microservices layer and uses more compute power than necessary, increasing the environmental impact. replication in most 20th-century stacks, replications are handled in the application layer, where it is most costly in terms of computational power and labor to maintain, having a negative impact on sustainability. some are using 20th-century microservices as well. advanced digital preservation activities like basic preservation activities, advanced ones chiefly take place in the application and microservices layer if they occur at all. metadata extraction and file format conversion metadata extraction and file format conversion tends to occur only upon ingest as a one-time event. archivematica, the popular open-source digital preservation system, uses 20th-century microservices for each and they only occur during the transfer (ingest) process.20 other systems often include this in the business logic of the application layer. versioning version control is a feature that many organizations choose not to implement. the 2019 ndsa storage infrastructure survey shows that fewer than half (40 of 83) of respondents used any type of version control.21 version control is hard to implement in a custom system, and alternative approaches vary. fedora, a digital preservation backend repository, introduced support for versioning in the application layer around 2004.22 advances in the commercial sector since the conceptualization of the 20th-century stack, there have been significant advancements made in the general it industry. virtualization technology developed in the 1990s led to the proliferation of cloud computing and infrastructure that transformed the it industry in the early 2000s, leading to the "long-held dream of computing as a utility" or commodity.23 clouds can be public, where anyone is able to provision and use services, or private, where services are only available to a group of authorized users. public clouds are run in commercial data centers while private clouds are typically built in privately owned data centers, though it's possible to use commercial data centers to build private clouds.
hybrid clouds are also possible, typically combing private and public clouds, or combining on-premises infrastructure with a private or public cloud. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 8 in 2009, researchers at uc berkley identified three strong reasons why cloud computing has been so widely adopted: the illusion of vertical scaling on demand, elimination of upfront cost, and the ability to pay for short-term resources.24 surveys from the ndsa and the beyond the repository grant project show a steady, but slow adoption of cloud infrastructure by the cultural heritage community.25 it is unclear whether early adopters have chosen independently or simply followed it changes in their parent organizations. any organization can build a private cloud and take advantage of the benefits described in this article. using the cloud does not mean that you must contract with commercial cloud providers. some organizations may choose to build a private cloud if there are concerns over data sovereignty, mistrust in public clouds, or for other reasons. the ontario council of university libraries in canada has built a private cloud for its members called the ontario library research cloud using openstack, a suite of open-source software for building clouds.26 software-defined storage while virtualization enables cloud computing, software-defined storage is the foundation for cloud storage. software-defined storage combines inexpensive hardware with software abstractions to create a flexible, scalable, storage solution that provides data integrity.27 software-defined storage can use the same pool of disks to present all three of the common types of storage: file, object, and block. file storage is what most users are familiar with. software defined file storage creates a network file share from which files can be accessed on local devices via a filesystem.28 object storage in this environment is like a web-native file share; files are stored in buckets, which can be further organized by folders. files are not accessed through a filesystem, but are instead accessed through uris, which makes object storage very amenable to web applications and avoids some of the pitfalls of relying on filesystems. block storage is mostly used to mount storage to virtual servers, storage that is directly attached to the server as if it was a physical disk or volume mounted to the server. block storage is more performant than either file or object storage; as such it’s typically used for things like the operating system and application code, but not for storing content. all storage can be managed through apis, adding to its suitability for automation, software development, and it operations.29 hardware diversity software defined storage also has features that make a compelling use case for digital preservation. first, software defined storage accommodates hardware diversity. because software defined storage is an abstraction, it’s possible to combine different types of storage media, from different manufacturers and production batches to ensure some technical diversity and avoid risk from catastrophic failure from a hardware monoculture. fixity and integrity second, like the use of raid in traditional filesystems, file integrity can be strengthened through the use of erasure coding.30 erasure coding splits files into chunks and spreads them across multiple disks or potentially nodes such that the file can be reconstructed if some of the disks or nodes fail. 
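the idea behind erasure coding can be shown with the simplest possible scheme: two data chunks plus one xor parity chunk, so that any single lost chunk can be rebuilt from the other two. this is only a toy sketch of the principle; production software-defined storage systems use reed–solomon style codes with configurable numbers of data and parity chunks, as noted next.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes) -> list[bytes]:
    """split data into two equal chunks and add one xor parity chunk (2 data + 1 parity)."""
    if len(data) % 2:
        data += b"\x00"                    # pad so the two data chunks are equal length
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    return [d0, d1, xor_bytes(d0, d1)]     # any single lost chunk can be rebuilt

def recover_chunk(lost_index: int, chunks: list[bytes]) -> bytes:
    """rebuild one missing chunk by xor-ing the two survivors."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return xor_bytes(*survivors)

if __name__ == "__main__":
    chunks = encode(b"a small preservation master, pretend it is large")
    assert recover_chunk(1, chunks) == chunks[1]   # "disk 1" failed; its chunk is reconstructed
```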
erasure coding can be configured in different ways, depending on the amount of parity desired.31 replication third is replication of content. for cloud administrators, replication might be an alternative to erasure coding for ensuring data integrity; for digital preservationists, it's a distinct strategy and basic preservation activity. operating nodes in a software defined storage network can be in different availability zones; through object storage policies, content can be replicated as many times as necessary to provide mitigation of geographically based threats. it's even possible to replicate to object storage in a different software defined storage network, helping to achieve organizational diversity as well. figure 2. the infrastructure architecture for a theoretical 21st-century stack. an updated technical infrastructure for the 21st century a theoretical 21st-century stack for digital preservation has many of the same components as its 20th-century antecedent. however, these components are used in different ways, largely due to technological advancements. leveraging these advancements to handle digital preservation activities at lower levels of the stack reduces the complexity of the business logic in the application layer. figure 2 shows an updated architecture diagram for this 21st-century stack, which may be used by an individual organization, consortium, or service provider planning to build a digital preservation system. the storage layer is built on software-defined storage with digital content primarily being stored as objects; these objects are stored using the oxford common file layout (discussed further later). physical bare metal servers are used to power virtual machines that host applications such as a digital repository. physical servers also host container and function-as-a-service platforms to provide a suite of microservices for processing digital content. storage in this new stack, storage is primarily managed through software defined storage with data flowing over networks. there are currently two primary open-source options for running a software-defined storage service: gluster and ceph. both can be installed and run on-premises, in a private or public data center, or even contracted through infrastructure as a service (iaas). in his presentation at the 2018 designing storage architectures for digital collections meeting, hosted by the library of congress, glenn heinle recommended ceph over gluster where data integrity is the highest priority; however, others argue that gluster is better for long-term storage.32 this is likely because ceph is better able to recover from hardware failures.33 file storage reliance on file storage has become minimal in this theoretical stack, with data primarily stored as objects. however, file storage may still be used; when it is, it benefits from using a modern filesystem. several modern filesystems have emerged since 2000, most notably zfs and openzfs,34 with their innovative copy-on-write transactional model and methods for managing free space.35 both zfs and openzfs can also be configured to use raid-z, which maintains block-level fixity by calculating checksums for each block of data and verifying the checksum when accessed. this can be combined with simple software to touch every block on a regular basis to ensure fixity-at-rest.
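the "simple software" in question can be as small as a scheduled job that asks the pool to scrub itself and records the outcome as a preservation event. a minimal sketch, assuming an openzfs pool and an external scheduler such as cron; the pool name and log location are hypothetical.

```python
import datetime
import json
import subprocess

POOL = "tank"   # hypothetical openzfs pool holding preservation storage

def scrub_and_log(logfile: str = "scrub-events.jsonl") -> None:
    """start a pool scrub (which re-verifies every block checksum) and log the status.
    note: the scrub runs in the background; in practice the status would be polled
    again after it completes and the final result recorded."""
    subprocess.run(["zpool", "scrub", POOL], check=True)
    status = subprocess.run(["zpool", "status", POOL],
                            capture_output=True, text=True, check=True).stdout
    event = {
        "event_type": "fixity check (block level)",
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "detail": status,
    }
    with open(logfile, "a") as log:
        log.write(json.dumps(event) + "\n")

if __name__ == "__main__":
    scrub_and_log()   # schedule via cron or systemd, e.g. once a month
```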
although this is a different approach from file-level fixity checks, it accomplishes the same thing in a much more efficient way: preservation metadata could be recorded for each block that contains part of the file.36 zfs has also inspired similar modern filesystems such as btrfs, apple file system (apfs), refs, and reiser.37 however, even if this theoretical stack isn’t relying on file storage to persist data, software-defined storage is an abstraction that sits atop servers and disks (or tape) that do use filesystems.38 ironically, zfs is not the best option for the underlying disks, as its data integrity features come with more overhead, and data integrity can be achieved through different means with software-defined storage.39

block storage
block storage comes in two forms in this future stack. many virtual servers will leverage the block storage offerings of the software-defined storage service, attaching virtual disk blocks to virtual servers. however, the physical servers that support virtualization will still have some physically attached storage using ssds (through nvme) to support high-performance storage needs. this physically attached block storage is more performant than virtually attached block storage since the system has direct access to the disks and does not have to work through a virtually abstracted filesystem.

object storage
object storage has become the primary method of storing data in this theoretical stack. the flexibility of object storage, with its web-native apis and authentication, gives it an advantage as systems become less centralized and more integrations are needed. the natural scalability of object storage and the variety of private, public, and commercial offerings greatly simplify geographic and organizational redundancy when replicating data. with software-defined storage, it’s also possible to offer hot (live) and cold (nearline, offline) options, giving flexibility for how data is stored to better optimize the storage for various needs. hot storage may use either hard disk or solid-state drives, while cold storage would rely on tape or optical media. presently, options for running software-defined storage on tape and optical media are mostly proprietary.40 while this would be a concern if these systems held the only copy, the risk can be managed if the data is replicated to systems using other technology and media. while optical media has long been criticized for use as a preservation medium, when well managed, the risk may be overstated.41

oxford common file layout
the oxford common file layout (ocfl) is a “shared approach to filesystem layouts for institutional and preservation repositories.”42 ocfl is a specification for organizing digital objects in a way that supports preservation while being computationally efficient.
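as a rough illustration of what such a layout looks like in practice, the sketch below writes an ocfl-flavoured object: a namaste marker, per-version content directories, and an inventory that keys stored files by digest so that a later version only adds content whose digest is new (the forward delta idea discussed later in the article). this is a simplified illustration, not a conforming implementation; required details such as the inventory sidecar digest, version metadata, and validation are omitted, so consult the specification for the authoritative field names.

```python
# a deliberately simplified, ocfl-flavoured object writer. field names
# loosely follow the published inventory format, but this omits required
# details (sidecar digests, created/user/message metadata, validation).
import hashlib
import json
from pathlib import Path

def sha512(data: bytes) -> str:
    return hashlib.sha512(data).hexdigest()

def write_version(root: Path, files: dict, object_id: str) -> None:
    """add a new version of `files` ({logical path: bytes}) to the object at root."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "0=ocfl_object_1.0").write_text("ocfl_object_1.0\n")   # namaste marker
    inv_path = root / "inventory.json"
    inventory = (json.loads(inv_path.read_text()) if inv_path.exists()
                 else {"id": object_id, "digestAlgorithm": "sha512",
                       "head": "v0", "manifest": {}, "versions": {}})
    version = f"v{int(inventory['head'][1:]) + 1}"
    state = {}
    for logical, data in files.items():
        digest = sha512(data)
        if digest not in inventory["manifest"]:          # store new content only
            content_path = f"{version}/content/{logical}"
            target = root / content_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            inventory["manifest"][digest] = [content_path]
        state.setdefault(digest, []).append(logical)     # logical view of this version
    inventory["versions"][version] = {"state": state}
    inventory["head"] = version
    inv_path.write_text(json.dumps(inventory, indent=2))

if __name__ == "__main__":
    obj = Path("objects/demo-object")
    write_version(obj, {"file-1.txt": b"first file", "file-2.txt": b"second file"},
                  "urn:example:demo-object")
    # only file-2.txt changed, so v2/content holds just that one file
    write_version(obj, {"file-1.txt": b"first file", "file-2.txt": b"second file, revised"},
                  "urn:example:demo-object")
```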
it has several advantages for use in digital preservation: a repository can be rebuilt from the files alone, it is both human and machine readable, it supports native error detection, it allows objects to be efficiently versioned, and it is designed to work with a variety of storage infrastructures.43 although some implementation details are still being worked out, ocfl can be used with object storage.44 ocfl is in production use, and client libraries are available for javascript, java, ruby, and rust.45 in this future stack, all storage operations are handled by an ocfl client, which then interacts with the underlying software-defined storage network as shown in figure 2.

servers
physical servers are used chiefly to support virtualization in this future stack. however, this stack moves beyond virtual servers and supports containers and serverless computing. virtual servers are chiefly used to support applications and databases, while containers are perfectly suited for microservices running preservation activities. serverless, or function-as-a-service, is the next evolution in virtualization. while a container may be idling all the time, spinning into action when a microservice is called, serverless functions are run on demand only. they can cost less when using commercial services such as aws lambda or aws fargate, where the customer is billed for usage only.46 though serverless functions can make use of containers, function-as-a-service platforms have emerged, such as apache openwhisk and openfaas, which don’t always require containers.

preservation activities in the 21st-century stack
this 21st-century stack performs the same preservation activities as its predecessor. however, it generally does this at lower levels of the stack, in the infrastructure layers as opposed to the application and microservice layers. this change reduces the computational load on the stack and simplifies the business logic.

basic activities
fixity and replication are achieved by leveraging a combination of microservices and software-defined storage. by optimizing the approach to fixity for each use case, instead of using the same computationally intensive method for all fixity, organizations can use less compute power. while fixity and replication still involve the microservice layer, it is a more targeted approach.

transactional fixity
transactional fixity is maintained through a function-as-a-service-based microservice. each time a file is moved, the microservice is triggered, which calculates an md5 checksum and compares it to a stored value that was created upon ingest. if there is a mismatch between the md5 values, a second microservice is called that fetches a valid file replica. while crc32 might be preferred (because it’s slightly less cpu-intensive), box has shown that crc32 values can differ depending on how they are calculated.47 a stored crc32 can only be reliably used to confirm fixity if the new calculation uses the exact same method, because crc32 is not a true specification like md5, and implementations may differ. crc32 is recommended only when it’s possible to calculate it in the same manner, such as within the same microservice. as this introduces technical complexity, some organizations may prefer to rely solely on md5 for transactional fixity.
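a minimal sketch of the transactional-fixity function described above, written as a generic python handler rather than for any particular function-as-a-service platform; the event fields, the checksum store, and the repair-service endpoint are illustrative assumptions, not a fixed api.

```python
# a sketch of a transactional-fixity function: on each file move the
# platform invokes handle() with the object id and new location, the
# function recomputes an md5 and compares it with the value stored at
# ingest, and a mismatch triggers a second (hypothetical) repair service
# that fetches a known-good replica.
import hashlib
import json
import urllib.request

CHECKSUM_STORE = "https://example.org/fixity"   # assumption: ingest-time md5 values
REPAIR_SERVICE = "https://example.org/repair"   # assumption: replica-fetching microservice

def md5_of(path: str) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def handle(event: dict) -> dict:
    """event: {"object_id": ..., "path": ...} supplied by the faas platform."""
    object_id, path = event["object_id"], event["path"]
    with urllib.request.urlopen(f"{CHECKSUM_STORE}/{object_id}") as resp:
        stored = json.load(resp)["md5"]
    current = md5_of(path)
    if current != stored:
        # hand off to the repair microservice rather than fixing in place
        req = urllib.request.Request(
            REPAIR_SERVICE,
            data=json.dumps({"object_id": object_id, "path": path}).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
    return {"object_id": object_id, "match": current == stored}
```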
authentication fixity
authentication fixity is maintained in much the same way as in the 20th-century model, except using a cryptographically secure checksum algorithm, such as sha-512 (part of the sha-2 family). however, distinguishing between transactional and authentication fixity allows more precise use of algorithms, only requiring more computationally intensive cryptography when it’s truly needed. authentication fixity may require the use of a container-based microservice, versus a function-as-a-service, due to the increased computational need.

fixity-at-rest
fixity-at-rest, the most common type of fixity, is managed by the software-defined storage service and reported in preservation metadata. how this is achieved might look different depending on which software-defined storage service is used. the ceph community has developed a new technology called bluestore, which serves as a custom storage backend that directly interacts with disks, essentially replacing the need to use an underlying filesystem.48 bluestore calculates checksums for every file and verifies them when read. because this is all internal and managed by the same system, crc32 is the default algorithm, but multiple algorithms are supported. ceph will “scrub” data every week. scrubbing is the process of reading the file simply to verify the checksum, even if no user has accessed the file. because of the way ceph performs erasure coding, if a checksum is invalid, the file can be repaired. what remains to be done is writing a script that will read ceph’s internal metadata and record preservation events within the object’s preservation metadata for the fixity verification and any reparative actions. ryu and park propose a “markov failure and repair model” to optimize the frequency of data scrubbing and the number of replicas such that the least amount of power is consumed and scrubbing occurs at off-peak times.49 it appears that this optimization causes less media degradation from the process of regularly reading the file, though empirical studies are needed to confirm that there is overall less degradation than conducting fixity checks through an application. gluster has a similar scrubbing process for fixity-at-rest in its optional bitrot feature, although it uses sha-256 by default instead of crc32, which requires more computing power.50

replication
replication in this future stack is mostly handled by the software-defined storage service, but microservices may play a role in achieving independence of copies.51 object storage policies allow the automatic copying of data into another region or availability zone within the software-defined storage network. however, these copies are not replicas, or independent instances, because all copies are in a chain derived from the primary instance; if there is a problem anywhere in the chain, bad data will be copied. in addition to using object storage policies, microservices could be used to independently verify the fixity of downstream copies as well as trigger true replications to independent instantiations, such as an alternative storage service or a different storage area within the same software-defined storage network.
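a sketch of the kind of verification microservice suggested above, assuming both storage areas expose s3-compatible apis (as ceph’s object gateway does) and using the boto3 client; the endpoints, credentials, and bucket names are placeholders.

```python
# a sketch of a replica-verification microservice: independently re-hash an
# object in the primary store and in a downstream copy, and push a fresh
# copy to an independent store if the downstream copy is missing or does
# not match the primary.
import hashlib
from typing import Optional

import boto3
from botocore.exceptions import ClientError

primary = boto3.client("s3", endpoint_url="https://primary.example.org")          # placeholder
independent = boto3.client("s3", endpoint_url="https://independent.example.org")  # placeholder

def sha256_of(client, bucket: str, key: str) -> Optional[str]:
    """independently re-hash an object, or return None if it cannot be read."""
    try:
        body = client.get_object(Bucket=bucket, Key=key)["Body"]
    except ClientError:
        return None                       # a missing copy is treated like a mismatch
    digest = hashlib.sha256()
    for chunk in body.iter_chunks(1024 * 1024):
        digest.update(chunk)
    return digest.hexdigest()

def verify_and_replicate(key: str) -> bool:
    """verify a downstream copy against the primary; push a fresh copy if needed."""
    source = sha256_of(primary, "preservation", key)
    copy = sha256_of(independent, "preservation-replica", key)
    if source is None:
        raise RuntimeError(f"primary copy of {key} is unreadable")
    if copy != source:
        data = primary.get_object(Bucket="preservation", Key=key)["Body"].read()
        independent.put_object(Bucket="preservation-replica", Key=key, Body=data)
        return False                      # a repair or first-time replication was triggered
    return True                           # downstream copy independently verified
```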
bill branan suggested a similar approach in his cloud-native preservation presentation at ndsa digital preservation 2019.52

advanced digital preservation activities
advanced digital preservation activities in a 21st-century stack also make use of microservices for metadata extraction and file format conversion. versioning, however, relies upon features of the oxford common file layout, even though object storage often supports versioning natively.

metadata extraction
function-as-a-service microservices are well suited to metadata extraction, actuated upon ingest or as needed. since embedded metadata is machine-readable, this activity will not be resource-intensive or time-consuming. in addition to extracting metadata and storing it as discrete sidecar files, these microservices can be used to populate specific metadata fields used by the repository, including descriptive, administrative, and technical metadata. this approach is more efficient because the functions can be reused in multiple workflows rather than being tied to specific events like ingest.

file format conversion
file format conversions use a combination of function-as-a-service and container-based microservices, depending upon the original format. like metadata extraction, conversion may occur at ingest (normalization) or as needed (migration). function-as-a-service is well suited for small to medium files, such as converting wordperfect to opendocument format. function-as-a-service is also well suited for logical preservation, when only the informational content is necessary to preserve, such as converting a tif to a txt file through ocr. container-based microservices are better suited for converting large media files that may take more memory and time, such as migrating digital video from proprietary encodings to open codecs and container formats, since function-based services often have a time constraint.

versioning
although object storage typically supports versioning, it is inefficient because each version is an entirely new object. this means that unchanged data is duplicated, taking up more disk space. the oxford common file layout, which negotiates storage between the application and microservices layers and a software-defined storage service, supports forward delta versioning in which each new version only contains the changes. using the object inventories, it’s possible to rebuild any object to any version without duplicating bits.53 an additional benefit of using ocfl is that it inherently uses checksums, and any changes or corruption are detected when an interaction occurs with the object, creating a layered approach to maintaining fixity-at-rest.

sustainability in the 21st-century stack
the differences between our 20th- and 21st-century stacks result in a more sustainable approach to digital preservation, per the triple-bottom-line definition.54 by adopting commercial sector approaches, cultural heritage organizations can take advantage of more efficient data centers.

people (labor)
by shifting activities to lower levels in the stack and letting infrastructure components self-manage, fewer people are needed to develop and maintain the business logic that formerly handled the same action. the application and microservice layers use programming languages and libraries that can become outdated quickly, requiring development work to maintain functionality.
while there is still a need for specialized knowledge, fewer people are needed to do the work when these actions take place in more stable parts of the stack.

planet (environmental)
our new stack has a lower environmental impact for a variety of reasons. first, per kryder’s law (the storage parallel to moore’s law for computing), the areal density of storage media predictably increases annually, and our new stack uses the latest hard disk and tape technology.55 this results in needing less media, some of which doesn’t need power to run, decreasing the carbon impact. additionally, our new stack uses a mix of hot and cold storage, making it possible to implement automatic tiering to shift objects to less resource-intensive storage, like tape.56 second, as the stack becomes more serverless, fewer computational resources are needed. even though container and function-based microservices may incur some overhead in terms of cpu cycles, it is more efficient in terms of system idling to run these as microservices on one platform rather than doing the same action in the application or vm layer. this further decreases the carbon impact while also decreasing the dependency on rare-earth elements. relatedly, by leveraging software-defined storage to maintain fixity-at-rest, the compute load is greatly decreased; the cpu cost of calculating checksums in the storage layer is less than when this is done through applications or microservices.

profit (economic)
sustainability improvements for both people and planet may also result in a lower total cost of ownership for a digital preservation system. cost is a prime motivator when administrators and leaders make long-term decisions; decreasing the annual operating cost related to digital preservation is crucial to the viability of a program.

future and related work
the 21st-century stack proposed in this paper is not the only way to increase sustainability or the only way in which digital preservation stacks will change. the planet is running out of bandwidth and will need to expand into using 5g and low-earth orbit satellite communications. new, quantum-resistant algorithms will need to be introduced as quantum computing advances.57 blockchain technology introduces many possibilities. inherently, blockchain maintains fixity. the archangel project is exploring practical methods of recording provenance and proving authenticity by using a permissioned blockchain.58 blockchain is also the technology behind the interplanetary file system (ipfs), a content-addressed distributed storage network, which is in turn used by filecoin, a marketplace for ipfs storage. small data industries is building starling, a filecoin-based application designed for cultural heritage organizations to securely store digital content.59 it’s important to note that these blockchain-based projects use a proof-of-stake model instead of a proof-of-work model, which has a significantly lower environmental impact than other blockchain implementations like the cryptocurrency bitcoin.60 while some organizations, like stanford university, may already leverage software-defined storage, most in the cultural heritage sector do not.61 the metaarchive cooperative, a nonprofit consortium for digital preservation, has begun a noteworthy project to explore using software-defined storage in a distributed digital preservation network.
metaarchive, which currently uses lockss, is one of the few public digital preservation services that mitigates risk through organizational and administrative diversity. because members host and administer the lockss nodes that contain the replications, each copy is managed by a different set of organizational and administrative policies and often use different technology to do so. diversifying in this way protects against a single point of failure if only one organization managed the technical infrastructure. how this same diversity is achieved in a software-defined storage-based distributed digital preservation network will be a great contribution to the community. it would also be useful to study the reasons cultural heritage organizations have been so reluctant to adopt commercial sector technologies. identifying these hesitations would make it possible to create strategies that would encourage adoption of these approaches. it may simply be that when it comes to digital preservation, familiar and proven technologies provide a level of comfort. organizations may also be entrenched in custom developed solutions that are hard to move away from. conclusion digital preservation is a long-term commitment. while re-appraisal may take place, it’s inevitable that the amount of content stored in digital repositories will only increase over time. it is fiduciarily incumbent upon the cultural heritage community to examine our practices and look for better alternatives. exceptionalism ignores technological advancements made by the commercial industry, advancements that are very well suited to digital preservation. by adopting commercial industry data practices, such as software-defined storage, while simultaneously implementing innovations from within the cultural heritage community, like the oxford common file layout, it is possible to reduce the ongoing costs, resource consumption, and environmental impact of digital preservation. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 16 endnotes 1 ben goldman, “it’s not easy being green(e): digital preservation in the age of climate change,” in archival values: essays in honor of mark a. greene, ed. mary a. caldera and christine weidman (chicago: american library association, 2018), 274–95, https://scholarsphere.psu.edu/concern/generic_works/bvq27zn11p. 2 “a simple explanation of the triple bottom line,” university of wisconsin sustainable management, october 2, 2019, https://perma.cc/2hwf-3mmq. 3 goldman, “it’s not easy being green(e).” 4 keith l. pendergrass et al., “toward environmentally sustainable digital preservation,” the american archivist 82, no. 1 (2019): 165–206, https://doi.org/10.17723/0360-9081-82.1.165. 5 sarah barsness et al., 2017 fixity survey report: an ndsa report (osf, april 24, 2018), https://doi.org/10.17605/osf.io/snjbv; david s. h. rosenthal et al., “requirements for digital preservation systems: a bottom-up approach,” d-lib magazine 11, no. 11 (2005), https://perma.cc/x2r7-r5xp. 6 matthew addis, which checksum algorithm should i use? (dpc technology watch guidance note, digital preservation coalition, december 11, 2020), https://doi.org/10.7207/twgn20-12. 7 premis editorial committee, premis data dictionary for preservation metadata, version 3.0 (library of congress, november 2015), https://perma.cc/l79v-gqv7. 8 some organizations may continue to use a strategy where fixity is only checked when a file is accessed, if the potential loss fits within a defined acceptable loss. 
while this strategy may not work for all organizations, recognizing that loss is inevitable and defining a level of acceptable loss is an effective and pragmatic approach to managing risk of bit decay. 9 barsness et al., 2017 fixity survey report. 10 ndsa levels of preservation revisions working group, “levels of digital preservation, 2019 lop matrix, v2.0 (osf, october 14, 2019), https://osf.io/2mkwx/. 11 sibyl schaefer et al., “user guide for the preservation storage criteria,” february 25, 2020, https://doi.org/10.17605/osf.io/sjc6u. 12 mark carlson et al., “software defined storage,” (white paper, storage network industry association, january 2015), https://perma.cc/aq4t-9yxq. 13 abutalib aghayev et al., “file systems unfit as distributed storage backends” (proceedings of the 27th acm symposium on operating systems principles—sosp ’19, huntsville, ontario, canada: association for computing machinery, 2019): 353–69, https://doi.org/10.1145/3341301.3359656. 14 ndsa storage infrastructure survey working group, 2019 storage infrastructure survey: results of the storage infrastructure survey (osf, 2020), https://doi.org/10.17605/osf.io/uwsg7. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 17 15 joseph migga kizza, “virtualization technology and security,” in guide to computer network security, computer communications and networks (springer, cham, 2017), 457–75, https://doi.org/10.1007/978-3-319-55606-2_21. 16 ndsa storage infrastructure survey working group, 2019 storage infrastructure survey. 17 eric jonas et al., “cloud programming simplified: a berkeley view on serverless computing” (university of california, berkeley, february 10, 2019), https://perma.cc/yam2-tz8w. 18 alex garnett, mike winter, and justin simpson, “checksums on modern filesystems, or: on the virtuous consumption of cpu cycles,” in ipres 1028 conference [proceedings] (international conference on digital preservation, boston, mass., 2018), https://doi.org/10.17605/osf.io/y4z3e. 19 barsness et al., 2017 fixity survey report. 20 “import metadata,” documentation for archivematica 1.12.1, artefactual systems, inc., accessed may 21, 2021, https://perma.cc/ue3r-bdgz.; “ingest,” documentation for archivematica 1.12.1, artefactual systems, inc., accessed may 21, 2021, https://perma.cc/5sn5-gfx3. 21 ndsa storage infrastructure survey working group, 2019 storage infrastructure survey. 22 “fedora content versioning,” 2005, https://duraspace.org/archive/fedora/files/download/2.0/userdocs/server/features/version ing.html. 23 michael armbrust et al., above the clouds: a berkeley view of cloud computing, (technical report, eecs department, university of california, berkeley, february 10, 2009), https://perma.cc/qj5w-8s5y. 24 armbrust et al., above the clouds. 25 micah altman et al., “ndsa storage report: reflections on national digital stewardship alliance member approaches to preservation storage technologies,” d-lib magazine 19, no. 5/6 (may 2013), https://doi.org/10.1045/may2013-altman; michelle gallinger et al., “trends in digital preservation capacity and practice: results from the 2nd bi-annual national digital stewardship alliance storage survey,” d-lib magazine 23, no. 
7/8 (2017), https://doi.org/10.1045/july2017-gallinger; ndsa storage infrastructure survey working group, 2019 storage infrastructure survey; evviva weinraub et al., beyond the repository: integrating local preservation systems with national distribution services (northwestern university, 2018), https://doi.org/10.21985/n28m2z. 26 ontario council of university libraries, “ontario library research cloud,” accessed april 14, 2021, https://perma.cc/kmp9-fs8k; “open source cloud computing infrastructure,” openstack, accessed april 14, 2021, https://perma.cc/g9ge-92jd. 27 nathan tallman, “software defined storage,” (presentation for the ndsa infrastructure interest group, march 16, 2020), https://doi.org/10.26207/3nn2-zv13. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 18 28 these network shares typically use the smb (server message block) or cifs (common internet file system) protocols to present file shares through a graphical user interface in operating systems such as windows or macos while the nfs (network file shares) protocol is more often used to mount storage in linux. 29 carlson et al., “software defined storage.” 30 raid, or the redundant array of independent disks, is technology that splits a file into multiple chunks and spreads them across multiple disks in a storage device, adding extra copies of the chunks so that the file can be recovered if an individual drive fails. 31 abhijith shenoy, “the pros and cons of erasure coding & replication vs raid in next-gen storage platforms” (software developer conference, storage networking industry association, 2015), https://perma.cc/yfs5-kxkk. 32 glenn heinle, “unlocking ceph” (presentation, designing storage architectures for digital collections, washington, dc: library of congress, 2019), https://perma.cc/z2r9-79ze; tamara scott, “big data storage wars: ceph vs gluster,” technologyadvice (blog), may 14, 2019, https://perma.cc/2yy2-bbxg. 33 giacinto donvito, giovanni marzulli, and domenico diacono, “testing of several distributed file-systems (hdfs, ceph and glusterfs) for supporting the hep experiments analysis,” journal of physics: conference series 513, no. 4 (june 2014): 042014, https://doi.org/10.1088/1742-6596/513/4/042014. 34 matthew ahrens, “openzfs: a community of open source zfs developers,” in asiabsdcon 2014 (asiabsdcon, tokyo, japan: bsd research, 2014), 27–32, https://perma.cc/xg79-pbu7. 35 brian hickmann and kynan shook, “zfs and raid-z: the über-fs?” (university of wisconsin– madison, december 2007), https://perma.cc/w5pd-enpp. 36 garnett, winter, and simpson, “checksums on modern filesystems.” 37 edward shishkin, “resier5 (format release 5.x.y),” marc mailing list archive, 2019, https://perma.cc/dn8y-v8kq. 38 “fujifilm launches ‘fujifilm software-defined tape,’” fujifilm europe, may 19, 2020, https://perma.cc/b3gn-plr9. 39 aghayev et al., “file systems unfit as distributed storage backends.” 40 ibm systems, “tape goes high speed,” 2016, https://perma.cc/fnv9-rtg9; “fujifilm launches ‘fujifilm software-defined tape’”; desire athow, “here’s what sony’s million gigabyte storage cabinet looks like,” techradar (blog), 2020, https://perma.cc/vhn4-layt. 41 david rosenthal, “optical media durability: update,” dshr’s blog, august 20, 2020, https://perma.cc/vkw9-83j3. 
information technology and libraries december 2021 a 21st century technical infrastructure | tallman 19 42 andrew hankinson et al., “the oxford common file layout: a common approach to digital preservation,” publications 7, no. 2 (june 2019): 39, https://doi.org/10.3390/publications7020039. 43 andrew hankinson et al., “oxford common file layout specification,” july 7, 2020, https://perma.cc/s73z-3n6k. 44 marco la rosa et al., “our thoughts on ocfl over s3 · issue #522 · ocfl/spec,” github, accessed march 12, 2021, https://perma.cc/pa3g-cb78. 45 hannah frost, “version 1.0 of the oxford common file layout (ocfl) released,” stanford libraries (blog), july 23, 2020, 1, https://perma.cc/5j5m-gyqw; andrew woods, “implementations | ocfl/spec,” github, february 10, 2021, https://github.com/ocfl/spec. 46 while serverless might be the ultimate microservice, requiring the least amount of overhead, costs may still be hard to predict. 47 ryan luecke, “crc32 checksums; the good, the bad, and the ugly,” box blog, october 12, 2011, https://perma.cc/mvp7-yvzv. 48 aghayev et al., “file systems unfit as distributed storage backends.” 49 junkil ryu and chanik park, “effects of data scrubbing on reliability in storage systems,” ieice transactions on information and systems e92-d, no. 9 (september 1, 2009): 1639–49, https://doi.org/10.1587/transinf.e92.d.1639. 50 raghavendra talur, “bitrot detection | gluster/glusterfs-specs,” github, august 15, 2015, https://github.com/gluster/glusterfsspecs/blob/fe4c5ecb4688f5fa19351829e5022bcb676cf686/done/glusterfs%203.7/bitrot.m d. 51 schaefer et al., “user guide for the preservation storage criteria.” 52 bill branan, “cloud-native preservation” (osf, october 22, 2019), https://osf.io/kmdyf/. 53 andrew hankinson et al., “implementation notes, oxford common file layout specification,” july 7, 2020, https://perma.cc/pvf8-sqfn. 54 although out of scope in terms of the stack, the policies and practices implemented by organizations can have a direct impact on digital preservation sustainability. for example, appraisal can be the most powerful tool available to an organization to control the amount of content being preserved. despite storage vendors proclamations that storage is cheap, digital preservation is not. it is not wise nor necessary to keep every digital file. organizations will benefit from applying flexible appraisal systems that reduce the amount of content needing preservation, but also establishing different classes of preservation so the most advanced activities are only applied as needed. additionally, organizations should consider allowing lossy compression to decrease disk usage, where appropriate; compression as an appraisal choice is very similar to choosing to sample a grouping of material rather than preserving the whole. for additional information see nathan tallman and lauren work, “approaching information technology and libraries december 2021 a 21st century technical infrastructure | tallman 20 appraisal: framing criteria for selecting digital content for preservation,” in ipres 1028 conference [proceedings] (international conference on digital preservation, boston, mass.: osf, 2018), https://doi.org/10.17605/osf.io/8y6dc. 55 david rosenthal, “cloud for preservation,” dshr’s blog, 2019, https://perma.cc/zls9-r857. 
56 pendergrass et al., “toward environmentally sustainable digital preservation.” 57 henry newman, “industry trends” (presentation, designing storage architectures for digital collections, washington, dc: library of congress, 2019), https://perma.cc/3mgk-n5u3. 58 t. bui et al., “archangel: trusted archives of digital public documents,” in proceedings acm document engineering 2018 (association for computing machinery, arxiv.org, 2018), https://doi.org/10.1145/3209280.3229120. 59 ben fino-radin and michelle lee, “[starling]” (presentation, designing storage architectures for digital collections, washington, dc: library of congress, 2019), https://perma.cc/7lguuew9. 60 for additional information on the differences between proof-of-stake and proof-of-work models, see peter fairley, “ethereum plans to cut its absurd energy consumption by 99 percent,” ieee spectrum (blog), january 2, 2019, https://perma.cc/gch7-t556. 61 julian morley, “storage cost modeling” (presentation, pasig, mexico city, mexico, 2019), https://doi.org/10.6084/m9.figshare.7795829.v1.

communications

dispelling five myths about e-books
james e. gall
james e. gall (james.gall@unco.edu) is assistant professor of educational technology at the university of northern colorado, greeley.

some considered 2000 the year of the e-book, and due to the dot-com bust, that could have been the format’s high-water mark. however, the first quarter of 2004 saw the greatest number of e-book purchases ever, with more than $3 million in sales. a 2002 consumer survey found that 67 percent of respondents wanted to read e-books; 62 percent wanted access to e-books through a library. unfortunately, the large amount of information written on e-books has begun to develop myths around their use, functionality, and cost. the author suggests that these myths may interfere with the role of libraries in helping to determine the future of the medium and access to it. rather than fixate on the pros and cons of current versions of e-book technology, it is important for librarians to stay engaged and help clarify the role of digital documents in the modern library.

although 2000 was unofficially proclaimed as the year of the electronic book, or e-book, due in part to the highly publicized release of a stephen king short story exclusively in electronic format, the dot-com bust would derail a number of high-profile e-book endeavors. with far less fanfare, the e-book industry has been slowly recovering. in 2004, e-books represented the fastest-growing segment of the publishing industry. during the first quarter of that year, more than four hundred thousand e-books were sold, a 46 percent increase over the previous year’s numbers.1 e-books continue to gain acceptance with some readers, although their place in history is still being determined: fad? great idea too soon? wrong approach at any time? the answers partly depend on the reader’s perspective. the main focus of this article is the role of e-book technologies in libraries. libraries have always served as repositories of the written word, regardless of the particular medium used to store the words. from the ancient scrolls of qumran to the hand-illuminated manuscripts of medieval europe to the familiar typeset codices of today, the library’s role has been to collect, organize, and share ideas via the written word. in today’s society, the written word is increasingly encountered in digital form. writers use word processors; readers see words displayed; and researchers can scan countless collections without leaving the confines of the office. for self-proclaimed book lovers, the digital world is not necessarily an ideal one.
emotional reactions are common when one imagines a world without a favorite writing pen or the musty-smelling, yellowed pages of a treasured volume from youth. one of the battle lines between the traditional bibliophile and the modern technologist is drawn over the concept of the e-book. some see this digital form of the written word as an evolutionary step beyond printed texts, which have sometimes been humorously dubbed tree-books. although a good deal of attention has been generated by the initial publicity regarding newer e-book technologies, the apparent failures of most of them have begun to establish myths around the concept. abram points out that the relative success of e-books in niche areas (such as reference works) is in direct contrast with public opinion of those purchasing novels and popular literature through traditional vendors.2 crawford paraphrases lewis carroll in describing this confusion: “when you cope with online content about e-books, you can believe six impossible things before breakfast.”3 incidentally, this article will attempt to dispel a mere five of the myths about e-books. the future of e-books and the critical role of libraries in this future are best served by uncovering these myths and seeking a balanced, reasoned view of their potential. a 2002 consumer survey on e-books found that 67 percent of respondents wanted to read an e-book, and 62 percent wanted that access to be from a library.4 underlying this position is the assumption that the ideas represented by the written word are of paramount importance to both writers and readers. it is also assumed that libraries will continue their critical role in collecting, organizing, and sharing information.

myth 1—e-books represent a new idea that has failed
many libraries have invested in various forms of e-book delivery with mixed results.5 sottong wisely warns of the premature adoption of e-book technology, which he dubs a false pretender as a replacement for printed texts.6 however, the last five years are but a small part of a longer history and, presumably, a still longer future. as is often the case with computer jargon, the term e-book has emerged and gained currency in a very short amount of time. however, the concept of providing written texts in an electronic format has existed for a long time, as demonstrated by bush’s description of the memex.7 the gutenberg project put theory into practice by converting traditional texts into digital files as early as 1971.8 even if the e-book merely represents the latest incarnation of the concept, it does so tenuously. books in their present form have a history of hundreds of years, or thousands if their parchment and papyrus ancestors are included. this history is rich with successes and failures of technology. for example, petroski presents an interesting historical examination of the problem of storing books when the one book–one desk model collapsed under the proliferation of available texts.9 similarly, a determination on the success or failure of e-books, or digital texts, based upon a relatively short period of time is fraught with difficulty. rather, it is important to look at recent developments as merely a next step.
the technology is clearly not ready for uncritical, widespread acceptance, but it is also deserving of more than a summary dismissal. � myth 2—e-books are easily defined the term e-book means different things depending on the context. at the simplest, it refers to any primarily textual material that is stored digitally to be delivered via electronic display. one of the confusing aspects of defining ebooks is that in the digital world, information and the media used to store, transfer, and view it are loosely coupled. an e-book in digital form can be stored on cd–rom or any number of other media and then passed on through computer networks or telephone lines. the device used to view an e-book could be a standard computer, a personal digital assistant (pda), or an e-book reader (the dedicated piece of equipment on which an e-book can be read; confusingly, also referred to as an e-book). technically, virtually any computing device with a display could be used as an e-book reader. from a practical point of view, our eyes might not tolerate reading great lengths of text on a wireless phone, and banks will not likely provide excerpts of chaucer during atm transactions. another important factor in defining e-books is the actual content. a conservative definition is that an e-book is an electronic copy or version of a printed text. this appears to be the predominant view of publishers. purists often maintain that a true e-book is one that is specifically written for that format and not available in traditional printed form.10 this was one of the categories of the shortlived (2000–2002) frankfurt e-book awards. of course, the multitude of textual materials that could be delivered via the technology exceeds these definitions. magazines, primary-source documents, online commentaries and reviews, and transcripts of audio or video presentations are just a short list of nonbook materials that are finding their way into e-book formats. one can note with some sense of irony that the technology behind the web was originally designed as a way for scientists to disseminate research reports.11 despite the web’s popularity, reading research reports makes up an exceedingly small percentage of its use today. although there is a continuing effort to reach a common standard for e-books (see www.openebook.org/), the current marketplace contains numerous noncompatible formats. this noncompatibility is the result of both design and competitive tradeoffs. in the case of the former, there is a distinct philosophical difference between formats that attempt to retain the original look and navigation of the printed page (such as adobe’s popular pdf files) versus those that retain the text’s structure but allow variability in its presentation (as best exemplified by the free-flowing nature of texts presented as html pages). this difference can also be seen in the functionality built around the format. traditional systems provide readers with familiar book characteristics such as a table of contents, bookmarks, and margin notes, a view that could be named bibliocentric. the alternative is one that takes more advantage of the new medium and could be labeled technocentric, and can most easily be seen in the extensive use of hyperlinking.12 the simplest use of hyperlinking provides an easy form of annotating texts and presenting related texts. 
on the other extreme, hyperlinks are used in the creation of nonlinear texts in which the followed links provide a unique context for building meaning on the part of the reader.13 it is interesting to note that a preliminary study of e-book features found that the most desirable features tended to reflect the functionality of traditional books and the least desirable features provided functionality not found there.14 competitive tradeoffs are a critical issue at the current point of e-book development. the current profit models of publishing entities and copyright concerns of authors seem naturally opposed to e-book formats in which texts were freely shared, duplicated, and distributed. for example, the open ebook forum is the most prominent organization devoted to the development of standards for e-book technologies. in late 2004, their web site listed seventy-six current members. although the american library association is a member, it is one of only six members representing library-oriented organizations. in comparison, thirty-five members (or 46 percent) are publishing organizations, and thirteen (or 17 percent) are technology companies.15 the number of traditional publishers versus technology companies on this list may suggest that a bibliocentric view of ebooks would be more favored. this also appears to confirm one media prediction that traditional publishers would continue to dominate efforts with this new medium.16 however, the limited representation of libraries in this endeavor is troubling (despite the disclaimer of using an admittedly rough metric for measuring impact). it is clear that many industry formats attempt to limit the ability to distribute materials by keying files so that they may only be viewed on one device or a specific installed version of the reader software. this creates technological problems for entities like libraries that attempt to provide access to information for various parties. the concept of fair use of copyrighted materials has to be reexamined under an entirely new set of assumptions. another irony is that the availability of free, public-domain materials in e-book format can be viewed as negative by the publishing industry. after investing considerable time and effort in developing e-book technology, publishers would prefer that users continue purchasing new e-book material rather than spend time reading the vast library of free historical material. many of these content issues are currently being played out in courts and the marketplace, particularly with regard to digital music and video.17 although one can humorously imagine the so-called problems associated with a population obsessed with downloading and reading great literature, the precedents set by these popular media will have a direct impact on the future of digital texts. despite the labor required to scan or key entire print books into digital formats, there have been some reports of this type of piracy.18 other models for the dissemination of digital intellectual property that are not determined by traditional material concerns of supply and demand will continually be attempted. for example, nelson predicted a hypertext-publishing scheme in which all material was available, but royalties were distributed according to actual access by end users.19 theoretically, such a system would provide a perfect balance between access and profitability. 
in nelson’s words “nothing will ever be misquoted or out of context, since the user can inquire as to the origins and native form of any quotation or other inclusion. royalties will be automatically paid by a user whenever he or she draws out a byte from a published document.”20 � myth 3—e-books and printed books are competing media many, if not most, published articles regarding e-books follow classic plot construction; the writer must present a protagonist and an antagonist. bibliophiles place the printed page as the hero and the e-book as the potential bane of civilization. proulx, one such author, was quoted as saying, “nobody is going to sit down and read a novel on a twitchy little screen—ever.”21 technologists cast the e-book as the electronic savior of text, replacing the tired tradition of the printed word in the same way the printed word replaced oral traditions. hawkins quotes an author who claims that e-books are “a meteor striking the scholarly publication world.” his slightly more restrained view was that e-books had the potential “to be the most far-reaching change since gutenberg’s invention.”22 grant places this metaphorical battle at the forefront by titling an article “e-books: friend or foe?”23 before deciding which side to take, consider whether this clash of media is an appropriate metaphor. this author has introduced samples of current ebook technology in graduate classes he has taught. when presented with the technology as part of the coursework, students quickly declare their allegiances. bibliophiles most often suggest that the technology will never replace the love of curling up with a good book. the technologists will ask how many pages can be stored in the device and then fantasize about the types of libraries they can carry and the various venues for reading that they will explore. however, after a few weeks in using the devices, both groups tend to move to a middle ground of practical use. at that point, the discussion turns to what materials are best left on the printed page (usually described as pleasure reading) and what would be useful in e-book format (reference works, course catalogs, how-to manuals). other instructors have reported similar patterns of use.24 at this point, the observation is largely anecdotal, but it does call into question the perceived need for a decisive referendum on the value of e-books. the issue is not whether e-books will replace the printed word. the concern of librarians and others involved in the infrastructure of the book should be on developing the proper role for e-books in a broader culture of information. unless this approach is taken, the true goal of libraries—disseminating information to the public—will suffer. the gap between bibiliophile and technologist approaches can already be seen in the materials available in e-book format. the publishing industry in general treats the e-book as just another format, releasing the same titles in hardcover, book-on-tape, and e-book at the same time. on the opposite end of the spectrum, technologists have adopted various e-book formats for creating and transferring numerous reference documents. given their preferences, it is easy to find e-book references on unix, html coding, and the like, but there is a scarcity of materials in philosophy, history, and the arts. librarians seem the most appropriate group for developing shared understanding. publishers and e-book hardware and software manufacturers need to be concerned with the bottom line. 
libraries, by design, are concerned with the preservation of information and its continued dissemination long after the need to sell a particular book has passed. the hobby of creating and transferring texts to digital form is idiosyncratic and unorganized when viewed from the highest levels. libraries not only contain expertise in all areas of human endeavor, but also have strategies for categorizing and maintaining information in productive ways. in short, libraries are the best line of defense for maintaining the value of the printed page and promoting the value of digital texts. dispelling five myths about e-books | gall 27 28 information technology and libraries | march 2005 � myth 4—e-books are expensive a common complaint about e-books is that they are expensive. on the surface, this seems clear. dedicated ebook readers seemed to bottom out at around $300, and a new bestseller in e-book format is priced about the same as the hardcover edition. add the immediate and longterm costs of rechargeable batteries and the electricity needed to power them, and the economic case against the e-book appears closed. what if we turn the same critical eye to the printed page? the manufacture and distribution of printed texts is highly developed and astounding. when gutenberg succeeded in putting the christian bible in the hands of the moneyed public, he surely could not have comprehended the billions of copies that would eventually be distributed. even with the wealth of printed material at hand, one must still consider the high cost of the system. the law of supply and demand rules books as a tangible product. the most profitable books are those that will reach the most readers. specialized texts have limited audiences and, therefore, will usually be priced higher. this produces problems for both groups. popular texts must be printed in high quantities and delivered to various outlets. unfortunately, the printed page does have maintenance costs. sellen and harper point out that the actual printing cost is insignificant compared with the cost of dealing with documents after printing. they cite one study that indicated that united states businesses spend about $1 billion per year designing and printing forms, but spend an additional $25 to $35 billion filing, storing, and retrieving them.25 books are no different; as any librarian knows, it costs money to maintain a collection and protect texts from the environment and the effects of age. in the retail arena, the competition is fiercer. books that do not sell are removed in favor of those that do. it is estimated that 10 percent of texts printed each year are turned to pulp, although, fortunately, many are recycled.26 the bbc reported that more than two million former romance novels were used in the construction of a new tollway.27 with more specialized texts, the problem is not wealth, but scarcity. if a text is not profitable, it will probably become out of print. this is often synonymous with inaccessible. from the publisher’s perspective, it is only cost-effective to commit to a printing when the demand is high enough. a library is a good source of outof-print texts, provided that it has been funded appropriately to acquire and maintain the particular works that are needed. e-books are not a panacea. other innovations, such as on-demand publishing, may be part of the answer in solving the economic issues regarding collections. however, e-books can help alleviate some of these issues. 
e-books are easily copied and distributed, which is a boon to the researcher and information consumer. in many cases, the goal is the access to information, not the possession of a book. it could also benefit the author and publisher if appropriate reimbursement systems are put into place. as previously described, nelson originally envisioned his online hypertext system, xanadu, with a mechanism for royalties based on access—a supply-anddemand system for ideas, not materials.28 the systems used to manage access to digital materials continue to increase in complexity and have spawned a whole new business of digital rights management (drm).29 examples include reciprocal (www.reciprocal.com), overdrive (www.overdrive.com), and netlibrary (www.net library.com). libraries are the specific target of netlibrary, which promotes an e-books-on-demand project that allows free access for short periods of time.30 the creation of a standard digital object identifier (doi) for published materials may also help online publishers and entities like libraries manage their digital collections more easily.31 online music systems, such as apple’s itunes (www. itunes.com), strike a workable balance between quickand-easy access to music and a workable, economic model for reimbursing artists. e-books also have appeal for special audiences who already require assistive technologies for accessing print collections.32 having discussed the hidden costs of printed texts, another important economic issue of e-books to examine is a current trend in usage. despite the availability of dedicated e-book readers, the largest growth in e-book usage is surely in nondedicated devices. e-book–reading software is available for personal computers, laptops, and pdas. according to one source, microsoft had sold four million pocketpc e-book-enabled devices, and had two million downloads of the ms reader for the personal computer; palm had sold approximately 20 million ebook-enabled devices; and adobe had more than 30 million acrobat readers downloaded.33 these numbers alone indicate some 24 million reader-capable pdas, and 32 million reader-capable pcs, for a total of 56 million devices. although it is difficult to find data on actual use, one online bookseller reported some data on e-book use from an audience survey.34 although 88 percent had purchased books online, only 16 percent had read an e-book (11 percent using a pc, 3 percent on a handheld device, and 2 percent on both). it is presumed that in most cases this equipment was purchased for other reasons, with ebook reading being a secondary function. as such, it would be unfair to include the full cost of this equipment in any calculation of the cost of providing information in an e-book format. if so, the cost of providing artificial lighting in any building where reading takes place would need to be calculated as part of the cost of the printed page. the potential user base for the e-book rises as more computers and pdas are sold, decreasing the need for special equipment. this does not mean that the dedicated e-book reader is obsolete. by most commercial accounts, the apple newton was a failure. its bulky size and awkward interface were the subject of much ridicule. however, it did introduce the concept of the pda. the success of the palm line of products owes much to the proof of concept provided by the newton. 
the makers of the portable gameboy videogame system are repositioning it for multimedia digital-content delivery, and plan to pilot a flash-memory download system for various content types, including e-books.35 innovative products such as e-paper are already developed in prototype form.36 they are likely to lead to another wave of dedicated e-book readers or provide e-book–reading potential embedded in other consumer applications. � myth 5—e-books are a passing fad it is trendy to list the failures of past media (such as radio, film, and television) in impacting education despite great initial promise.37 however, all those media are still with us after having found particular niches within our culture. if the e-book is viewed as just an alternative format, comparisons with past experiences of library collections containing videotapes, record albums, and such are not appropriate.38 however, if e-books are viewed as a tool or way to access information, the questions change. instead of asking how digital formats will replace print collections, we can ask how will an e-book version extend the reach of our current collection or provide our readers with resources previously unavailable or unaffordable. when trying to locate a research article, one is generally not concerned with whether the local library has a loose copy, bound copy, microform, microfiche, or even has to resort to interlibrary loan. as long as the content is accessible and can be cited, it can be used. electronic access to journal content is becoming more common. perhaps dry journal articles do not conjure up the same romantic visions of exploring the stacks that may hinder greater acceptance of e-books. a parallel can be drawn to the current work of filmrestoration experts. the medium of film has reached an age where some of the earliest influential works no longer exist or are in a condition of rapid deterioration. according to one film site, more than half of the films made before 1950 have already been lost due to decay of existing copies.39 the work of restoration involves finding what remains of a great work in various vaults and collections. often, the only usable film is a secondor third-generation copy. from digitized copies, cleaning, color correction, and other painstaking work, a restored and—it is hoped—complete work emerges. ironically, once this laborious process is completed, a near-extinct classic is suddenly available to millions in the form of a dvd disc at a local retailer. what if the same attitude was taken with the world’s collections of printed materials? jantz has described potential impacts of e-book technology on academic libraries.40 lareau conducted a study on using e-books to replace lost books at kent state university, but found that limited availability and high costs did not make it feasible at the time.41 project gutenberg (www.gutenberg.net) and the electronic text center at the university of virginia (http://etext.lib.virginia.edu) are two examples of scholars attempting to save and share book content in electronic forms, but more efforts are needed. unfortunately, the shift to digital content has also contributed to the sheer volume of content available. edwards has recently discussed issues in attempting to archive and preserve digital media.42 the web may be suffering from a glut of information, but the content is highly skewed toward the new and technology oriented. in a few years, we may find that nontechnology–related endeavors are no longer represented in our information landscape. 
conclusion. the e-book industry is currently dominated by commercial-content providers, such as franklin, and software companies, most notably adobe, palm, and microsoft. traditional print-based publishers have also maintained continued interest in the medium. it is assumed that these publishers had the capital to weather the ups and downs of the industry more so than new publishers dedicated solely to e-book delivery. although the contributions and efforts of these organizations are needed, the future of e-book content should not be left to their largesse. when the rocket e-book device was initially released, a small but loyal following of readers contributed thousands of titles to its online library. some of these titles were self-published vanity projects or brief reference documents, but many were public-domain classics, painstakingly scanned or keyed in by readers wishing to share their favorite reads. when gemstar purchased rocket, the software's ability to create non-purchased content was curtailed and the online library of free titles dismantled. apparently, both were viewed as limiting the profitability of the e-book vendor. however, gemstar recently gave notice that it was discontinuing its e-book reading devices, one would assume due to a lack of profitability. this can be seen as a cautionary tale for libraries, which often define success by the number of volumes available and accessed rather than units sold. committing to a technology that concurrently requires consumer success can be problematic. bibliophile and technologist alike must take responsibility for the future of our collective information resources. the bibliophile must ensure that all aspects of human knowledge and creativity are nurtured and allowed to survive in electronic forms. the technologist must ensure that accessibility and intellectual-property rights are addressed with every technological innovation. parry provides three concrete suggestions for public libraries in response to new media demands: continue to acknowledge and respond to customer demands, revisit the library's mission statement for currency, and promote or accelerate shared agreements with other institutions to alleviate the high costs of accumulating resources.43 the proper frame of mind for these activities is suggested by levy: we make a mistake, i believe, when we fixate on particular forms and technologies, taking them in and of themselves, to be the carriers of what we want to embrace or resist. . . . it isn't a question, it needn't be a question, of books or the web, of letters or e-mail, of digital libraries or the bricks-and-mortar variety, of paper or digital technologies. . . . these modes of operation are only in conflict when we insist that one or the other is the only way to operate.44 in the early 1930s, lomax dragged his primitive audio-recording equipment over the roads of the american south to capture the performances of numerous folk musicians.45 at the time, he certainly didn't imagine that at one point in history someone with a laptop computer sitting in a coffee shop with wireless access could download the performances of robert johnson from itunes. however, without his efforts, those unique voices in our history would have been lost.
it is hoped that the readers of the future will be thanking the library professionals of today for preserving our print collections and enabling their access digitally via our primitive, but evolving, e-book technologies. references. 1. open e-book forum, "press release: record e-book retail sales set in q1 2004," june 4, 2004. accessed dec. 27, 2004, www.openebook.org. 2. stephen abram, "e-books: rumors of our death are greatly exaggerated," information outlook 8, no. 2 (2004): 14–16. 3. walt crawford, "the white queen strikes again: an e-book update," econtent 25, no. 11 (2002): 46–47. 4. harold henke, "consumer survey on e-books." accessed dec. 27, 2004, www.openebook.org. 5. sue hutley, "follow the e-book road: e-books in australian public libraries," aplis 15, no. 1 (2002): 32–37; andrew k. pace, "e-books: round two," american libraries 35, no. 8 (2004): 74–75; michael rogers, "librarians, publishers, and vendors revisit e-books," library journal 129, no. 7 (2004): 23–24. 6. stephen sottong, "e-book technology: waiting for the 'false pretender,'" information technology and libraries 20, no. 2 (2001): 72–80. 7. vannevar bush, "as we may think," atlantic monthly 176, no. 1 (1945): 101–108. 8. michael s. hart, "history and philosophy of project gutenberg." accessed dec. 27, 2004, www.gutenberg.net/about.shtml. 9. henry petroski, the book on the bookshelf (new york: vintage, 2000). 10. steve ditlea, "the real e-books," technology review 103, no. 4 (2000): 70–73. 11. tim berners-lee, weaving the web: the original design and ultimate destiny of the world wide web by its inventor (new york: harpercollins, 1999). 12. james e. gall and annmari m. duffy, "e-books in a college course: a case study" (presented at the association for educational communications and technology conference, atlanta, ga., nov. 8–10, 2001). 13. george p. landow, hypertext 2.0: the convergence of contemporary critical theory and technology (baltimore, md.: johns hopkins univ. pr., 1997). 14. harold henke, "survey on electronic book features." accessed dec. 27, 2004, www.openebook.org. 15. open e-book forum, "press release: record e-book retail sales set in q1 2004." 16. lori enos, "report: e-book industry set to explode," e-commerce times, 20 dec. 2000. accessed dec. 27, 2004, www.ecommercetimes.com/story/6215.html. 17. luis a. ubinas, "the answer to video piracy," mckinsey quarterly no. 1. accessed dec. 27, 2004, www.mckinseyquarterly.com. 18. mark hoorebeek, "e-books, libraries, and peer-to-peer file-sharing," australian library journal 52, no. 2 (2003): 163–68. 19. theodor h. nelson, "managing immense storage," byte 13, no. 1 (1988): 225–38. 20. ibid., 238. 21. jacob weisberg, "the way we live now: the good e-book," new york times, 4 june 2000. accessed dec. 27, 2004, www.nytimes.com. 22. donald t. hawkins, "electronic books: a major publishing revolution. part 1: general considerations and issues," online 24, no. 4 (2000): 14–28. 23. steve grant, "e-books: friend or foe?" book report 21, no. 1 (2002): 50–54. 24. lori bell, "e-books go to college," library journal 127, no. 8 (2002): 44–46. 25. abigail j. sellen and richard h. harper, the myth of the paperless office (cambridge, mass.: mit pr., 2002). 26. stephen moss, "pulped fiction," sydney morning herald, 29 mar. 2002. accessed dec. 27, 2004, www.smh.com.au. 27. bbc news, "m6 toll built with pulped fiction," bbc news uk edition, 18 dec. 2003. accessed dec. 27, 2004, http://news.bbc.co.uk. 28. nelson, "managing immense storage." 29. michael a.
looney and mark sheehan, "digitizing education: a primer on e-books," educause 36, no. 4 (2001): 38–46. 30. brian kenney, "netlibrary, ebsco explore new models for e-books," library journal 128, no. 7 (2003). 31. stephen h. wildstrom, "a library to end all libraries," business week (july 23, 2001): 23. 32. terence cavanaugh, "e-books and accommodations: is this the future of print accommodation?" teaching exceptional children 35, no. 2 (2002): 56–61. 33. skip pratt, "e-books and e-publishing: ignore ms reader and palm os at your own peril," knowledge download, 2002. accessed dec. 27, 2004, www.knowledge-download.com/260802-e-book-article. 34. davina witt, "audience profile and demographics," mar./apr. 2003. accessed dec. 27, 2004, www.bookbrowse.com/media/audience.cfm. 35. geoff daily, "gameboy advance: not just playing with games," econtent 27, no. 5 (2004): 12–14. 36.
associated press, "flexible e-paper on its way," associated press, 7 may 2003. accessed dec. 27, 2004, www.wired.com/news. 37. richard mayer, multimedia learning (cambridge, uk: cambridge university press, 2000). 38. sottong, "e-book technology." 39. amc, "film facts: read about lost films." accessed june 19, 2003, www.amctv.com/article?cid=1052. 40. ronald jantz, "e-books and new library service models: an analysis of the impact of e-book technology on academic libraries," information technology and libraries 20, no. 2 (2001): 104–15. 41. susan lareau, the feasibility of the use of e-books for replacing lost or brittle books in the kent state university library, 2001, eric, ed 459862. accessed dec. 27, 2004, http://searcheric.org. 42. eli edwards, "ephemeral to enduring: the internet archive and its role in preserving digital media," information technology and libraries 23, no. 1 (2004): 3–8. 43. norm parry, format proliferation in public libraries, 2002, eric, ed 470035. accessed dec. 27, 2004, http://searcheric.org. 44. david m. levy, scrolling forward: making sense of documents in the digital age (new york: arcade pub., 2001). 45. about alan lomax. accessed dec. 27, 2004, www.alan-lomax.com/about.html.

(president's column continued from page 2) online." they have implemented several process improvements already and will complete their work by the 2005 ala annual conference. this past fall, michelle frisque, lita web manager, conducted a survey of our members about the lita web site. michelle and the web coordinating committee are already working on a new look and feel for the lita web site based on the survey comments, and the result promises to be phenomenal. on top of all of the current activities, the new vision statement, strategic planning, and the lita web site redesign, mary taylor and the lita board worked with a graphic designer to develop a new lita logo. after much deliberation, the new logo debuted at the 2004 lita national forum with great enthusiasm. many members commented that the new logo expresses the "energy" of lita and felt the change was terrific. with your help, lita had a very successful conference in orlando. although there were weather and transportation difficulties, the lita programs and discussions were of the highest quality, as always. the program and preconference offerings for the upcoming annual conference in chicago promise to be as strong as ever. don't forget, lita also offers regional institutes throughout the year. check the lita web site to see if there's a regional institute scheduled in your area. lita held another successful national forum in fall 2004 in st. louis, "ten years of connectivity: libraries, the world wide web, and the next decade." the three-day educational event included excellent preconferences, general sessions, and more than thirty concurrent sessions. i want to thank the wonderful 2004 lita national forum planning committee, chaired by diane bisom, the presenters, and the lita office staff who all made this event a great experience. the next lita national forum will be held at the san jose marriott, san jose, california, september 29–october 2, 2005. the theme will be "the ubiquitous web: personalization, portability, and online collaboration." thomas dowling, chair, and the 2005 lita national forum planning committee are preparing another "must attend" event. next year marks lita's fortieth anniversary. 2006 will be a year for lita to celebrate our history, future, and our many accomplishments. we are fortunate to have lynne lysiak leading the fortieth anniversary task force activities. i know we all will enjoy the festivities. i look forward to working with many of you as we continue to make lita a wonderful and vibrant association. i encourage you to send me your comments and suggestions to further the goals, services, and activities of lita.

technical communications. announcements. panel discussion on "government publications in machine-readable form." this meeting will be held on july 10 from 8:30 to 10:30 p.m. as a part of the american library association's 1974 new york conference. the meeting is cosponsored by the government documents round table's (godort) machine-readable data file committee, the federal librarians round table (flirt), the rasd information retrieval committee, and the rasd/rtsd/asla public documents committee. the moderator is gretchen dewitt of columbus public library and the panelists are peter watson of ucla, mary pensyl of mit, judith rowe of princeton, and billie salter of yale. mr. watson will discuss the general issues concerning the acquisition and use of bibliographic data files and provide a brief description of some of the files now publicly available; miss pensyl will describe the workings of the project now underway to make these files available to mit users. mrs. rowe will discuss the ways in which government-produced statistical files supplement the related printed reports and will indicate some of the types and sources of files now being released; miss salter will discuss a program for integrating these and other research files into yale's social science reference service. representatives of several federal agencies will display materials describing and documenting both bibliographic and statistical data files. the purpose of the program is to acquaint reference librarians, particularly those now handling printed documents, with the uses of both types of files, the advantages and disadvantages of these reference tools, and the techniques and policy changes necessary for their use in a library environment. the recent release of the draft proposal produced by the national commission on libraries and information science makes more timely than ever an open discussion of the place of bibliographic and numeric data files in a reference collection. all librarians must be acquainted with these growing resources in order to continue to provide full service to their patrons.
for further information, contact judith rowe, computer center, princeton university, 87 prospect ave., princeton, nj 08540. ninth annual educational media and technology conference to be hosted by university of wisconsin-stout, july 22-24, 1974. aect past president dr. jerry kemp, coordinator of instructional development services for san jose state university (california), and film consultant ralph j. amelio, media coordinator and english instructor at willowbrook high school, villa park, illinois, will headline the university of wisconsin-stout's 9th annual educational media and technology conference to be held in menomonie, wisconsin, on july 22-24, 1974. "educational technology: can we realize its potential?" will be the subject of kemp's presentation on monday evening, while amelio, speaking on tuesday, july 23, will challenge participants with the subject "visual literacy: what can you do?" seven concurrent workshops will be held on monday afternoon: library automation; sound for visuals; making the timesharing computer work for you; new developments in photography; what's new in graphics; selecting and evaluating educational media; and instructional development: how to make it work! individuals leading the three-hour workshops will include: alfred baker, vice-president of science press; john lord, technical service manager for the dukane corporation; william daehling, weber state college, ogden, utah; and several media specialists from learning resources, university of wisconsin-stout. about fifty exhibitors will show and demonstrate both hardware and software during the conference. six case studies will be given of exemplary media programs at the public school, vocational-technical, and college level. further information may be obtained by contacting dr. david p. bernard, dean of learning resources, university of wisconsin-stout, menomonie, wi 54751. report of recon project published. the library of congress has published in recon pilot project (vii, 49p.) the final report of a project sponsored by lc, the council on library resources, inc., and the u.s. office of education to determine the problems associated with centralized conversion of retrospective catalog records and distribution of these records from a central source. in the marc pilot project, begun in november 1966, the library of congress distributed machine-readable catalog records for english-language monographs, and the success of that project led to the implementation in march 1969 of the marc distribution service, in which over fifty subscribers have by now received more than 300,000 marc records representing the current english-language monograph cataloging at the library of congress. as coverage is extended to catalog records for foreign-language monographs and for other forms of material, libraries will be able to obtain machine records for a large number of their current titles. more research was needed, however, on the problems of obtaining machine-readable data for retrospective cataloging, and the council on library resources made it possible for lc to engage in november 1968 a task force to study the feasibility of converting retrospective catalog records. the final report of the recon (for retrospective conversion) working task force was published in june 1969.
one of the report's recommendations was that a pilot project test various conversion techniques, ideally covering the highest priority materials, english-language monograph records from 1960-68; and with funds from the sponsoring agencies lc initiated a two-year project in august 1969. the present report covers five major areas examined in that period: 1. testing of techniques postulated in the recon report in an operational environment by converting english-language monographs cataloged in 1968 and 1969 but not included in the marc distribution service. 2. development of format recognition, a computer program which can process unedited catalog records and supply all the necessary content designators required for the full marc record. 3. analysis of techniques for the conversion of older english-language materials and titles in foreign languages using the roman alphabet. 4. monitoring the state-of-the-art of input devices that would facilitate conversion of a large data base. 5. a study of microfilming techniques and their associated costs. recon pilot project is available for $1.50 from the superintendent of documents, u.s. government printing office, washington, dc 20402. stock no. 300000061. library of congress issues recon working task force report. national aspects of creating and using marc/recon records (v, 48p.) reports on studies conducted at the library of congress by the recon working task force under the chairmanship of henriette d. avram. they were made concurrently with a pilot project by the library to test the feasibility of the plan outlined in the task force's first report entitled conversion of retrospective records to machine-readable form (library of congress, 1969) and in recon pilot project (library of congress, 1972). both the pilot project and the new studies received financial support from the council on library resources, inc., and the u.s. office of education. the present volume describes four investigations: (1) the feasibility of determining a level or subset of the established marc content designators (tags, indicators, and subfield codes) that would still allow a library using it to be part of a future national network; (2) the practicality of the library of congress using other machine-readable data bases to build a national bibliographic store; (3) implications of a national union catalog in machine-readable form; and (4) alternative strategies for undertaking a large-scale conversion project. the appendices include an explanation of the problems of achieving a cooperatively produced bibliographic data base, a description of the characteristics of the present national union catalog, and an analysis of library of congress card orders for one year. although the findings and recommendations of this report are less optimistic than those of the original recon study, they reaffirm the need for coordinated activity in the conversion of retrospective catalog records and suggest ways in which a large-scale project might be undertaken. the report provides a basis for realistic planning in a critical area of library automation. national aspects of creating and using marc/recon records is available for $2.75 from the superintendent of documents, u.s. government printing office, washington, dc 20402. stock no. 300000062. isad official activities. tesla information. editor's note: use of the following guidelines and forms is described in the article by john kountz in this issue of jola.
the tesla reactor ballot will also appear in subsequent issues of technical communications for reader use, and the tesla standards scoreboard will be presented as cumulated results warrant its publication. to use, photocopy or otherwise duplicate the forms presented in jola-tc, fill out these copies, and mail them to the tesla chairman, mr. john c. kountz, associate for library automation, office of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036. the tesla reactor ballot collects reactor information (name, title, organization, address, city, state, zip, telephone), the identification number of the standard requirement, a for/against position, and the reason for the position (use additional pages if required). the tesla standards scoreboard records, for each representative and title/i.d. number, the dates of receipt, screen, division, rej/acpt, publish, and tally, plus a target date. initiative standard proposal outline. the following outline and forms are designed to facilitate review by both the isad committee on technical standards for library automation (tesla) and the membership of initiative standards requirements and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable). note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced on 8½" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number.
i. title of initiative standard proposal (title).
ii. initiator information (forward). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension.
iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process.
iv. purpose. state the purpose of standard proposal (scope and qualifications).
v. description. briefly describe the standard proposal (specification of the standard).
vi. relationship of other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks).
vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification).
viii. specifications. specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required in addition to text exposition where applicable (specification of the standard).
research and development. system development corporation awarded national science foundation grant to study interactive searching of large literature data bases. santa monica, california-the national science foundation has awarded system development corporation $98,500 for a study of man-machine system communication in on-line retrieval systems. the study will focus on interactive searching of very large literature data bases, which has become a major area of interest and activity in the field of information science. at least seven major systems of national or international scope are in operation within the federal government and private industry, and more systems are on the drawing boards or in experimental operation. the principal investigator for the project will be dr. carlos cuadra, manager of sdc's education and library systems department. the project manager, who will be responsible for the day-to-day operation of the fifteen-month effort, is judy wanger, an information systems analyst and project leader with extensive experience in the establishment and use of interactive bibliographic retrieval services. ms. wanger is currently responsible for user training and customer support on sdc's on-line information service. the study will use questionnaire and interview techniques to collect data related to: (1) the impact of on-line retrieval usage on the terminal user; (2) the impact of on-line service on the sponsoring institution; and (3) the impact of on-line service on the information-utilization habits of the information consumer. attention will also be given to reliability problems in the transmission chain from the user to the computer and back. the major elements in this chain include: the user; the terminal; the telephone instrument; local telephone lines and switchboards; long-haul communications; the communications-computer interface hardware; the computer itself; and various programs in the computer, including the retrieval program. reports on regional projects and activities. california state university and colleges system union list system. the library systems project of the california state university and colleges has recently completed a production union list system. this system, comprised of eight processing programs to be run in a very modest environment (currently a cdc 3300), is written in ansi cobol and is fully documented. included in the documentation package are user worksheets for bibliographic and holding data, copies of all reports, file layouts, program descriptions, etc. output from this system consists of files designed to drive graphic-quality photocomposition or com devices. the system is available for the price of duplicating the documentation package. and, for those so desiring, the master file containing some 25,000 titles and titles with references is also available for the cost of duplication. interested parties (bona fides only, please) should contact john c. kountz, associate for library automation, california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036, for further details. solinet membership meeting. the annual membership meeting of the southeastern library network (solinet) was held at the georgia institute of technology in atlanta, march 14. it was announced that charles h. stevens, executive director of the national commission on libraries and information science, has been named director of solinet effective july 1. john h.
gribbin, chairman of the board, will serve as interim director. it was also announced that solinet will be affiliated with the southern regional education board. sreb will provide office space, act as financial disbursing agent, and will be available at all times in an advisory capacity. negotiations are underway for a tie-in to the ohio college library center (oclc) and a proposed contract is in the hands of the oclc legal counsel. it is anticipated that a contract soon will be signed. in addition to the tie-in, solinet will proceed with the development of its own permanent computer center in atlanta. this center will eventually provide a variety of services and will be coordinated carefully with other developing networks, looking toward a national library network system. elected to fill three vacancies on the board of directors were james f. govan (university of north carolina), gustave a. harrar (university of florida), and robert h. simmons (west georgia college). they will assume office on july 1. anyone desiring information about solinet should write to 130 sixth st., nw, atlanta, ga 30313. reports-library projects and activities. new book catalog for junior college district of st. louis. the three community college libraries of the junior college district of st. louis have been using computerized union book catalogs since 1964. formerly maintained and produced by an outside contractor, the catalogs are now one product of a new catalog system recently designed and implemented by instructional resources and data processing staff of the district. known as "ir catalog," the system presently has a data base of approximately 65,000 records describing the print and nonprint collections of the district's three college instructional resource centers. in addition to photocomposed author, subject, and title indexes, the system also produces weekly cumulative printouts which supplement the phototypeset "base" catalog. other output includes three-by-five-inch shelflist cards (which include union holdings information), a motion picture film catalog, subject and cross-reference authority lists, and various statistical reports. hawaii state library system to automate processing. the state board of education in hawaii has approved a proposal for a computerized data processing system for the hawaii state library. the decision allows for the purchase of computer equipment for automating library operations. the state library centrally processes library materials for all public and school libraries in the state. teichior hirata, acting state education superintendent, told board members a computerized system will speed book selection, ordering, and processing, and will improve interlibrary loan and reference services. he also pointed out it would facilitate a general streamlining of all technical administrative operations. the system's total cost will be $187,000, of which $58,000 will be spent for computer software. the "biblios" system, designed and developed at orange county public library in california and marketed by information design, inc., was selected as the software package. the caltech science library catalog supplement. the use of catalog supplements during the necessary maturation period required to take full advantage of the national program for acquisitions and cataloging is obviously an idea whose time has come.
the program developed at the california institute of technology, however, differs in several important respects from that previously described by nixon and bell at u.c.l.a.1 for reasons based primarily on faculty pressure, the practice of holding books in anticipation of the cataloging copy has never been a practice at the institute. the solution, while hardly unique, is to assign the classification number (dewey) and depend on a temporary main entry card to suffice until the lc copy is available. while this procedure has the distinct advantage of not requiring the presence of the book to complete the cataloging process, it does, however, prevent the user from finding the newest books through a search of the subject added entry cards. the use of the computer-based systems is an obvious solution to this aspect of the program but raises several additional problems which formerly seemed to defy solutions. as has been pointed out by mason, library-based computer systems can rarely be justified in terms of cost effectiveness, and computer-based library catalogs are no exception.2 part of this problem arises from the natural inclination to repeat in machine language what has been standard practice in the library catalog. this reaction overlooks the very different nature of catalogs and catalog supplements. as catalogs serve as the basis for the permanent record and their cost can be prorated over several decades, the need for a careful description of the many facets of a book is quite properly justified. in the case of catalog supplements, however, where the record will serve quite likely for only a few months, any attempt at detailed description of the book cannot be justified. one solution to this dilemma that has been developed here at caltech is a brief listing supplement which allows searching for a given book by either the first author or editor's last name, a key word from the title, or the first word of a series entry. these elements form the basis of a simple kwoc index (see figure 1) which supplements the bibliographic listing (shown in figure 2). fig. 1. sample entries from the kwoc index. fig. 2. sample entries from the bibliographic listing. fig. 3. sample entries from the weekly list of newly added books (new books, chemistry/biology, august 6, 1973). all books received in the chemistry, physics, and biology libraries are represented in the catalog supplement. weekly lists of newly added books (shown in figure 3) are annotated to show the index terms prior to keypunching. the unit record consists of a "title" card or cards (which contain the full title, author/editor, call number, library designation, and series information) and an "author" card (which contains the index terms).
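the mechanics are simple enough to sketch. the fragment below (python, written purely for illustration rather than the batch programs actually used at caltech, and with invented field names) builds a kwoc listing of this kind from unit records that pair a short bibliographic entry with a few manually assigned index terms, producing one output line per term:

# a sketch of kwoc listing generation; field names and sample data are illustrative only.
records = [
    {"title": "chemisorption and catalysis", "author": "hepple",
     "call_no": "541.395 he 1970 ch", "series": "",
     "terms": ["hepple", "chemisorption", "catalysis"]},   # terms come from the "author" card
    {"title": "protein turnover", "author": "",
     "call_no": "612.39 pr 1972 bi", "series": "ciba foundation symposium, 9",
     "terms": ["protein", "ciba"]},
]

def kwoc_listing(records):
    """return (index term, short entry) pairs, one per term, sorted alphabetically."""
    lines = []
    for rec in records:
        # the short entry mirrors the "title" card: title, author/editor, call number, series
        entry = " ".join(p for p in (rec["title"], rec["author"], rec["call_no"], rec["series"]) if p)
        for term in rec["terms"]:
            lines.append((term, entry))
    return sorted(lines)

for term, entry in kwoc_listing(records):
    print(f"{term:<16} {entry}")

because each record carries only a brief title line and a handful of index terms, the cost of producing the supplement stays small, which is the authors' point about catalog supplements as opposed to full catalogs.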
edited material is added accessionally to the card file data base and batch processed on the campus ibm 370/155 computer. the catalog supplement is currently published on 8½-by-11-inch sheets as a result of reducing the computer printout on a xerox 7000 copier. lists are given a vello-bind and delivered to the respective libraries. weeding the catalog supplement is still unresolved. at the present time additions are less than 1,000 per year, so that it may be possible after five years to replace the subject sections of the respective divisional catalogs with the catalog supplement. the "library" at caltech consists of several divisional libraries, each with their own card catalog. these divisional card catalogs are supplemented by a union catalog, which serves all libraries on campus and, because of the strong interdisciplinary nature of the divisional libraries, is much the better source for subject searches. the project is so facile and the costs so minimal that this approach might be of value to many small libraries. it is particularly applicable to the problems recently discussed by patterson.3 books in series, even if they are distinct monographs, are often lost to the user from a subject approach. with this system each physical volume added to the library can be analyzed for possible inclusion in the catalog supplement. 1. roberta nixon and ray bell, "the u.c.l.a. library catalog supplement," library resources & technical services 17:59 (winter 1973). 2. ellsworth mason, "along the academic way," library journal 96:1671 (1971). 3. kelly patterson, "library think vs library user," rq 12:364 (summer 1973). dana l. roth, millikan library, california institute of technology. commercial activities. richard abel & company to sponsor workshops in library automation and management. one of the most effective forms of continuing education is state-of-the-art reporting. recognizing the need for more such communication, the international library service firm of richard abel & company plans to sponsor two workshops for the library and information science community. the first workshop will deal with the latest techniques in library automation. it will precede the 1974 american library association conference in new york city, july 7-13. the second will present advances in library management, and will be scheduled to precede the 1975 ala midwinter meeting, january 19-25. the workshops will include forums, lectures, and open discussions. they will be presented by recognized leaders in the fields of library automation, management, and consulting. each workshop will probably be one or two days long. there will be no charge to attend either of the workshops, but attendance will be limited, to provide a good discussion atmosphere. for the management workshop, attendance will be limited to librarians active in library management. similarly, the automation workshop is intended for librarians working in library automation. maintaining the theme of state-of-the-art reporting, the basic content of the workshops will consist of what is happening in library management and automation today. looking to the future, there will also be discussions and forecasts of what is to come. persons interested in further information or in participating in either workshop should contact the abel workshop director, richard abel & company, inc., p.o. box 4245, portland, or 97208.
idc introduces bibnet on-line services. the introduction of bibnet on-line systems, a centralized computer-based bibliographic data service for libraries, has been announced by information dynamics corporation. demonstrations are planned for the ala annual conference in new york, july 7-13. according to david p. waite, idc president, "during 1973, bibnet service modules were interconnected over thousands of miles and tested for on-line use with idc's centralized computer-based cataloging data files. this is the culmination of a program that began two years ago. it is patterned after advanced technological developments similar to those recently applied to airline reservation systems and other large scale nationwide computing networks used in industry." idc, a new england-based library systems supplier, will provide a computer-stored cataloging data base of more than 1.2 million library of congress and contributed entries. initially it will consist of all library of congress marc records (now numbering over 430,000 titles), plus another 800,000 partial lc catalog records containing full titles, main entries, lc card numbers, and other selected data elements. as a result, bibnet will provide on-line bibliographic searching for all 1,250,000 catalog records produced by the library of congress since 1969. to enable users to produce library cards from those non-marc records for which only partial entries are kept in the computer, idc will mail card sets from its headquarters and add the full records to the data base for future reference. subscribing libraries will have access to the data base using a minicomputer cathode ray tube (crt) terminal. using this technique of dispersed computing, each bibnet terminal has programmable computer power built-in. this in-house processing power, independent of the central computer, allows computer processes like library card production to be performed in the library. this also eliminates waiting for catalog cards to arrive in the mail. bibnet terminals communicate with the central computer over regular telephone lines, eliminating the high costs of dedicated communication lines. therefore, thousands of libraries throughout the united states and canada can avail themselves of on-line services at low cost. bibnet users will have several methods of extracting information from the idc data base. the computer can search for individual records by titles, main entry, isbn number, or keywords. here's how it works: the operator types in any one of the search items, or, if a complete title is not known, a keyword from the title may be used. the cataloging information is then displayed on the crt where the operator may verify the record. at the push of a button, the data is stored on a magnetic cassette tape which is later used for editing and production of catalog cards by the user library. the bibnet demonstration in new york will highlight one of many bibliographic service modules available from idc and stress the fact that these services can be utilized by individual libraries and organized groups of libraries. license for new information retrieval concept awarded to boeing by xynetics. an exclusive license for manufacture and marketing to the government sector of systems incorporating a completely new concept in information storage and retrieval has been awarded to the boeing company, seattle, washington, by xynetics, inc., canoga park, california, it was announced jointly by dr. r. v.
hanks, boeing program manager, and burton cohn, xynetics board chairman. the system is said to be the first image storage and retrieval system which offers response times and costs comparable to those of digital systems. the heart of the system is a device of proprietary design, the flat plane memory, which provides rapid access to massive amounts of data stored in high resolution photographic media. the photographic medium enables low cost storage of virtually any type of source material (documents, correspondence, drawings, multitone images, computer output, etc.) while eliminating the need for time-consuming, costly conversion of pre-existing information into a specialized (e.g., digital) format. by virtue of its extremely rapid random access capability, the data needs of as many as several thousand users can be served at remote video terminals from a single memory with near real time response (1-3 seconds, typically). the high speed, high accuracy, and high reliability of the flat plane memory is accomplished primarily through the use of the patented xynetics positioner, which generates direct linear motion at high speeds and with great precision and reliability instead of converting rotary motion. as a result, the positioners eliminate the gears, lead screws, and other mechanical devices previously utilized, and thus achieve the requisite speed, accuracy, and reliability. the xynetics positioners are already being used in automated drafting systems produced by the firm, and in a wide variety of other applications, including the apparel industry and integrated circuit test systems. the new approach could eliminate many of the problems associated with multiple reproductions and distribution of large data files. in addition to many government applications, the system is expected to have major applications in the commercial marketplace. appointments. charles h. stevens appointed solinet director. charles h. stevens, executive director, national commission on libraries and information science, has been appointed director of the southeastern library network (solinet), effective july 1. the announcement was made at a meeting of solinet in atlanta, march 14, by john h. gribbin, board chairman. composed of ninety-nine institutional members, solinet is headquartered in atlanta. a librarian of acknowledged national stature and an expert on the technical aspects of information retrieval systems, mr. stevens brings to solinet a valuable combination of experience and abilities. concerned with national problems of libraries and information services, he will develop a regional network and move toward a cohesive national program to meet the evolving needs of u.s. libraries. a forerunner in library automation, mr. stevens served for six years as associate director for library development, project intrex, at massachusetts institute of technology. from 1959-1965 he was director of library and publications at mit's lincoln laboratory, lexington, massachusetts. at purdue university, he was aeronautical engineering librarian and later director of documentation of the thermophysical properties research center. mr. stevens is a member of the council of the american library association, the american society for information science, the special libraries association, and other professional organizations. he is the author of approximately forty papers in the field, lectures widely, and consults on library activities for a number of universities. mr. stevens holds a b.a.
in english from principia college, elsah, illinois, and master's degrees in english and in library science from the university of north carolina. mr. stevens has done further study in engineering at brooklyn polytechnic institute. mr. stevens is married and has three sons. input to the editor: international scuttlebutt informs us that those in the bibliothecal stratosphere are attempting to formulate a communications format for bibliographical records acceptable on a worldwide basis. we on the local scene unite in wishing them "huzzah!" and "godspeed!" nomenclature must be provided, of course, to designate particular applications; and the following suggestions are offered as possible subspecies of the genus supermarc:
deutschmarc-for records distributed from bonn and/or wiesbaden
rheemarc-for south korean records, named in honor of the late president of that country
bismarc-for records of stage productions which have been produced by popular demand from the top balcony; especially pertinent for wagnerian operas
benchmarc-for records of generally unsuccessful football plays
minskmarc-for byelorussian records
sachermarc-for austrian records, usually representing extremely tasteful concoctions
trademarc-for records pertaining to manufactured products, especially patent medicines
goldmarc-for records representing hungarian musical compositions (v. karl goldmark, 1830-1915)
ectomarc, endomarc, mesomarc (from the italian, mezzomarc)-for skinny, fat, and medium-sized records, respectively
landmarc-for records of historic edifices; sometimes (erroneously) applied to records for local geographical regions
feuermarc-for records representing charred or burned documents
montmarc-1. for records representing works by or about parisian artists; 2. for records representing publications of the french academy
watermarc-for records representing documents contained in bottles washed up on the beach.
joseph a. rosenthal, university of california, berkeley.

privacy and user experience in 21st century library discovery. shayna pekala (shayna.pekala@georgetown.edu), discovery services librarian, georgetown university library, washington, dc. abstract: over the last decade, libraries have taken advantage of emerging technologies to provide new discovery tools to help users find information and resources more efficiently. in the wake of this technological shift in discovery, privacy has become an increasingly prominent and complex issue for libraries. the nature of the web, over which users interact with discovery tools, has substantially diminished the library's ability to control patron privacy. the emergence of a data economy has led to a new wave of online tracking and surveillance, in which multiple third parties collect and share user data during the discovery process, making it much more difficult, if not impossible, for libraries to protect patron privacy. in addition, users are increasingly starting their searches with web search engines, diminishing the library's control over privacy even further. while libraries have a legal and ethical responsibility to protect patron privacy, they are simultaneously challenged to meet evolving user needs for discovery. in a world where "search" is synonymous with google, users increasingly expect their library discovery experience to mimic their experience using web search engines.1 however, web search engines rely on a drastically different set of privacy standards, as they strive to create tailored, personalized search results based on user data.
libraries are seemingly forced to make a choice between delivering the discovery experience users expect and protecting user privacy. this paper explores the competing interests of privacy and user experience, and proposes possible strategies to address them in the future design of library discovery tools. introduction. on march 23, 2017, the internet erupted with outrage in response to the results of a senate vote to roll back federal communications commission (fcc) rules prohibiting internet service providers (isps), such as comcast, verizon, and at&t, from selling customer web browsing histories and other usage data without customer permission. less than a week after the senate vote, the house followed suit and similarly voted in favor of rolling back the fcc rules, which were set to go into effect at the end of 2017.2 the repeal became official on april 3, 2017, when the president signed it into law.3 this decision by u.s. lawmakers serves as a reminder that today's internet economy is a data economy, where personal data flows freely on the web, ready to be compiled and sold to the highest bidder. continuous online tracking and surveillance have become the new normal. isps are just one of the many players in the online tracking game. major web search engines, such as google, bing, and yahoo, also collect information about users' search histories, among other personal information.4 by selling this data to advertisers, data brokers, and/or government agencies, these search engine companies are able to make a profit while providing the search engines themselves for "free." in addition to profiting from user data, web search engines also use it to enhance the user experience of their products. collecting and analyzing user data enables systems to learn user preferences, providing personalized search results that make it easier to navigate the ever-increasing sea of online information. the collection and sharing of user data that occurs on the open web is deeply troubling for libraries, whose professional ethics embody the values of privacy and intellectual freedom. a user's search history contains information about a user's thought process, and the monitoring of these thoughts inhibits intellectual inquiry.5 libraries, however, would be remiss to dismiss the success of web search engines and their use of data altogether. mit's preliminary report on the future of libraries urges, "while the notion of 'tracking' any individual's consumption patterns for research and educational materials is anathema to the core values of libraries...the opportunity to leverage emerging technologies and new methodologies for discovery should not be discounted."6 this article examines the current landscape of library discovery, the competing interests of privacy and user experience at play, and proposes possible strategies to address them in the future design of library discovery tools. background. library discovery in the digital age. the advent of new technologies has drastically shaped the way libraries support information discovery. while users once relied on shelf-browsing and card catalogs to find library resources, libraries now provide access to a suite of online tools and interfaces that facilitate cross-collection searching and access to a wide range of materials.
in an online environment, many paths to discovery are possible, with the open web playing a newfound and significant role. today's library discovery tools fall into three categories: online catalogs (the patron interface of the integrated library system (ils)), discovery layers (a patron interface with enhanced functionality that is separate from an ils), and web-scale discovery tools (an enhanced patron interface that relies on a central index to bring together resources from the library catalog, subscription databases, and digital repositories).7 these tools are commonly integrated with a variety of external systems, including proxy servers, inter-library loan, subscription databases, individual publisher websites, and more. for the most part, libraries purchase discovery tools from third-party vendors. while some libraries use open source discovery layers, such as blacklight or vufind, there are currently no open source options for web-scale discovery tools.8 outside of the library, web search engines (e.g. google, bing, and yahoo), and targeted academic discovery products (e.g. google scholar, researchgate, and academia.edu) provide additional systems that enable discovery.9 in fact, web search engines, particularly google, play a significant role in the research process. both students and faculty use google in conjunction with library discovery tools. students typically use google at the beginning of the research process to get a better understanding of their topic and identify secondary search terms. faculty, on the other hand, use google to find out how other scholars are thinking about a topic.10 unsurprisingly, google and google scholar provide the majority of content access to major content platforms.11 the data economy and online privacy concerns. in an information discovery environment that is primarily online, new threats to patron privacy emerge. in today's economy, user data has become a global commodity. commercial businesses have recognized the value of data mining for marketing purposes. bjorn bloching et al. explain, "from cleverly aggregated data points, you can draw multiple conclusions that go right to the heart and mind of the customer."12 along the same lines, the ability to collect and analyze user data is extremely valuable to government agencies for surveillance purposes, creating an additional data-driven market.13 the increasing value of user data has drastically expanded the business of online tracking. in her book, dragnet nation, journalist julia angwin outlines a detailed taxonomy of trackers, including various types of government, commercial, and individual trackers.14 in the online information discovery process, multiple parties collect user data at different points. consider the following scenario: a user executes a basic keyword search in google to access an openly available online resource. in the fifteen seconds it takes the user to get to that resource, information about the user's search is collected by the internet service provider (isp), the web browser, the search engine, the website hosting the resource, and any third-party trackers embedded in the website. the search query, along with the user's internet protocol (ip) address, become part of the data collector's profile on the user.
in the future, the data collector can sell the user's profile to a data broker, where it will be merged with profiles from other data collectors to create an even more detailed portrait of the user.15 the data broker, in turn, can sell the complete dataset to the government, law enforcement, commercial businesses, and even criminals. this creates serious privacy concerns, particularly since users have no legal right over how their data is bought and sold.16 privacy protection in libraries. libraries have deeply-rooted values in privacy and strong motivations to protect it. intellectual freedom, the foundation on which libraries are built, necessarily requires privacy. in its interpretation of the library bill of rights, the american library association (ala) explains, "in a library (physical or virtual), the right to privacy is the right to open inquiry without having the subject of one's interest examined or scrutinized by others."17 many studies support this idea, having found that people who are indiscriminately and secretly monitored censor their behavior and speech.18 libraries have both legal and ethical obligations to protect patron privacy. while there is no federal legislation that protects privacy in libraries, forty-eight states have regulations regarding the confidentiality of library records, though the extent of these protections varies by state.19 because these statutes were drafted before the widespread use of the internet, they are phrased in a way that addresses circulation records and does not specifically include or exclude internet use records (records with information on sites accessed by patrons) from these protections. therefore, according to theresa chmara, libraries should not treat internet use records any differently than circulation records with respect to confidentiality.20 the library community has established many guiding documents that embody its ethical commitment to protecting patron privacy. the ala code of ethics states in its third principle, "we protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted."21 the international federation of library associations and institutions (ifla) code of ethics has more specific language about data sharing, stating, "the relationship between the library and the user is one of confidentiality and librarians and other information workers will take appropriate measures to ensure that user data is not shared beyond the original transaction."22 the library community has also established practical guidelines for dealing with privacy issues in libraries, particularly those issues relating to digital privacy, including the ala privacy guidelines23 and the national information standards organization (niso) consensus principles on user's digital privacy in library, publisher, and software-provider systems.24 additionally, the library freedom project was launched in 2015 as an educational resource to teach librarians about privacy threats, rights, and tools, and in 2017, the library and information technology association (lita) released a set of seven privacy checklists25 to help libraries implement the ala privacy guidelines.
personalization of online systems while user data can be used for tracking and surveillance, it can also be used to improve the digital user experience of online systems through personalization. because the growth of the internet has made it increasingly difficult to navigate the continually growing sea of information online, researchers have put significant effort into designing interfaces, interaction methods, and systems that deliver adaptive and personalized experiences.26 ansgar koene et al. explain, “the basic concept behind personalization of on-line information services is to shield users from the risk of information overload, by pre-filtering search results based on a model of the user’s preferences… a perfect user model would…enable the service provider to perfectly predict the decision a user would make for any given choice.”27 the authors continue to describe three main flavors of personalization systems: 1. content-based systems, in which the system recommends items based on their similarity to items that the user expressed interest in; 2. collaborative-filtering systems, in which users are given recommendations for items that other users with similar tastes liked in the past; and 3. community-based systems, in which the system recommends items based on the preferences of the user’s friends.28 many popular consumer services, such as amazon.com, youtube, netflix, and google, have increased (and continue to increase) the level of personalization that they offer.29 one such service in the area of academic resource discovery is google scholar’s updates, which analyzes a user’s publication history in order to predict new publications of interest.30 libraries, in contrast, have favored privacy and have not pressed their developers and vendors to personalize their services, even though studies have shown that users expect library tools to mimic their experience using web search engines.31 some web-scale discovery services do, however, allow researchers to set personalization preferences, such as their field of study, and, according to roger schonfeld, it is likely that many researchers would benefit tremendously from increased personalization in discovery.32 in this vein, the american philosophical society library recently launched a new recommendation tool for archives and manuscripts that uses circulation data and user-supplied interests to drive recommendations.33 opportunities for user experience in library discovery a major challenge in today’s online discovery environment is that the user is inhibited by an overwhelming number of results. this leads users to rely on relevance rankings and to fail to examine search results in depth. creating fine-tuned relevance ranking algorithms based on user behavior is one remedy to this problem, but it relies on the use of personal user data.34 however, there may be opportunities to facilitate data-driven discovery while maintaining the user’s anonymity that would be suitable for library (and other) discovery tools.
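to make the first of these flavors concrete, the short python sketch below is an illustration written for this discussion, not code from koene et al. or from any discovery product; the catalog titles and descriptions are invented. it ranks unseen items by their similarity to items a user has already shown interest in:

from collections import Counter
import math

def bag_of_words(text):
    # crude tokenizer: lowercase word counts
    return Counter(text.lower().split())

def cosine(a, b):
    # cosine similarity between two bag-of-words vectors
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# hypothetical catalog: title -> short description
catalog = {
    "intro to data privacy": "privacy surveillance data brokers tracking",
    "library discovery systems": "discovery catalog index search relevance",
    "online tracking explained": "tracking cookies fingerprinting privacy advertisers",
    "metadata fundamentals": "metadata cataloging marc records description",
}

def recommend(liked_titles, k=2):
    # rank unseen items by similarity to the items the user liked
    liked = [bag_of_words(catalog[t]) for t in liked_titles]
    scored = [
        (max(cosine(bag_of_words(desc), lv) for lv in liked), title)
        for title, desc in catalog.items()
        if title not in liked_titles
    ]
    return [title for _, title in sorted(scored, reverse=True)[:k]]

print(recommend(["intro to data privacy"]))
# expected to surface "online tracking explained" first, since it shares the most vocabulary

the collaborative-filtering flavor replaces the item descriptions with other users' behavior, which is precisely where the privacy trade-off discussed in this article enters.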
irina trapido proposes that relevance ranking algorithms could be designed to leverage the popularity of a resource measured by its circulation statistics or by ranking popular or introductory materials higher than more specialized ones to help users make sense of large results sets.35 michael schofield proposes “context-driven design” as an intermediary solution, whereby the user opts in to have the system infer context from neutral device or browser information, such as the time of day, business hours, weather, events, holidays, etc.36 jason clark describes a search prototype he built that applies these principles, but he questions whether these types of enhancements actually add value to users.37 rachel vacek cautions that personalization is not guaranteed to be useful or meaningful, and continuous user testing is key.38 discussion there are several aspects to consider for the design of future library discovery tools. the integrated, complex nature of the web causes privacy to become compromised during the information discovery process. library discovery tools have been designed not to retain borrowing records, but have not yet evolved to mask user behavior, which is invaluable in today’s data economy. it is imperative that all types of library discovery tools have built-in functionality to privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 53 protect patron privacy beyond borrowing records, while also enabling the ethical use of patron data to improve user experience. even if library discovery tools were to evolve so that they themselves were absolutely private (where no data were ever collected or shared), other online parties (isps, web browsers, advertisers, data brokers, etc.) would still have access to user data through other means, such as cookies and fingerprinting. the operating reality is such that privacy is not immediately and completely controllable by libraries. laurie rinehart-thompson explains, “in the big picture, privacy is at the mercy of ethical and stewardship choices on the part of all information handlers.”39 while libraries alone cannot guarantee complete privacy for their patrons, they can and should mitigate privacy risks to the greatest extent possible. at the same time, ignoring altogether the benefits of using patron data to improve the discovery user experience may threaten the library’s viability in the age of google. roger schonfeld explains, “if systems exclude all personal data and use-related data, the resulting services will be onedimensional and sterile. i consider it essential for libraries to deliver dynamic and personalized services to remain viable in today's environment; expectations are set by sophisticated social networks and commercial destinations.”40 libraries must find ways to keep up with greater industry trends while adhering to professional ethics. recommendations while libraries have traditionally shied away from collecting data about patron transactions, these conservative tendencies run counter to the library’s mission to provide outstanding user experience and the need to evolve in a rapidly changing information industry. as the profession adopts new technologies, ethical dilemmas present themselves that are tied into their use. while several library organizations have issued guidance for libraries about the role of user data in these new technologies, this does not go far enough. 
the niso privacy principles, for instance, acknowledge that they are merely “a starting point.”41 examining the substance of these guidelines is important for confronting the privacy challenges facing library discovery in the 21st century, but there are additional steps libraries can take to more fully address the competing interests of privacy and user experience in library discovery and in library technologies more generally. holding third parties accountable libraries are increasingly at the mercy of third parties when it comes to the development and design of library discovery tools. unfortunately, these third parties do not have the same ethical obligations to protect patron privacy that librarians do. in addition, the existing guidance for protecting user data in library technologies is directed towards librarians, not third-party vendors. the library community must hold third parties accountable for the ethical design of library discovery tools. one strategy for doing this would be to develop a ranking or certification process for discovery tools based on a community set of standards. the development of hipaa-compliant records management systems in the medical field sets an example. because healthcare providers are required by law to guarantee the privacy of patient data,42 they must select electronic health records systems (ehrs) that have been certified by an office of the national coordinator for health information technology (onc)-authorized body.43 in order to be certified, the system must adhere to a set of criteria adopted by the department of health and human services,44 which includes privacy and security standards.45 another example is the consumer reports standard and testing program for consumer privacy and security, which is currently in development. consumer reports explains the reason for developing this new privacy standard: “if consumer reports and other public-interest organizations create a reasonable standard and let people know which products do the best job of meeting it, consumer pressure and choices can change the marketplace.”46 libraries could potentially adapt the consumer reports standards and rating system for library discovery tools and other library technologies. engaging in ux research & design libraries should not rely on third parties alone to address privacy and user experience requirements for library discovery tools. libraries are well-poised to become more involved in the design process itself by actively engaging in user experience research and design. the opportunities for “context-driven design” and personalization based on circulation and other anonymous data are promising for library discovery but require ample user testing to determine their usefulness. understanding which types of personalization features offer the most value while preserving privacy is key to accelerating the design of library discovery tools. the growth of user experience librarian jobs and the emergence of user experience teams and departments in libraries signal an increasing amount of user experience expertise in the field, which can be leveraged to investigate these important questions for library discovery. illuminating the black box librarians who adopt new discovery tools without fully understanding their underlying technologies and the data economy in which they operate do not serve their users well.
librarians have ethical obligations that should require them to thoroughly understand how and when user data is captured by library discovery tools and other web technologies, and how this information is compiled and shared at a higher level. not only do librarians need to understand the technical aspects of discovery technologies, they also need to understand the related user experience benefits and privacy concerns and the resulting ethical implications. as technology continues to evolve, librarians should be required to engage in continued learning in these areas. such technology literacy skills could be incorporated in the curriculum of library and information science degree programs, as well as in ongoing professional development opportunities. empowering library users because information discovery in an online environment introduces new privacy risks, communication about this topic between librarians and patrons is paramount. librarians should privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 55 proactively discuss with patrons the potential risks to their privacy when conducting research online, whether they are using the open web or library discovery tools. it is ultimately up to the patron to weigh their needs and preferences in order to decide which tools to use, but it is the librarian’s responsibility to empower patrons to be able to make these decisions in the first place. conclusion with the rollback of the fcc privacy rules that prohibit isps from selling customer search histories without customer permission, understanding digital privacy issues and taking action to protect patron privacy is more important than ever. while privacy and user experience are both necessary and important components of library discovery systems, their requirements are in direct conflict with each other. an absolutely private discovery experience would mean that no user data is ever collected during the search process, whereas a completely personalized discovery experience would mean that all user data is collected and utilized to inform the design and features of the system. it is essential for library discovery tools to have built-in functionality that protects patron privacy to the greatest extent possible and enables the ethical use of patron data to improve user experience. the library community must take action to address these requirements beyond establishing guidelines. holding third party providers to higher privacy standards is a starting point. in addition, librarians themselves need to engage in user experience research and design to discover and test the usefulness of possible intermediary solutions. librarians must also become more educated as a profession on digital privacy issues and their ethical implications in order to educate patrons about their fundamental rights to privacy and empower them to make decisions about which discovery tools to use. collectively, these strategies enable libraries to address user needs, uphold professional ethics, and drive the future of library discovery. references 1. irina trapido, “library discovery products: discovering user expectations through failure analysis,” information technologies and libraries 35, no. 3 (2016): 9-23, https://doi.org/10.6017/ital.v35i3.9190. 2. 
brian fung, “the house just voted to wipe away the fcc’s landmark internet privacy protections,” the washington post, march 28, 2017, https://www.washingtonpost.com/news/the-switch/wp/2017/03/28/the-house-justvoted-to-wipe-out-the-fccs-landmark-internet-privacy-protections. 3. jon brodkin, “president trump delivers final blow to web browsing privacy rules,” ars technica, april 3, 2017, https://arstechnica.com/tech-policy/2017/04/trumps-signaturemakes-it-official-isp-privacy-rules-are-dead/. 4. nathan freed wessler, “how private is your online search history?” aclu free future (blog), https://www.aclu.org/blog/how-private-your-online-search-history. 5. julia angwin, dragnet nation (new york: times books, 2014), 41-42. information technology and libraries | june 2017 56 6. mit libraries, institute-wide task force on the future of libraries (2016), 12, https://assets.pubpub.org/abhksylo/futurelibrariesreport.pdf. 7. trapido, “library discovery products,” 10. 8. marshall breeding, “the future of library resource discovery,” niso white papers, niso, baltimore, md, 2015, 4, http://www.niso.org/apps/group_public/download.php/14487/future_library_resource_dis covery.pdf. 9. christine wolff, alisa b. rod, and roger c. schonfeld, ithaka s+r us faculty survey 2015 (new york: ithaka s+r, 2016), 11, https://doi.org/10.18665/sr.277685. 10. deirdre costello, “students and faculty research differently” (presentation, computers in libraries, washington, d.c., march 28, 2017), http://conferences.infotoday.com/documents/221/a103_costello.pdf. 11. roger c. schonfeld, meeting researchers where they start: streamlining access to scholarly resources (new york: ithaka s+r, 2015), https://doi.org/10.18665/sr.241038. 12. björn bloching, lars luck, and thomas ramge, in data we trust: how customer data is revolutionizing our economy (london: bloomsbury publishing, 2012), 65. 13. angwin, 21-36. 14. ibid., 32-33. 15. natasha singer, “mapping, and sharing, the consumer genome,” new york times, june 16, 2012, http://www.nytimes.com/2012/06/17/technology/acxiom-the-quiet-giant-ofconsumer-database-marketing.html. 16. lois beckett, “everything we know about what data brokers know about you,” propublica, june 13, 2014, https://www.propublica.org/article/everything-we-know-about-what-databrokers-know-about-you. 17. “an interpretation of the library bill of rights,” american library association, amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy. 18. angwin, dragnet nation, 41-42. 19. anne klinefelter, “privacy and library public services: or, i know what you read last summer,” legal references services quarterly 26, no. 1-2 (2007): 258-260, https://doi.org/10.1300/j113v26n01_13. 20. theresa chmara, privacy and confidentiality issues: guide for libraries and their lawyers (chicago: ala editions, 2009), 27-28. 21. “code of ethics of the american library association,” american library association, privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 57 amended january 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics. 22. “ifla code of ethics for librarians and other information workers,” international federation of library associations and institutions, august 12, 2012, http://www.ifla.org/news/ifla-code-of-ethics-for-librarians-and-other-informationworkers-full-version. 23. “privacy & surveillance,” american library association, approved 2015-2016, http://www.ala.org/advocacy/privacyconfidentiality. 24. 
national information standards organization, niso consensus principles on users’ digital privacy in library, publisher, and softwareprovider systems (niso privacy principles), published on december 10, 2015, http://www.niso.org/apps/group_public/download.php/15863/niso%20consensus%20pr inciples%20on%20users%92%20digital%20privacy.pdf. 25. “library privacy checklists,” library and information technology association, accessed march 7, 2017, http://www.ala.org/lita/advocacy. 26. panagiotis germanakos and marios belk, “personalization in the digital era,” in humancentred web adaptation and personalization: from theory to practice, (switzerland: springer international publishing switzerland, 2016), 16. 27. ansgar koene et al., “privacy concerns arising from internet service personalization filters,” acm sigcas computers and society 45, no. 3 (2015): 167. 28. ibid., 168. 29. ibid. 30. james connor, “scholar updates: making new connections,” google scholar blog, https://scholar.googleblog.com/2012/08/scholar-updates-making-new-connections.html. 31. schonfeld, meeting researchers where they start, 2. 32. roger c. schonfeld, does discovery still happen in the library?: roles and strategies for a shifting reality (new york: ithaka s+r, 2014), 10, https://doi.org/10.18665/sr.24914. 33. abigail shelton, “american philosophical society announces launch of pal, an innovative recommendation tool for research libraries,” american philosophical society, april 3, 2017, https://www.amphilsoc.org/press/pal. 34. trapido, “library discovery products,” 17. 35. ibid. 36. michael schofield, “does the best library web design eliminate choice?” libux, september information technology and libraries | june 2017 58 11, 2015, http://libux.co/best-library-web-design-eliminate-choice/. 37. jason a. clark, “anticipatory design: improving search ux using query analysis and machine cues,” weave: journal of library user experience 1, no. 4 (2016), https://doi.org/10.3998/weave.12535642.0001.402. 38. rachel vacek, “customizing discovery at michigan” (presentation, electronic resources & libraries, austin, tx, april 4, 2017), https://www.slideshare.net/vacekrae/customizingdiscovery-at-the-university-of-michigan. 39. laurie a. rinehart-thompson, beth m. hjort, and bonnie s. cassidy, “redefining the health information management privacy and security role,” perspectives in health information management 6 (2009): 4.s 40. marshall breeding, “perspectives on patron privacy and security,” computers in libraries 35, no. 5 (2015): 13. 41. national information standards organization, niso consensus principles. 42. joel jpc rodrigues, et al., “analysis of the security and privacy requirements of cloud-based electronic health records systems,” journal of medical internet research 15, no. 8 (2013), https://www.ncbi.nlm.nih.gov/pmc/articles/pmc3757992/. 43. office of the national coordinator for health information technology, guide to privacy and security of electronic health information, april 2015, https://www.healthit.gov/sites/default/files/pdf/privacy/privacy-and-security-guide.pdf. 44. office of the national coordinator for health information technology, “health it certification program overview,” january 30, 2016, https://www.healthit.gov/sites/default/files/publichealthitcertificationprogramovervie w_v1.1.pdf. 45. 
office of the national coordinator for health information technology, “2015 edition health information technology (health it) certification criteria, base electronic health record (ehr) definition, and onc health it certification program modifications final rule,” october 2015, https://www.healthit.gov/sites/default/files/factsheet_draft_2015-10-06.pdf. 46. consumer reports, “consumer reports to begin evaluating products, services for privacy and data security,” consumer reports, march 6, 2017, http://www.consumerreports.org/privacy/consumer-reports-to-begin-evaluatingproducts-services-for-privacy-and-data-security/. 6 information technology and libraries | march 2010 sandra shores is [tk] sandra shores editorial board thoughts: issue introduction to student essays t he papers in this special issue, although covering diverse topics, have in common their authorship by people currently or recently engaged in graduate library studies. it has been many years since i was a library science student—twenty-five in fact. i remember remarking to a future colleague at the time that i found the interview for my first professional job easy, not because the interviewers failed to ask challenging questions, but because i had just graduated. i was passionate about my chosen profession, and my mind was filled from my time at library school with big ideas and the latest theories, techniques, and knowledge of our discipline. while i could enthusiastically respond to anything the interviewers asked, my colleague remarked she had been in her job so long that she felt she had lost her sense of the big questions. the busyness of her daily work life drew her focus away from contemplation of our purpose, principles, and values as librarians. i now feel at a similar point in my career as this colleague did twenty-five years ago, and for that reason i have been delighted to work with these student authors to help see their papers through to publication. the six papers represent the strongest work from a wide selection that students submitted to the lita/ ex libris student writing award competition. this year’s winner is michael silver, who looks forward to graduating in the spring from the mlis program at the university of alberta. silver entered the program with a strong library technology foundation, having provided it services to a regional library system for about ten years. he notes that “the ‘accidental systems librarian’ position is probably the norm in many small and medium sized libraries. as a result, there are a number of practices that libraries should adopt from the it world that many library staff have never been exposed to.”1 his paper, which details the implementation of an open-source monitoring system to ensure the availability of library systems and services, is a fine example of the blending of best practices from two professions. indeed, many of us who work in it in libraries have a library background and still have a great deal to learn from it professionals. silver is contemplating a phd program or else a return to a library systems position when he graduates. either way, the profession will benefit from his thoughtful, well-researched, and useful contributions to our field. todd vandenbark’s paper on library web design for persons with disabilities follows, providing a highly practical but also very readable guide for webmasters and others. 
vandenbark graduated last spring with a masters degree from the school of library and information science at indiana university and is already working as a web services librarian at the eccles health sciences library at the university of utah. like mr. silver, he entered the program with a number of years’ work experience in the it field, and his paper reflects the depth of his technical knowledge. vandenbark notes, however, that he has found “the enthusiasm and collegiality among library technology professionals to be a welcome change from other employment experiences,” a gratifying comment for readers of this journal. ilana tolkoff tackles the challenging concept of global interoperability in cataloguing. she was fascinated that a single database, oclc, has holdings from libraries all over the world. this is also such a recent phenomenon that our current cataloging standards still do not accommodate such global participation. i was interested to see what librarians were doing to reconcile this variety of languages, scripts, cultures, and independently developed cataloging standards. tolkoff also graduated this past spring and is hoping to find a position within a music library. marijke visser addresses the overwhelming question of how to organize and expose internet resources, looking at tagging and the social web as a solution. coming from a teaching background, visser has long been interested in literacy and life-long learning. she is concerned about “the amount of information found only online and what it means when people are unable . . . to find the best resources, the best article, the right website that answers a question or solves a critical problem.” she is excited by “the potential for creativity made possible by technology” and by the way librarians incorporate “collaborative tools and interactive applications into library service.” visser looks forward to graduating in may. mary kurtz examines the use of the dublin core metadata schema within dspace institutional repositories. as a volunteer, she used dspace to archive historical photographs and was responsible for classifying them using dublin core. she enjoyed exploring how other institutions use the same tools and would love to delve further into digital archives, “how they’re used, how they’re organized, who uses them and why.” kurtz graduated in the summer and is looking for the right job for her interests and talents in a location that suits herself and her family. finally, lauren mandel wraps up the issue exploring the use of a geographic information system to understand how patrons use library spaces. mandel has been an enthusiastic patron of libraries since she was a small child visiting her local county and city public libraries. she is currently a doctoral candidate at florida state university and sees an academic future for herself. mandel expresses infectious optimism about technology in libraries: sandra shores (sandra.shores@ualberta.ca) is guest editor of this issue and operations manager, information technology services, university of alberta libraries, edmonton, alberta, canada. editorial board thoughts | shores 7 looking ahead, it seems clear that the pace of change in today’s environment will only continue to accelerate; thus the need for us to quickly form and dissolve key sponsorships and partnerships that will result in the successful fostering and implementation of new ideas, the currency of a vibrant profession. 
the next challenge is to realize that many of the key sponsorship and partnerships that need to be formed are not just with traditional organizations in this profession. tomorrow’s sponsorships and partnership will be with those organizations that will benefit from the expertise of libraries and their suppliers while in return helping to develop or provide the new funding opportunities and means and places for disseminating access to their expertise and resources. likely organizations would be those in the fields of education, publishing, content creation and management, and social and community webbased software. to summarize, we at ex libris believe in sponsorships and partnerships. we believe they’re important and should be used in advancing our profession and organizations. from long experience we also have learned there are right ways and wrong ways to implement these tools, and i’ve shared thoughts on how to make them work for all the parties involved. again, i thank marc for his receptiveness to this discussion and my even deeper appreciation for trying to address the issues. it’s serves as an excellent example of what i discussed above. people forget, but paper, the scroll, the codex, and later the book were all major technological leaps, not to mention the printing press and moveable type. . . . there is so much potential for using technology to equalize access to information, regardless of how much money you have, what language you speak, or where you live. big ideas, enthusiasm, and hope for the profession, in addition to practical technology-focused information await the reader. enjoy the issue, and congratulations to the winner and all the finalists! note 1. all quotations are taken with permission from private e-mail correspondence. a partnership for creating successful partnerships continued from page 5 editorial board thoughts: the importance of staff change management in the face of the growing “cloud” mark dehmlow information technology and libraries | march 2016 3 the library vendor market likes to throw around the word “cloud” to make their offerings seem innovative and significant. in many ways, much of what the library it market refers to as “cloud,” especially saas (software as a service) offerings, are really just a fancier term for hosted services. the real gravitas behind the label cloud really emanated from grid-computing or large interconnected, and quickly deployable infrastructure like amazon’s aws or microsoft’s azure platforms. infrastructure at that scale and that level of geographic distribution was revolutionary when it emerged. still these offerings at their core are basically iaas (infrastructure as a service) bundled as a menu of services. so i think the most broadly applicable synonym for the “cloud” could be “it as a service” in various forms. outsourcing in this way isn’t entirely new to libraries. the function and structure of oclc has arguably been one of the earlier instantiations of “it as a service” for libraries vis-à-vis their marc record aggregation and distribution which oclc has been doing for decades. the more recent trend toward hosted it services has been relatively easy for non-it related units in our library. a service no different to most library staff based on where it is hosted. and with many services implementing apis for libraries, that distinction is becoming less significant for our application developers too. 
for many of our technology staff, who have built careers around systems administration, application development, systems integration, and application management, hosted services represent a threat to not only their livelihoods but in some ways also their philosophical perspectives that are grounded in open source and do-ityourself oriented beliefs. in many ways the “cloud” for the it segment of our profession is perhaps more synonymous with change, and with change requires effective management of that change, especially for the human element of our organizations. recently, our office of information technologies started an initiative to move 80% of their technology infrastructure into the cloud. they have proposed an inverted pyramid structure for determining where it solutions should reside — focusing first on hosted software as a service solutions for the largest segment of applications, followed by hosting those applications we would have typically installed locally onto a platform or infrastructure as a service provider, and then limiting only those applications that have specialized technical or legal needs to reside on premise. this is a big shift for our it staff, especially, but not limited to, our systems administrators. the iaas platform our university is migrating to is amazon web services and their infrastructure is mark dehmlow (mdehmlow@nd.edu), a member of lita and the ital editorial board, is the director, information technology program, hesburgh libraries, university of notre dame, south bend, indiana. editorial board thoughts: the importance of staff change management in the face of the growing “cloud” | dehmlow | doi: 10.6017/ital.v35i1.8965 4 largely accessible via a web dashboard, so that the myriad of tasks our systems administrators took days and weeks to do can now, in some adjusted way, be accomplished with a few clicks. this example is on one extreme end of the spectrum as far as it change goes, but simultaneously, we have looked at the vendor market to lease pre-packaged tools that support standard functions in academic libraries and can be locally branded and configured with our data — things like course guides, a-z journal lists, scheduling events, etc. the overarching goals of these efforts are cost savings and increased velocity and resiliency of infrastructure, but also and perhaps more important, is giving us flexibility in how we invest our staff time. if we are able to move high level tasks from staff to a platform, then we will be able to reallocate our staff’s time and considerable talent to take on the constant stream of new, high level technology needs. partnering with the university, we are aiming towards their defined goal of moving 80% of our technical infrastructure into the “cloud.” we have adopted their overall strategy of approach to systems infrastructure, at least in principle and are integrating into our own strategy significant consideration for the impact of these changes on our staff. our organization has recognized that people form not only habits around process, but also personal and emotional attachments to why we do things the way we do them, both from a philosophical as well as a pragmatic perspective. our approach to staff change is layered as well as long term. we know that getting from shock to acceptance is not an overnight process and that staff who adopt our overarching goals and strategy as their own will be more successful in the long term. to make this transition, we have developed several strategic approaches: 1. 
explaining the case: my experience is that staff can live through most changes as long as they understand why. helping them gain that understanding can take some time, but ultimately having that comprehension will help them fully understand our strategic goals as well as help them make decisions that are in alignment with the overall approach. i often find it is important to remember that, as managers, we have been a part of all of the change conversations and we have had time to assimilate ideas, discuss points of view, and process the implications of change. each of our staff needs to go through the same process and it is up to leadership to guide them through that process and ensure they get to participate in similar conversations. it is tempting to want to hit an initiative running, but there is significant value in seeding those discussions gradually over a somewhat gradual time period to more holistically integrate staff into the broader vision. it is important to explain the case for change multiple times and actively listen to staff thoughts and concerns and to remember to lay out the context for change, why it is important, and how we intend to accomplish things. then reassure, reassure, and reassure. the threats to staff may seem innocuous or unfounded to managers, but staff need to feel secure during a process to ultimately buy in. 2. consistency and persistence: staff acceptance doesn’t always come easy — nor should it necessarily. listening and integrating their perspectives into the planning and information technology and libraries | march 2016 5 implementation process can help demonstrate that they matter, but equally important is that they feel our approach is built on something solid. stability is reinforced through consistency in messaging. not only in individual consistency, but also team consistency, and upper management consistency — everyone should be able to support and explain messaging around a particular change. any time staff approach me and say, “it was much easier to do it this other way,” i talk about the efficiency we will garner through this change and how we will be able to train and repurpose staff in the future. the more they hear the message, the more ingrained it becomes, and the more normative it begins to feel. 3. training and investment: it futures require investment, not just in infrastructure, but also in skill development. we continue to invest significantly in providing some level of training on new technologies that we implement. that training will not only prove to staff that you are invested in their development as well as their job security, but it will also give them the tools they need to be successful in implementing new technologies. change is anxiety inducing because it exposes so many unknowns. providing training helps build confidence and competence for staff, reducing anxieties and providing some added engagement in the process. it also gives them exposure to the real world implementation of technologies where they can begin to see the benefits that you have been communicating for themselves. 4. envisioning the future: improvements and roles — one of the initial benefits we will be getting from recouping staff time is around shoring up our processes. we have generally had a more ad hoc approach to managing the day to day. it has been difficult to institute a strong technical change management process, in part, because of time. 
we will be able to remove that consideration from our excuses as we take advantage of the “cloud.” the net effect will be that we will do our work more thoughtfully and less ad hoc and use better defined processes that will meet group-developed expectations. in addition to doing things better, we do expect to do things differently. with fewer tasks at the operational level, we believe we will be able to transition staff into newly defined roles. some of these roles include devops engineers, a hybrid of application engineering (the dev) and systems administration (the ops), these staff will help design automation and continuous integration processes that allow developers to focus on their programming and less on the environment they are deploying their applications in; financial engineers who will take system requirements and calculate costs in somewhat complex technical cloud environments; systems architects who will be focused on understanding the smorgasbord of options that can be tied together to provide a service to meet expected response performance, disaster recovery, uptime, and other requirements; and business analysts who will focus on taking technical requirements and looking at all of the potential approaches to solve that need whether it be a hosted service, a locally developed solution, an implementation of an open source system, or some integration of all or some of the editorial board thoughts: the importance of staff change management in the face of the growing “cloud” | dehmlow | doi: 10.6017/ital.v35i1.8965 6 above. this list is by no means exhaustive, but i think it forms a good foundation on which to help staff develop their skill set along with our changing environment. i believe it is important to remind those of us who are managing it departments in libraries that in many ways the easiest parts of change are the logistics. the technology we work with is bounded by sets of guidelines that define how they are used and ensure that if they are implemented properly, they will work effectively. people on the other hand are not bounded as neatly by stringent rules. they are guided by diverse backgrounds, personalities, experiences, and feelings. they can be unpredictable, difficult to fully figure out, and behaviorally inconsistent. and yet, they are the great constant in our organizations and therefore require significant attention. our field needs “servant leaders” dedicated to supporting and developing staff, and not just being competent at implementing technologies. those managers who invest in staff, their well-being, development, and sense of engagement in their jobs, will find their organizations are able to tackle most anything. but those who ignore their staffs’ needs over pragmatic goals will likely find their organizations struggling to move quickly and instead spend too much energy overcoming resistance instead of energizing change. reproduced with permission of the copyright owner. further reproduction prohibited without permission. site license initiatives in the united kingdom: the psli and nesli experience borin, jacqueline information technology and libraries; mar 2000; 19, 1; proquest pg. 42 l site license initiatives in the united kingdom: the psli and nesli experience jacqueline borin this article examines the development of site licensing within the united kingdom higher education community. 
in particular, it looks at how the pressure to make better use of dwindling fiscal resources led to the conclusion that information technology and its exploitation was necessary in order to create an effective library service. these conclusions, reached in the follett report of 1993, led to the establishment of a pilot site license initiative and then a national electronic site license initiative. the focus of this article is these initiatives and the issues they faced, which included off-site access, definition of a site and, perhaps most importantly, the unbundling of print and electronic journals. increased competition for institution funding around the world has resulted in an erosion of library funding. in the united states, state universities are receiving a decreasing portion of their funds from the state while private universities are forced to limit tuition increases due to outside market forces. in the united kingdom the entitlement to free higher education is currently under attack and losing ground. today's economic pressures are requiring individual libraries to make better use of their fiscal resources while the emphasis moves from being a repository for information to providing access to information. jacqueline borin (jborin@csusm.edu) is coordinator of reference and electronic resources, library and information services, california state university, san marcos. as in the united states, the use of consortia for cost sharing in the united kingdom is becoming imperative as producers produce more electronic materials and make them available in full-text formats. consortia, while originally formed to cooperate on interlibrary loans and union catalogs, have recently taken on a new role, driven by financial expediency, in negotiating electronic licenses for their members, and the percentage of vendor contracts with consortia is rising. academic libraries cannot afford the prevalent pricing model that asks for the current print price plus an electronic surcharge plus projected inflation surcharges; therefore, group purchasing power allows higher education institutions to leverage the money they have and to provide resources that would otherwise be unavailable. advantages for the vendor include one negotiator and one technical person for the consortia as a whole. in addition, the use of consortia provides greater leverage in pushing for the need for stable archiving and for retaining the principles of fair use within the electronic environment as well as reminding publishers of the need for flexible and multiple economic models to deal with the diverse needs and funding structures of consortia. during the spring of 1998, while visiting academic libraries in the united kingdom, i looked at an existing initiative within the uk higher education community, the pilot site license initiative (psli), which had begun as a response to the follett report and to rising journal prices. at the time the three-year initiative was nearing its end and its successor, the national electronic site license initiative (nesli), was already the topic of much discussion. history the concept of site licensing in the united kingdom higher education community had already been established, since 1988, by the combined higher education software team (chest), based at the university of bath. chest has negotiated site licenses with software suppliers and some large database producers through two different methods.
either the supplier sells a national license to chest, which passes it on to the individual institution, or chest sells licenses to the institution on the supplier's behalf and passes the fees on to them (see figure 1). chest works closely with national information services and systems (niss). niss provides a focal point for the uk education and research communities to access information resources. niss's web service, the niss information gateway, provides a host for chest information such as ebsco masterfile and oclc netfirst. most chest agreements are institution-wide site licenses that allow for all noncommercial use of the product, normally for five years to allow for incorporation into the curriculum. once an institution signs up it is committed for the full term of the agreement. chest is not in the business of either evaluating products or differentiating among competing suppliers. evaluations and purchase decisions are left up to the individual institutions.2 chest does set up and support e-mail discussion lists for each agreement so that users can discuss features and problems of the product among themselves. they also send out electronic news bulletins to provide advance warning of forthcoming agreements and to assess the level of interest in future agreements. chest operates in a similar manner to many library consortia in the united states. the major difference is that it sells to higher education institutions as a whole, so the products it sells include not only databases but also, for example, software programs. this is also beginning to change in the united states. a recent article in the chronicle of higher education mentions that institutions will not stop with library databases, "in the future we'll be negotiating site licenses for software and all sorts of things . . . not just databases."3 although chest is substantially self-funding, it is strongly supported (as is niss) by the joint information systems committee (jisc) of the higher education funding councils of england (hefce). the majority of public funding for higher education in the united kingdom is funneled through the hefcs (one each for england, scotland, wales, and northern ireland). one of the jisc committees, the information services subcommittee (issc), which in 1997 became part of the committee for electronic information (cei), defined principles for the delivery of content.4 they were: • free at the point of use; • subscriptions not transaction based; • lowest common denominator; • universality; • commonality of interfaces; and • mass instruction. follett report in 1993 an investigation into how to deal with the pressures on library resources caused by the rapid expansion of student numbers and the worldwide explosion in academic knowledge and information was undertaken by the joint funding council's libraries review group, chaired by sir brian follett. this investigation resulted in the follett report. one of the key conclusions of the report was "the exploitation of it is essential to create the effective library service of the future." (figure 1. chest diagram, showing the flow of software, data, and training needs and materials among higher education and public research establishments, chest, and it product suppliers. © chest, university of bath, 1996.)
the review group recommended that as a starting point "a pilot initiative between a small number of institutions and a similar number of publishing houses should be sponsored by the funding councils to demonstrate in practical terms how material can be handled and distributed electronically."5 as a consequence £15 million was allocated to an electronic libraries program, managed by jisc on behalf of hefce. the electronic libraries program was to "engage the higher education community in developing and shaping the implementation of the electronic library."6 this project provided a body of electronic resources and services for uk higher education and influenced a cultural shift towards the acceptance and use of electronic resources instead of more traditional information storage and access methods. psli in may 1995 a pilot site license initiative subsidized by the funding councils was set up to: • test if the site license concept could provide wider access to journals for those in the academic community; • see if it would allow more flexibility in the use of scholarly material; • test the methods for dissemination of scholarly material to the higher education sector in a variety of formats; • test legal models for a national site license program; and • explore the possibility for increased value for money from scholarly journals.7 sixty-five publishers were invited by hefce to participate for three years commencing january 1, 1996. hefce was also responsible through jisc for the funding of the elib program, but no formal links were established between the elib project and the psli.8 the final selection of four companies included academic press ltd., blackwell publishers ltd., blackwell science ltd., and iop publishing ltd. the publishers agreed to offer print journals to higher education institutions for discounts of between 30 and 40 percent over the three-year period as well as electronic access as available. originally the electronic journals were supposed to be the subsidiary component of the agreement, but by the end of the agreement they had become the major focus. the psli achieved almost 100 percent take-up among the higher education institutions due to the anticipated savings through the program.9 hefce did not specify how the publishers were to deliver their content. iopp hosted the journals on their own server, for example, while academic press linked their ideal server to the journals online service at the university of bath. one of the key provisions of the site license was the unlimited rights of authorized users to make photocopies (including their use within course packs) of the journals. academic press and iopp provided full-text access to all their journals while blackwell and blackwell science only allowed reading of full text where a print subscription existed. an integral part of the psli was that the funding from hefce to the higher education institutions was top-sliced to support the discounted price offered to the institutions. several assessments of the initiative were made and a final evaluation of the pilot was concluded at the end of 1997.
initial surveys indicated subscription savings through the program (average savings were approximately £11,800 per annum) and the first report of the evaluation team showed a wide level of support for the project despite major problems with lack of communication in a timely manner.10 the team recommended an extension of the psli to include more publishers and more emphasis on electronic delivery. one concern that was raised was ease of access: students had to know which system a journal they required was on, and this was not easily discernible or user friendly. evaluations by focus groups showed users wanted one single access point to all electronic journals.11 also unresolved was the need for one consistent interface to the electronic journals and a solution to the archiving issue. at the end of the psli, hefce handed the next phase over to jisc. in the fall of 1997 jisc announced that a nesli would be set up and a new steering group was established. nesli was to be an electronic-only scheme and the invitation to tender went out at the end of 1997 with a decision to be made mid-1998. national electronic site license initiative nesli, a three-year jisc-funded program, began on january 1, 1999, although the "official" launch was held at the british library on june 15, 1999. it is an initiative to deliver a national electronic journal service to the united kingdom higher education and research community (approximately 180 institutions) and is a successor program to the pilot site license initiative (psli). in may 1998 jisc appointed a consortium of swets and zeitlinger and manchester computing (university of manchester) to act as a managing agent (swets and blackwell ltd. announced in june 1999 their intention to combine swets subscription service and blackwell's information services, the two subscription agency services). the managing agent represents the higher education institutions in negotiations with publishers, manages delivery of the electronic material through a single web interface, and oversees day-to-day operation of the program including the handling of subscriptions.12 the managing agent also encourages the widespread acceptance by publishers of a standard model site license, one of the objectives of this being to reduce the number and diversity of site definitions used by publishers. other important provisions of the model site license addressed the issues of walk-in use by clients and the need for publishers to provide access to material previously subscribed to when a subscription is cancelled. the subscription model is currently the prevalent option although they are also working towards a pay-per-view option.13 priority has been given to publishers who had been involved in the psli and to those publishers participating in swetsnet, the delivery mechanism for the nesli. swetsnet is an electronic journal aggregation service that offers access to and management of internet journals. its search engine allows searching and browsing through titles from all publishers with links to the full-text articles. nesli is not a mandatory initiative; the higher education institutions can choose whether to participate in proposals and can pursue their own arrangements individually or through their own consortiums if they wish. while psli was basically a print-based initiative limited to a small number of publishers and funded via top slicing, nesli is an electronic initiative aimed at involving many more publishers.
it is designed to be self-funding, although it did receive some start-up funding. although it is an electronic initiative, proposals that include print will be considered, as it is still not easy to separate print and electronic materials.14 the initiative addresses the most effective use, access, and purchase of electronic journals in the academic library community. its aims include: • access control for on-site and remote users; • cost; • definition of a site; • archiving; and • unbundling print from electronic. access to swetsnet, the delivery mechanism for journals included in nesli, has now been supplemented by the option of athens authentication. athens, an authentication system developed by niss, provides individuals affiliated with higher education institutions a single username and password for all electronic services they have permission to access. athens is linked to swetsnet to ensure access for off-site, remote, and distance learners who do not have a fixed ip address. this supplements swetsnet's ip address authentication, which does not allow for individual access to toc and sdi alerting. a help desk is available for all nesli users through the university of manchester. the definition of a site is being addressed by the nesli model site license, which tries to standardize site definitions (including access from places that authorized users work or study, including homes and residence halls); interlibrary loan (supplying an authorized user of another library a single paper copy of an electronic original of an individual document); walk-in users; access to subscribed material in perpetuity (it provides for an archive to be made of the licensed material with access to the archive permissible after termination of the license); and inclusion of material in course packs. jisc's nesli steering group approved the model nesli site license on may 11, 1999 for use by the nesli managing agent.15 the managing agent asks publishers to accept the model license with as few alterations as possible. during the term of the initiative the managing agent will be working on additional value-added services. these include links from key indexing and abstracting services, provision of access via z39.50, linking from library opacs, creation of catalog records, and assessing a model for e-journal delivery via subject clusters. in particular, they have begun to look at the technical issues concerned with providing marc records for all electronic journals included in nesli offers. additionally they will be looking at solutions for longer-term archiving of electronic journals to provide a comfort level for librarians purchasing electronic-only copies.16 two offers that have been made under the nesli umbrella so far are blackwell sciences for 130 electronic journals and johns hopkins university press for 46 electronic titles. most recently two additional vendors have been added to the list. elsevier has made a proposal to deliver full-text content via the publisher's sciencedirect platform that includes the full text of more than 1,000 elsevier science journals along with those of other publishers. a total of more than 3,800 journals would be included in the service.17 mcb university press, an independent niche publisher, is offering access to 114 full-text journals and secondary information in the area of management through its emerald intelligence + fulltext service.
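as described above, swetsnet recognizes on-campus users by ip address, while athens supplies a single username and password for off-site users without a fixed institutional address. the python sketch below illustrates only that general fallback pattern; it is not the actual athens or swetsnet implementation, and the address range and account shown are invented:

import ipaddress

CAMPUS_NETWORK = ipaddress.ip_network("192.0.2.0/24")   # hypothetical campus address range
CREDENTIALS = {"reader01": "s3cret-passphrase"}         # hypothetical account; a real system would store hashed passwords

def is_authorized(client_ip, username=None, password=None):
    # on-campus traffic is recognized by its ip address alone
    if ipaddress.ip_address(client_ip) in CAMPUS_NETWORK:
        return True
    # off-site users without a fixed campus ip fall back to a username and password
    return username is not None and CREDENTIALS.get(username) == password

print(is_authorized("192.0.2.45"))                                    # True: on campus
print(is_authorized("203.0.113.9", "reader01", "s3cret-passphrase"))  # True: remote user with a valid login
print(is_authorized("203.0.113.9"))                                   # False: remote user, no credentials

the appeal of the combined approach is that campus users never need to log in, while remote and distance learners are not shut out simply because their addresses change.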
similarly, here in the united states, california state university (csu) put out for competitive tender a contract for the building of a customized database of 1,200+ electronic journals, the journal access core collection (jacc), based on the print titles subscribed to by 15 or more of the 22 campuses. the journals will be made available via pharos, a new unified information access system for the csu. like ohiolink, a consortium of 74 ohio libraries, it will provide a common interface to electronic journals for students and faculty and will facilitate the development of distance learning programs.18 by unbundling the journals, libraries will no longer be required to pay for journals they do not want or need, leading to moderate price savings. additional savings can be realized through the lowering of overhead costs achieved by system-wide purchasing of core resources. other issues being addressed within the jacc rfp included archiving and perpetual access to journal articles the university system has paid for, availability of e-journals in multiple formats, interlibrary loan of electronic documents, currency of content, and cost value at the journal-title level.19 currently 500 core journals are being provided under the jacc by ebsco information services, and the csu plans on expanding those offerings.

conclusion
as we move into the next millennium, library consortia will continue to work together with vendors to further customize journal offerings. however, it is still far too early to say whether nesli will be successful or whether it will succeed in getting the publishing industry to accept the model site license. if it is to work within the higher education community, it will depend greatly on the flexibility and willingness of the publishers of scholarly journals. it has made a start by developing a license that sets a wider definition of a site and that deals realistically with the question of off-site access. by encouraging the unbundling of electronic and print subscriptions, nesli allows services to be tailored to the specific needs of the information community, but it remains to be seen how many publishers are prepared to accept unbundled deals at this stage. also, as technology stabilizes and libraries acquire increasingly larger electronic collections, we will not be able to rely on license negotiations as the only way to influence pricing, access, and distribution. an additional problem that remains unaddressed by either psli or nesli is the pressure on academics to publish in traditional journals and the corresponding rise in scholarly journal prices. nesli neither encourages nor hinders changes in scholarly communication, and therefore the question of restructuring the scholarly communication process remains.20

references and notes
1. barbara mcfadden and arnold hirshon, "hanging together to avoid hanging separately: opportunities for academic libraries and consortia," information technology and libraries 17, no. 1 (march 1998): 36. see also international coalition of library consortia, "statement of current perspective and preferred practices for the selection and purchase of electronic information," information technology and libraries 17, no. 1 (march 1998): 45.
2. martin s. white, "from psli to nesli: site licensing for electronic journals," new review of academic librarianship 3 (1997): 139-50. see also
chest: software, data, and information for education (1996).
3. thomas j. deloughry, "library consortia save members money on electronic materials," the chronicle of higher education (feb. 9, 1996): a21.
4. information services subcommittee, "principles for the delivery of content," accessed nov. 17, 1999, www.jisc.ac.uk/pub97/nl_97.html#issc.
5. joint funding council's libraries review group, the follett report (dec. 1993), accessed nov. 20, 1999, www.niss.ac.uk/education/hefc/follett/report/.
6. john kirriemuir, "background of the elib programme," accessed nov. 21, 1999, www.ukoln.ac.uk/services.elib/background/history.html.
7. psli evaluation team, "uk pilot site license initiative: a progress report," serials 10, no. 1 (1997): 17-20.
8. white, "from psli to nesli," 149.
9. tony kidd, "electronic journals: their introduction and exploitation in academic libraries in the uk," serials review 24, no. 1 (1998): 7-14.
10. jill taylor roe, "united we save, divided we spend: current purchasing trends in serials acquisitions in the uk academic sector," serials review 24, no. 1 (1998).
11. psli evaluation team, "uk pilot site license initiative," 17-20.
12. beverly friedgood, "the uk national site licensing initiative," serials 11, no. 1 (1998): 37-39.
13. university of manchester and swets & zeitlinger, nesli: national electronic site license initiative (1999), accessed nov. 21, 1999, www.nesli.ac.uk/.
14. nesli brochure, "further information for librarians," accessed nov. 21, 1999, www.nesli.ac.uk/nesli-librarians-leaflet.html.
15. a copy of the model site license is available on the nesli web site, accessed nov. 22, 1999, www.nesli.ac.uk/mode1license8.html.
16. albert prior, "nesli progress through collaboration," learned publishing 12, no. 1 (1999).
17. science direct, accessed nov. 24, 1999, www.sciencedirect.com.
18. declan butler, "the writing is on the web for science journals in print," nature 397 (jan. 21, 1998).
19. the journal access core collection request for proposal, accessed nov. 22, 1999, www.calstate.edu/tier3/cs+p/rfp_ifb/980160/980160.pdf.
20. frederick j. friend, "uk pilot site license initiative: is it guiding libraries away from disaster on the rocks of price rises?" serials 9, no. 2 (1996): 129-33.

a low-cost library database solution
mark england, lura joseph, and nem w. schlecht
two locally created databases are made available to the world via the web using an inexpensive but highly functional search engine created in-house. the technology consists of a microcomputer running unix to serve relational databases. cgi forms created using the programming language perl offer flexible interface designs for database users and database maintainers.
many libraries maintain indexes to local collections or resources and create databases or bibliographies concerning subjects of local or regional interest. these local resource indexes are of great value to researchers. the web provides an inexpensive means for broadly disseminating these indexes.
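the architecture the abstract describes (a unix microcomputer serving relational databases, queried through cgi forms written in perl) can be pictured with a short sketch. the script below is a python analogue of such a cgi search form, not the authors' code; the database file, table, and column names are hypothetical.

```python
#!/usr/bin/env python3
# minimal sketch of a cgi search script in the spirit of the perl/cgi approach
# described above; "index.db", the "articles" table, and its columns are
# hypothetical examples, not the actual ndsu databases.
import os
import sqlite3
from urllib.parse import parse_qs
from html import escape

def main():
    query = parse_qs(os.environ.get("QUERY_STRING", "")).get("q", [""])[0]
    print("Content-Type: text/html\n")
    print("<html><body><h1>local index search</h1>")
    if query:
        conn = sqlite3.connect("index.db")
        rows = conn.execute(
            "SELECT title, author, source FROM articles "
            "WHERE title LIKE ? OR author LIKE ?",
            (f"%{query}%", f"%{query}%"),
        ).fetchall()
        for title, author, source in rows:
            print(f"<p>{escape(title)} / {escape(author)} ({escape(source)})</p>")
        conn.close()
    print('<form method="get"><input name="q"><input type="submit"></form>')
    print("</body></html>")

if __name__ == "__main__":
    main()
```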
for example, kilcullen has described a nonsearchable, web-based newspaper index that uses microsoft access 97.1 jacso has written about the use of java applets to publish small directories and bibliographies.2 sturr has discussed the use of wais software to provide searchable online indexes.3 many of the web-based local databases and search interfaces currently used by libraries may:
• have problems with functionality;
• lack provisions for efficient searching;
• be based on unreliable software;
• be based on software and hardware that is expensive to purchase or implement;
• be difficult for patrons to use; and
• be difficult for staff to maintain.
after trying several alternatives, staff members at the north dakota state university libraries have implemented an inexpensive but highly functional and reliable solution. we are now providing searchable indexes on the web using a microcomputer running unix to serve relational databases. cgi forms created at the north dakota state university libraries using the programming language perl offer flexible interface designs for database users and database maintainers. this article describes how we have implemented this solution.
mark england (england@badlands.nodak.edu) is assistant director, lura joseph (ljoseph@badlands.nodak.edu) is physical sciences librarian, and nem w. schlecht (schlecht@plains.nodak.edu) is a systems administrator at the north dakota state university libraries, fargo, north dakota.

application of the variety-generator approach to searches of personal names in bibliographic data bases-part 2. optimization of key-sets, and evaluation of their retrieval efficiency
dirk w. fokker and michael f. lynch: postgraduate school of librarianship and information science, university of sheffield, england.
keys consisting of variable-length character strings from the front and rear of surnames, derived by analysis of author names in a particular data base, are used to provide approximate representations of author names. when combined in appropriate ratios, and used together with keys for each of the first two initials of personal names, they provide a high degree of discrimination in search. methods for optimization of key-sets are described, and the performance of key-sets varying in size between 150 and 300 is determined at file sizes of up to 50,000 name entries. the effects of varying the proportions of the queries present in the file are also examined. the results obtained with fixed-length keys are compared with those for variable-length keys, showing the latter to be greatly superior. implications of the work for a variety of types of information systems are discussed.

introduction
in part 1 of this series the development of variety generators, or sets of variable-length keys with high relative entropies of occurrence, from the initial and terminal character strings of authors' surnames was described.1 their purpose, used singly or in combination, is to provide a high and constant degree of discrimination among personal names so as to facilitate searches for them. in this paper the selection of optimal combinations of the keys and evaluation of their efficiency in search are described. the performance of combined key-sets of various compositions is determined at a range of file sizes and compared with fixed-length keys. in addition, the extent of statistical associations among keys from different positions in the names is determined.
balancing of key-sets
the relative entropies of distribution of the first and last letters of the surnames of authors in the file of 100,000 entries from the inspec data base differ significantly, the former being 0.92 and the latter 0.86. as a result, a larger key-set has to be produced from the back of the surnames to reach the same value of the relative entropy as that of a key-set of given size from the front of the surname. for instance, the value of 0.954 is reached by a key-set comprising 41 keys from the front of the name, but a set of 101 keys from the back is needed to attain this value. it seemed reasonable to assume that keys from the front and rear should be combined in different proportions in order to maximize the relative entropy of the combined system, and that their proportions should reflect the redundancies of each distribution (redundancy = 1 − hr). in order to test this, a series of combined key-sets of different total sizes was produced, in which the proportions of keys were varied around the ratio of the redundancies of the first and last character positions, i.e., (1 − 0.92):(1 − 0.86), or 8:14. the relative entropies of the name representations provided by combining these key-sets with keys for the first and second initials were determined by applying them to the 50,000-name file, and the entropy value was used to determine the optimal ratio of keys. in one case, the correlation between the value of the relative entropy and retrieval efficiency, as measured by the precision ratio, was also studied, and shown to be high. the sizes of the combined key-sets studied were 148 and 296, with an intermediate set of 254 keys. the values of 148 and 296 were chosen in view of the projected implementation in the serial-parallel file organization.2 this relates the size of the key-set to the number of blocks on one cylinder of a disc (the 30-mbyte disc cartridges available to us have 296 blocks per cylinder); otherwise the choice of key-set size is arbitrary, and can be varied at will. the minimum key-set size is 106, consisting of 26 letters each for the first and last letter of the surname, and 27 (26 letters and the space symbol) each for the first and second initials. the numbers of n-gram keys (n ≥ 2) required for the key-sets numbering 148, 254, and 296 in size are thus 42, 148, and 190. full details are given of the composition of the first and third of these sets. a slight refinement to key-set generation was employed to ensure as close an approximation to equifrequency as possible, especially with the smallest key-sets. precise application of a threshold frequency may occasionally result in arbitrary inclusion of either very high or very low frequency keys. thus, if almost all the occurrences of a longer key are accounted for by a shorter key (as with -mann and -ann), only the shorter n-gram is included.

optimal set of 148 keys
the number of n-gram keys (n ≥ 2) to be added to the minimum set of 106 keys is 42, the presumed optimum proportion being 8:14, which implies about 16 keys from the front of the name and 26 from the back. in order to examine the relationship between the ratio of keys from the front and rear of the surname and the relative entropy of the combined sets, the ratios were varied at intervals between 1:1 and 1:3, so that the numbers of n-grams varied from 21 and 21 to 11 and 31 respectively.
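the balancing argument rests on two quantities: the relative entropy hr of a key distribution (observed entropy divided by its maximum) and the redundancy 1 − hr, in whose ratio the extra n-gram keys are apportioned. a minimal sketch of both calculations follows; the key counts are invented for illustration, and taking hmax as the entropy of an equifrequent key-set is a simplifying assumption, not the paper's exact procedure.

```python
import math
from collections import Counter

def relative_entropy(freqs):
    """hr = h / hmax, with hmax = log2(number of keys) for an equifrequent set."""
    total = sum(freqs.values())
    h = -sum((f / total) * math.log2(f / total) for f in freqs.values() if f)
    return h / math.log2(len(freqs))

# hypothetical counts of front-of-surname and rear-of-surname keys in a name file
front = Counter({"a": 350, "b": 200, "ba": 180, "s": 490, "sc": 150})
rear = Counter({"er": 640, "son": 270, "a": 600, "n": 90, "ov": 260})

hr_front, hr_rear = relative_entropy(front), relative_entropy(rear)

# the paper splits the extra n-gram keys in the ratio of the redundancies 1 - hr,
# e.g., (1 - 0.92):(1 - 0.86) = 8:14 for the inspec file
extra = 42  # n-gram keys to add beyond the minimum 106 single-character keys
share_front = extra * (1 - hr_front) / ((1 - hr_front) + (1 - hr_rear))
print(round(share_front), extra - round(share_front))
```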
for each ratio the keys were applied to the 50,000 name entries, and the distribution of the resultant descriptions was determined. the ratios, the number of n-gram keys, and the relative entropies of the distributions are shown in table 1; the maximum value of the entropy is taken to be log2 50,000. in this case the balancing point, with the key-set including 16 n-gram keys from the front and 26 from the back, corresponds with the ratio of the redundancies of the first and last letters of the surnames.

table 1. relation between ratio of n-grams from front and rear of surname, entropy of combined key-sets, and retrieval efficiency for a series of sets of 148 keys
ratio of n-gram keys | n-gram keys (front) | n-gram keys (back) | number of different representations in 50,000 entries | relative entropy of system | precision (%) (file size = 25,000)
1:1 | 21 | 21 | 33,485 | 0.9450 | 71.5
3:4 | 18 | 24 | 33,501 | 0.9450 | 71.3
17:25 | 17 | 25 | 33,434 | 0.9447 | 70.9
8:13 | 16 | 26* | 33,454 | 0.9453 | 72.2
5:9 | 15 | 27 | 33,402 | 0.9450 | 72.0
1:2 | 14 | 28 | 33,378 | 0.9449 | 72.1
1:3 | 11 | 31 | 33,126 | 0.9437 | 71.5
total number of different name entries = 41,469. * key-set with highest relative entropy.

table 2 shows the composition of the optimal key-set of 148 keys, while table 3 gives the distribution of the name representations compiled from the combined key-set, and its corresponding relative entropy.

table 2. composition of balanced key-set of 148 keys
keys from front of surname (42), with probabilities p: a .035, b .020, ba .020, be .017, bo .014, br .014, c .036, ch .016, d .044, e .018, f .034, g .055, h .035, ha .021, i .013, j .017, k .041, ka .017, ko .017, l .033, le .014, m .050, ma .030, n .025, o .017, p .038, pa .014, q .001, r .032, ro .017, s .049, sa .016, sc .015, sh .016, st .016, t .040, u .005, v .025, w .040, x (negligible), y .011, z .013.
keys from rear of surname (52), with probabilities p: a .060, ra .010, va .015, b .003, c .005, d .030, e .068, f .006, g .012, ng .014, h .020, ch .017, i .044, ii .015, ki .015, j .001, k .033, l .013, el .012, ll .016, m .013, n .009, an .020, man .017, en .025, in .039, nn .010, on .018, son .027, o .028, ko .013, p .004, q .001, r .016, er .064, ler .013, ner .010, s .055, es .015, is .012, t .042, u .013, v .001, ev .018, ov .026, kov .012, nov .011, w .005, x .003, y .031, ey .012, z .013.
keys from first initial: 27 characters. keys from second initial: 27 characters.

optimal set of 296 keys
a similar procedure to that used for the optimal 148-key key-set was also applied in this instance. here the ratios of front and rear n-gram keys varied from 57 and 133 to 69 and 121 respectively. for each of the sets chosen, the distributions of the entries resulting from application of the combined key-sets to the file of 50,000 names were determined. these showed virtually no difference in terms of the relative entropy alone, although the total number of different entries differed slightly between key-sets, and the highest value was used to choose the optimal set, detailed in table 4. the range of combinations studied is shown in table 5, and the distribution of the entries for the optimal set is given in table 6.
table 3. frequencies of entries represented by optimal 148-key key-set in a file of 50,000 names
frequency f | number of entries with frequency f
1 | 24,363
2 | 5,622
3 | 1,850
4 | 757
5 | 372
6 | 193
7 | 103
8 | 68
9 | 32
10 | 24
11-15 | 54
16-20 | 11
21-30 | 4
33 | 1
total number of different entries = 33,454. maximum number of possible combinations = 1,592,136 (i.e., 42 × 52 × 27²). h = 14.7553; hmax = 15.6096 (log2 50,000); hr = 0.9453.

table 4. composition of balanced key-set of 296 keys
keys from front of surname (87): a, al, an, b, ba, bar, be, bo, br, bu, c, ca, ch, co, d, da, de, do, e, f, fr, g, ga, go, gr, gu, h, ha, he, ho, hu, i, j, jo, k, ka, ki, ko, kr, ku, l, la, le, ll, m, ma, mar, mc, me, mi, mo, mu, n, na, ni, o, p, pa, pe, po, pr, q, r, ra, re, ri, ro, s, sa, sc, se, sh, si, so, st, t, ta, u, v, va, w, wa, we, wi, x, y, z.
keys from rear of surname (155): a, ld, ng, vskii, el, lin, r, or, nt, sov, ca, nd, ang, ki, ll, tin, ar, s, rt, w, da, rd, ing, ski, all, nn, er, as, ert, x, ka, e, rg, wski, ell, on, ber, es, st, y, ma, de, h, li, m, son, der, nes, tt, ay, na, ee, ch, ni, am, lson, ger, is, ett, ey, ina, ge, ich, ri, n, nson, nger, ns, u, ley, ra, ke, vich, ti, an, rson, her, ins, v, ky, ta, le, gh, j, man, ton, ier, os, ev, ry, va, ne, sh, k, rman, o, ker, rs, ov, z, ova, re, th, ak, yan, ko, ler, ss, kov, tz, wa, se, ith, ck, en, nko, ller, ts, ikov, ya, te, i, ek, sen, no, mer, us, lov, b, f, ai, ik, in, to, ner, t, nov, c, ff, hi, l, ein, p, ser, dt, anov, d, g, ii, al, kin, q, ter, et, rov.
keys from first initial: 27 characters. keys from second initial: 27 characters.

table 5. relation between ratio of n-grams from front and rear of surname and entropy of combined key-sets for a series of sets of 296 keys (file size = 50,000)
ratio of n-gram keys | n-gram keys (front) | n-gram keys (back) | number of different representations | relative entropy of system
3:7 | 57 | 133 | 39,182 | 0.9679
61:129 | 61 | 129* | 39,191 | 0.9679
13:25 | 65 | 125 | 39,186 | 0.9679
69:121 | 69 | 121 | 39,179 | 0.9679
* key-set with highest number of different entries.

in this instance, the ratio of n-gram keys from the front and back of the surnames has been displaced from the ratio of the redundancies of the first and last characters of the surnames, i.e., 8:14 (1:1.7); here the ratio is roughly 1:2. this is undoubtedly due to the fact that the relative entropies of key-sets from the back of the surname increase less rapidly than those of key-sets from the front, and hence larger sets must be employed.

table 6. frequencies of entries represented by optimal key-set of 296 keys in a file of 50,000 names
frequency f | number of entries with frequency f
1 | 31,705
2 | 5,394
3 | 1,371
4 | 442
5 | 164
6 | 63
7 | 27
8 | 12
9 | 4
10 | 3
11 | 2
12 | 2
13 | 0
14 | 0
15 | 1
16 | 1
total number of different entries = 39,191. maximum number of possible combinations = 9,830,565 (i.e., 87 × 155 × 27²). h = 15.108; hmax = 15.6096 (log2 50,000); hr = 0.9679.

evaluation of retrieval effectiveness
the keys in the optimized key-sets represent name entries in an approximate manner only, so that when a search for a name is performed, additional entries represented by the same combination of keys are identified. while these may be eliminated in a subsequent character-by-character match of the candidate hits, the proportion of unwanted items should remain low if the method is to offer advantages. in evaluating the effectiveness of the key-sets in retrieval, the names in the search file were represented by concatenating the codes for the keys from the front and back of the surnames and the initials, and subjecting the query names to the same procedure. the matching procedure produced lists of candidate entries, of which the desired entries were a subset.
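the representation step just described (longest matching key from the front of the surname, longest from the rear, plus keys for the two initials) can be sketched as follows; the tiny key-sets and the single-letter fallback here are illustrative assumptions, not the published 148- or 296-key sets.

```python
# minimal sketch of turning "surname + initials" into the four-key code used in
# the evaluation; FRONT_KEYS and REAR_KEYS are toy examples.
FRONT_KEYS = {"a", "b", "br", "j", "jo", "l", "le", "m", "mu", "s", "sc", "w"}
REAR_KEYS = {"a", "e", "ee", "er", "ler", "n", "nes", "nson", "s", "son", "ith"}

def longest_match(fragmenter, surname, keys):
    # try the longest fragment first; fall back to the single letter if none matches
    candidates = [k for k in (fragmenter(surname, n) for n in range(len(surname), 0, -1)) if k in keys]
    return candidates[0] if candidates else surname[:1]

def represent(surname, initials):
    surname = surname.lower()
    front = longest_match(lambda s, n: s[:n], surname, FRONT_KEYS)
    rear = longest_match(lambda s, n: s[-n:], surname, REAR_KEYS)
    ii = (initials.lower().replace(".", "").replace(" ", "") + "  ")[:2]
    return (front, rear, ii[0], ii[1])

print(represent("Johnson", "A. V."))   # ('jo', 'nson', 'a', 'v')
print(represent("Mueller", "K."))      # ('mu', 'ler', 'k', ' ')
```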
the final determination was carried out manually. the tests were performed first with names sampled from the search file, so that correct items were retrieved for each query. since searches for name entries may be performed with varying probabilities that the authors' names are present in the file (especially in current-awareness searches), varying proportions of names of the same provenance, but known not to be present in the search file, were also added. in these cases candidate items were selected which included none of the desired entries. recall tests were also performed, and recall was shown to be complete. the measure used in determining the performance of the variety-generator search method is the precision ratio, defined as the ratio of correctly identified names to all names retrieved. it is presented both as the ratio of averages (i.e., the summation of items retrieved in the search and calculation of the average) and as the average of ratios (i.e., averaging the figures for individual searches). the latter gives higher figures, since many of the individual searches give 100 percent precision ratios. the precision ratio was found to be dependent on file size and to fall somewhat as the size of file increases. this is due to the fact that the key-sets provided only a limited, if very high, total number of possible combinations, while the total possible variety of personal names is virtually unlimited. the evaluation was performed with a sample of 700 names, selected by interval sampling. this number ensured a 99 percent confidence limit in the results. a comparison of the interval-sampled query names with randomly sampled names showed that no bias was introduced by interval sampling. a test to confirm that the retrieval effectiveness reached a peak at the maximum value of the relative entropy of a balanced key-set was performed first. this was carried out on a file of 25,000 names, using as queries names selected from the file and the optimal 148-key key-set. as shown in table 1, the values of the precision ratio (ratio of averages) and of the relative entropy both peak at the same ratio of n-gram keys from the front and back of the surnames. the performance of the optimal key-sets of 148, 254, and 296 keys with files of 10,000, 25,000, and 50,000 names is shown in table 7. calculated as the ratio of averages, the smallest key-set (148 keys) shows a precision ratio of 64 percent with a file of 50,000 names, which means that of every three names identified in the variety-generator search, two are those desired. with the largest key-set (296 keys), this rises to nine correctly identified names in every ten retrieved at this stage. on the other hand, calculated as the average of ratios, the precision ratios rise to 81 percent and 94 percent respectively. for smaller file sizes (typical, for instance, of current-awareness searches), the figures for all of these are correspondingly higher.

table 7. precision ratios obtained in variety-generator searches of personal names, queries sampled from search file (confidence level = 99 percent)
precision as ratio of averages (%):
file size | 148-key set | 254-key set | 296-key set
50,000 | 64 | 87 | 90
25,000 | 71 | 90 | 91
10,000 | 84 | 93 | 94
precision as average of ratios (%):
file size | 148-key set | 254-key set | 296-key set
50,000 | 81 | 91 | 94
25,000 | 87 | 95 | 96
10,000 | 93 | 97 | 97
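the two ways of reporting precision used in table 7 can be illustrated with a small, invented set of per-query retrieval counts; as in the study, the average of ratios comes out higher because many individual queries reach 100 percent precision.

```python
# ratio of averages vs. average of ratios, as reported in table 7.
# each tuple is (correct names retrieved, total names retrieved) for one query;
# the numbers are made up purely for illustration.
queries = [(1, 1), (1, 1), (1, 3), (2, 2), (1, 5)]

ratio_of_averages = sum(c for c, _ in queries) / sum(t for _, t in queries)
average_of_ratios = sum(c / t for c, t in queries) / len(queries)

print(f"ratio of averages: {ratio_of_averages:.0%}")   # 50%
print(f"average of ratios: {average_of_ratios:.0%}")   # 71%
```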
the effect of sampling from a larger file, so that increasing proportions of the names searched for are not present in the search file, is shown in table 8 for a file of 25,000 names. in this case, the proportion of correctly identified names in the total falls, so that overall performance is somewhat reduced. thus, depending both on file size and on the expected proportion of queries identifying hits, the key-set size can be adjusted to reach a desired level of performance.

table 8. effect of varying proportion of query names not present in search file of 25,000 names, using 296 keys (ratio of averages)
% of names not in search file | precision % (ratio of averages) | number of names retrieved | number of names correctly retrieved
21 | 90 | 766 | 691
42 | 85 | 595 | 505
61 | 83 | 449 | 371
74 | 76 | 319 | 242
84 | 68 | 228 | 154

in addition, tests to determine the applicability of a key-set optimized for one file of 50,000 names to another file of the same provenance and size were carried out. the three key-sets derived from the first file were applied to the second, query names were sampled from the latter, and the precision ratios determined. some reduction in performance was observed; expressed as the ratio of averages, the precision with the 296-key key-set fell from 90 to 83 percent, with the 254-key key-set from 87 to 82 percent, and with the 148-key key-set from 64 to 56 percent, figures which seem unlikely to prejudice the net performance in any marked way. nonetheless, monitoring of performance and of data base name characteristics over a period of operation might well be advisable.

distribution characteristics of other types of keys
it is particularly instructive to examine the distribution characteristics of other types of keys, including those of fixed length, generated from various positions in the names, and to compare them with those of the optimal key-sets employed in the variety-generator approach. to this end, the file of 50,000 names was processed to produce the following keys or key-sets: 1. initial digram of surname. 2. initial trigram of surname. 3. key-set of ninety-four n-grams from the front of the surname, with first and second initials. 4. key-set consisting of first and last character of surname, with first and second initials. the figures (table 9) show clearly that all have distributions which leave no doubt as to their relative inadequacy in resolving power, where this is defined as the ratio of distinct name representations provided by the key-set used to the number of different name entries (41,469) in the file. at the digram level, the value of the resolving power is 0.009, i.e., each digram represents, on average, 110 different name entries, while no fewer than thirty-two specific digrams each represent between 500 and 1,000 different names. at the trigram level, the value of the resolving power rises to 0.08, a tenfold increase; however, one trigram still represents between 500 and 1,000 different names. use of the first and last letters of the surname plus the initials again increases the value of the resolving power to 0.627, or 1.6 distinct names per entry; eight of the representations now account for between thirty-one and forty distinct entries.
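resolving power, as defined above, is simply the number of distinct representations divided by the number of distinct name entries. a minimal sketch with a toy name list and two simplified representation schemes (hypothetical stand-ins for the schemes compared in table 9) follows.

```python
# resolving power = distinct representations / distinct name entries.
# toy data and simplified representation functions, for illustration only.
names = ["smith, j a", "smith, j b", "smythe, j a", "jones, p", "johnson, p", "johansson, p"]

def digram(name):                      # first two letters of the surname
    return name[:2]

def first_last_plus_initials(name):    # first and last surname letters + initials
    surname, _, initials = name.partition(",")
    return surname[0] + surname[-1] + initials.replace(" ", "")[:2]

def resolving_power(names, represent):
    distinct = set(names)
    return len({represent(n) for n in distinct}) / len(distinct)

print(resolving_power(names, digram))                    # 2 reps / 6 names = 0.33
print(resolving_power(names, first_last_plus_initials))  # 5 / 6 = 0.83 (johnson/johansson collide)
```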
table 9. distributions of a variety of other representations of personal names in a file of 50,000 entries
frequency f | initial digram of surname | initial trigram of surname | 94 n-grams from front of surname plus 2 initials | first and last letter of surname plus 2 initials
1 | 40 | 735 | 8,964 | 16,346
2 | 22 | 428 | 3,929 | 4,919
3 | 16 | 249 | 1,884 | 2,025
4 | 11 | 197 | 1,006 | 973
5 | 7 | 170 | 646 | 581
6 | 7 | 110 | 397 | 340
7 | 10 | 112 | 234 | 224
8 | 4 | 98 | 186 | 146
9 | 7 | 81 | 144 | 92
10 | 5 | 66 | 108 | 72
11 | 6 | 61 | 70 | 49
12 | 2 | 56 | 88 | 36
13 | 5 | 51 | 74 | 33
14 | 1 | 48 | 50 | 24
15 | 2 | 35 | 51 | 23
16 | 3 | 37 | 36 | 25
17 | 2 | 35 | 29 | 15
18 | 3 | 33 | 29 | 11
19 | 8 | 35 | 28 | 6
20 | 8 | 40 | 23 | 5
21-30 | 21 | 207 | 127 | 49
31-40 | 23 | 109 | 47 | 8
41-50 | 13 | 88 | 13 | 0
51-100 | 36 | 142 | 3 | 0
101-200 | 24 | 62 | 0 | 0
201-500 | 57 | 15 | 0 | 0
501-1000 | 32 | 1 | 0 | 0
total | 375 | 3,301 | 18,166 | 26,002
resolving power | .009 | .080 | .438 | .627

in contrast, however, the key-set of 148 keys comprising ninety-four n-gram keys from the front of the name and the first and second initials, although almost 50 percent larger than the four-character representation, has a resolving power of only 0.438 (or 2.28 entries per representation). this contrast provides particularly strong evidence for the superiority of keys from the front and rear of the surnames over those from the front alone, even when the latter are variable in length. as expected, the precision ratio of the four-character representation is low, at 37 percent (ratio of averages), compared with 64 percent for the optimal 148-key key-set.

extent of statistical association among keys
thus far, the frequency of occurrence of variable-length character strings from the front and back of the surnames is the only factor considered in their selection as keys. it is well known in other areas that statistical associations among keys can influence the effectiveness of their combinations.3 where a strong positive association between two keys exists, their intersection results in only a small reduction of the number of items retrieved over that obtained by using each independently. when the association is strongly negative, the result of intersection may be much greater than that predicted on the basis of the product of the individual probabilities of the keys. to assess the extent of associations among keys from the front and rear of surnames and initials, sets of both fixed- and variable-length keys from each of these positions were examined. the kendall correlation coefficient v was calculated for each of the twenty most frequent combinations of these; it is related to the chi-square value by the expression x² = mv², where m is the file size, or 50,000. table 10 shows the values of the association coefficient for certain of the characters in the full name; those above .012 are significant at a 99 percent confidence level. positive associations are more frequent than negative.
table 10. association coefficients for sets of the most frequent digrams from various positions in personal names
first and last letters of surname (digram, v): kv .064, wr .050, ka .038, hn .028, sa .024, sn .024, cn .022, kn -.020, ma .014, kr -.011, sv .010, rn .010, bn -.008, br .008, mn -.007, sr .007, mr .004, si -.002, gn .001, ln .001.
first letter of surname and first initial (digram, v): kv .054, hj .027, br -.024, sj -.023, dj .022, bg .018, ka .018, cj .018, sd .015, sv .013, mm .011, mj .007, bj .005, sg -.004, sr .004, ba .004, ma .004, sm -.003, mr .002, sa -.000.
first and second initials (digram, v): hv .078, mv .069, kv .069, rv -.055, dv -.053, tv .053, jv -.045, sv .034, fv .033, nv -.029, gv .022, lv -.022, iv -.019, av -.019, cv -.018, pv .017, wv -.014, yv .010, bv .005, ev -.002.

the figures indicate that intersection of certain of these characters as keys in search would result in some slight diminution in performance against that expected. the figures for the association coefficients among the twenty most frequent combinations of keys from the front and back of surnames in the 148- and 296-key key-sets show magnitudes (mostly positive) which are substantially greater than those for single characters (see table 11). the reasons for these values are obvious; in certain instances, e.g., miller, jones, and martin, common complete names are apparent, while in one case, lee, an overlap between keys from the front and rear exists. in others, linguistic variations on common names can be discerned, as with br and n (brown or braun).

table 11. association coefficients in the twenty most frequent key combinations from front and back of surnames in two key-sets
148-key key-set (keys, v): s h .146, j son .127, sc er .104, w s .043, t a .038, t i .038, w er .038, c e .034, f er .033, p s .025, d e .023, l e .022, w e .022, g in .020, m e .009, s a .008, g e .006, m a .005, m er -.004, g er -.000.
296-key key-set (keys, v): s ith .343, jo nson .297, jo nes .278, an rson .274, si gh .249, le ee .221, mu ller .214, ta or .195, gu ta .168, br n .160, mi ller .151, mar tin .145, wi s .137, f her .133, sc der .121, sa to .110, t as .084, sc er .069, ch en .055, t son .050.

such associations are inevitable. when the selection of keys is based solely on frequency, some deviation from the ideal of independence must result, becoming larger as the size of the key-sets increases and as the length of certain of the keys increases. however, since its effect in the most extreme cases is merely to lead to virtually exact definition of the most frequent surnames, no particular disadvantage results.

possible implementations of the variety-generator name search approach
the variety-generator approach permits a number of possible implementations of searches for personal names to be considered, if only in outline at this stage, using a variety of file organization methods. the most widely known methods (apart from purely sequential files) are direct access (utilizing hash-addressing), chained, and index sequential files. direct application of the concatenated key-numbers as the basis for hash-address computation appears attractive in instances where the personal name is used alone or in combination (as, for instance, with a part of the document title). the almost random distribution of the bits in this code should result in a general diminution of the collision and overflow problems commonly encountered with fixed-length keys.
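a minimal sketch of the hash-addressing idea just outlined: the four key numbers are packed into one code and reduced to a bucket (block) address. the key numbering, radices, and bucket count below are assumptions for illustration only; 296 echoes the blocks-per-cylinder figure mentioned earlier, and 155 and 27 stand for the sizes of the rear-key and initial key-sets in the 296-key configuration.

```python
# hash-addressing from the four key numbers, as suggested for the
# direct-access implementation; all constants here are illustrative.
def hash_address(front_no, rear_no, init1_no, init2_no, buckets=296):
    # pack the four small key numbers into one mixed-radix integer, then
    # reduce modulo the number of buckets (e.g., blocks on a disc cylinder)
    code = ((front_no * 155 + rear_no) * 27 + init1_no) * 27 + init2_no
    return code % buckets

# e.g., key numbers for ('jo', 'nson', 'a', 'v') under some fixed numbering
print(hash_address(23, 71, 0, 21))
```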
since only four keys are used to represent each name, and the four sets of keys from which these are selected are limited in number and of approximately equal probability, the keys can be used to construct chained indexes, to which, however, the usual constraints still apply. index sequential storage again offers opportunities, in particular since the low variety of key types means that the sorting operations which this entails can be eliminated. in effect, each name entry would be represented by an entry in each of four lists of document numbers or addresses, and documents retrieved by intersection of the lists. while four such numbers are stored for each name, in contrast to a single entry for the more conventional name list, the removal of the name list itself would more than compensate for the additional storage required for the lists. in the index sequential mode, the lists of document addresses or numbers stored with each key are more or less equally long. they may thus be replaced by bit-vectors in which the position of a bit corresponds to a name or document number. if the number of keys bears a simple relation to the number of blocks on a disc cylinder, the vectors can be stored in predetermined positions within a cylinder, resulting in the serial-parallel file. the usefulness of this file organization has yet to be fully evaluated; however, it also promises substantial economies in storage. on average, only four of the bits are set at the positions in the vectors corresponding to the name or document entry. the density of 1-bits is therefore very low, and long runs of zeros occur in the vectors. they can, therefore, be compressed using run-length coding, for instance as applied by bradley.3,4 preliminary work with the 296-key key-set has indicated already that a gross compression ratio of nine to one is attainable, so that the explicit storage requirements to identify the association between a name and a document number would be just over thirty bits.

conclusions
the work described here relates solely to searches for individual occurrences of personal names. clearly, in operational systems in which one or more author names are associated with a particular bibliographical item, it will be necessary to provide for description of each of these for access. if this is provided solely on the basis of a document number, some false coordination will occur, for instance when the initials of one entry are combined with the surname of another. a number of strategies can be envisaged to overcome this problem. the performance figures show clearly that a small number of characteristics (between 100 and 300 in this study) are sufficient to characterize the entries in large files of personal names and to provide a high degree of resolution in searches for them. while performance in much larger files, involving the extension of key-set sizes to larger numbers, has yet to be studied, the logical application of the concept of variety generation would appear to open the way to novel approaches to searches for documents associated with particular personal names, which seem likely to offer advantages in terms of the overall economic performance of search systems, not only in bibliographic but also in more general computer-based information systems.
acknowledgments
we thank m. d. martin of the institution of electrical engineers for provision of a part of the inspec data base and of file-handling software, and the potchefstroom university for c.h.e. (south africa) for awarding a national grant to d. fokker to pursue this work. we also thank dr. i. j. barton and dr. g. w. adamson for valuable discussions, and the former for n-gram generation programs.

references
1. d. w. fokker and m. f. lynch, "application of the variety-generator approach to searches of personal names in bibliographic data bases-part 1. microstructure of personal authors' names," journal of library automation 7:105-18 (june 1974).
2. i. j. barton, s. e. creasey, m. f. lynch, and m. j. snell, "an information-theoretic approach to text searching in direct-access systems," communications of the acm (in press).
3. s. d. bradley, "optimizing a scheme for run-length encoding," proceedings of the ieee 57:108-9 (1969).
4. m. f. lynch, "compression of bibliographic files using an adaptation of run-length coding," information storage and retrieval 9:207-14 (1973).

evaluation and comparison of discovery tools: an update
f. william chickering and sharon q. yang
information technology and libraries | june 2014

abstract
selection and implementation of a web-scale discovery tool by the rider university libraries (rul) in the 2011–2012 academic year revealed that the endeavor was a complex one. research into the state of adoption of web-scale discovery tools in north america and the evolution of product effectiveness provided a good starting point. in the following study, we evaluated fourteen major discovery tools (three open source and eleven proprietary), benchmarking sixteen criteria recognized as the advanced features of a "next generation catalog." some of the features have been used in previous research on discovery tools. the purpose of the study was to evaluate and compare all the major discovery tools, and the findings serve to update librarians on the latest developments and user interfaces and to assist them in their adoption of a discovery tool.

introduction
in 2004, the rider university libraries' (rul) strategic planning process uncovered a need to investigate federated searching as a means to support research. a tool was needed to search and access all journal titles available to rul users at that time, including 12,000+ electronic full-text journals. lacking the ability to provide relevancy ranking due to its real-time search operations, as well as the cost of the products then available, the decision was made to defer implementation of federated search. monitoring developments yearly revealed no improvements strong enough to adopt the approach. by 2011, the number of electronic full-text journals had increased to 51,128, and by this time federated search as a concept had metamorphosed into web-scale discovery. clearly, the time had come to consider implementing this more advanced approach to searching the ever-growing number of journals available to our clients. though rul passed on federated searching, viewing it as too cumbersome to serve our students well, we anticipated the day when improved systems would emerge. vaughn nicely describes the ability of more highly evolved discovery systems to "provide quick and seamless discovery, delivery, and relevancy-ranking capabilities across a huge repository of content."1 yang and hofmann anticipated the emergence of web-scale discovery with their evaluation of next generation catalogs.2,3
by 2011, informed by yang and hofmann's research, we believed that the systems in the marketplace were sufficiently evolved to make our efforts at assessing available systems worthwhile. this coincided nicely with an important objective in our strategic plan: investigate link resolvers and discovery tools for federated searching and opac by summer 2011.
f. william chickering (chick@rider.edu) is dean of university libraries, rider university, lawrenceville, new jersey. sharon q. yang (yangs@rider.edu) is associate professor–librarian at moore library, rider university, lawrenceville, new jersey.
heeding alexander pope's advice to "be not the first by whom the new are tried, nor yet the last to lay the old aside,"4 we set about discovering what systems were in use throughout north america and which features each provided.

some history
in 2006, antelman, lynema, and pace observed that "library catalogs have represented stagnant technology for close to twenty years." better technology was needed "to leverage the rich metadata trapped in the marc record to enhance collection browsing. the promise of online catalogs has never been realized. for more than a decade, the profession either turned a blind eye to problems with the catalog or accepted that it is powerless to fix them."6 dissatisfaction with catalog search tools led us to review the vufind discovery tool. while it had some useful features (spelling, "did you mean?" suggestions), it still suffered from inadequacies in full-text search and the cumbersome nature of searcher-designated boolean searching. it did not work well in searching printed music collections and, of course, only served as a catalog front end. with this all in mind, rul developed a set of objectives to improve information access for clients:
• to provide information seekers with
  • an easy search option for academically valid information materials
  • an effective search option for academically valid information materials
  • a reliable search option for academically valid information materials across platforms
• to recapture student academic search activity from google
• to attempt revitalizing the use of monographic collections
• to provide an effective mechanism to support offerings of e-books
• to build a firm platform for appropriate library support of distance learning coursework

literature review
marshall breeding first discussed broad-based discovery tools in 2005, shortly after the launch of google scholar. he posits that federated search could not compete with the power and speed of a tool like google scholar. he proclaims the need for, as he describes it, a "centralized search model."7 building on breeding's observations four years earlier, diedrichs astutely observed in 2009 that "user expectations for complete and immediate discovery and delivery of information have been set by their experiences in the web2.0 world. libraries must respond to the needs of those users whose needs can easily be met with google-like discovery tools, as well as those that require deeper access to our resources."10
in that same year, dolski described the common situation in many academic libraries when, in reference to the university of nevada las vegas (unlv) library, he states, "our library website serves as the de facto gateway to our electronic, networked content offerings. yet usability studies have shown that findability, when given our website as a starting point, is poor. undoubtedly this is due, at least in part, to interface fragmentation."11 this perfectly described the way we had come to view rul's situation.

in 2010, breeding reviewed the systems in the market, noting that these are not just next-generation catalogs. he stressed "equal access to content in all forms," a concept we now take for granted. a key virtue in discovery tools, he notes, is the "blending of the full text of journal articles and books alongside citation data, bibliographic, and authority records resulting in a powerful search experience. rather than being provided a limited number of access points selected by catalogers, each word and phrase within the text becomes a possible point of retrieval." breeding further points out that "web-scale discovery platforms will blur many of the restrictions and rules that we impose on library users. rather than having to explain to a user that the library catalog lists books and journal titles but not journal articles, users can simply begin with the concept, author, or title of interest and straightaway begin seeing results across the many formats within the library's collection."12 working with freshmen at rider university revealed that they are ahead of the professionals in approaching information this way, and we believed that web-scale discovery tools could help our users.

as we began the process of selecting a discovery tool, we looked at the experiences of others. fabbi at the university of nevada las vegas (unlv) folded in a strong component of organizational learning in a highly structured manner that was unnecessary at rider.13 no information was disclosed on the process of selecting a discovery vendor, though the website reveals the presence of a discovery tool (http://library.nevada.edu/). in contrast, many librarians at rider explored a variety of libraries' application of search tools. following hofmann and yang's work, a process of vendor demonstrations and analysis of feasibility led to a trial of ebsco discovery service. what we hoped for is what way at grand valley state reported in 2010 of his analysis of serials solutions' summon: "an examination of usage statistics showed a dramatic decrease in the use of traditional abstracting and indexing databases and an equally dramatic increase in the use of full text resources from full text database and online journal collections. the author concludes that the increase in full text use is linked to the implementation of a web-scale discovery tool."14

method
understanding both rul's objectives and the state of the art as reflected in the literature, we concluded that an up-to-date review of discovery tool adoptions was in order before moving forward in the process of selecting a product. the resulting study included these steps: (1) compiling a list of all the major discovery tools, (2) developing a set of criteria for evaluation, (3) examining four to seven websites where a discovery tool was deployed and evaluating each tool against each criterion, (4) recording the findings, and (5) analyzing the data. the targeted population for the study included all the major discovery tools in use in the united states. we define a discovery tool as a library user interface independent of any library systems.
a discovery tool can be used to replace the opac module of an integrated library system or live side-by-side with the opac. other names for discovery tools include stand-alone opac, discovery layer, or discovery user interface. lately, a discovery tool is more often called a discovery service because most are becoming subscription-based and reside remotely in a cloud-based saas (software as a service) model. the authors compiled a list of fourteen discovery tools based on marshall breeding's "major discovery products" guide published in "library technology guides."15 those included aquabrowser library, axiell arena, bibliocommons (bibliocore), blacklight, ebsco discovery service, encore, endeca, extensible catalog, sirsidynix enterprise, primo, summon, visualizer, vufind, and worldcat local. two open-source discovery layers, sopac (the social opac) and scriblio, were excluded from this study because very few libraries are using them. for evaluation in this study, academic libraries were preferred over public libraries during the sample selection process. however, some discovery tools, such as bibliocommons, were more popular among public libraries; therefore examples of public library websites were included in the evaluation. the sites that made the final list were chosen either from the vendor's website that maintained a customer list or breeding's "library technology guides."16 the following is the final list of libraries whose implementations were used in the study.

example library sites with proprietary discovery tools:

aquabrowser (serials solutions)
1. allen county public library at http://smartcat.acpl.lib.in.us/
2. gallaudet university library at http://discovery.wrlc.org/?skin=ga
3. harvard university at http://lib.harvard.edu/
4. norwood young america public library at http://aquabrowser.carverlib.org/
5. selco southeastern libraries cooperating at http://aquabrowser.selco.info/?c_profile=far
6. university of edinburgh (uk) at http://aquabrowser.lib.ed.ac.uk/

axiell arena (axiell)
1. doncaster council libraries (uk) at http://library.doncaster.gov.uk/web/arena
2. lerums bibliotek (lerum library, sweden) at http://bibliotek.lerum.se/web/arena
3. london libraries consortium-royal kingston library (uk) at http://arena.yourlondonlibrary.net/web/kingston
4. norddjurs (denmark) at https://norddjursbib.dk/web/arena/
5. north east lincolnshire libraries (uk) at http://library.nelincs.gov.uk/web/arena
6. someron kaupunginkirjasto (somero city library, finland) at http://somero.verkkokirjasto.fi/web/arena
7. syddjurs (denmark) at https://bibliotek.syddjurs.dk/web/arena1

bibliocore (bibliocommons)
1. halton hills public library at http://hhpl.bibliocommons.com/dashboard
2. new york public library at http://nypl.bibliocommons.com/
3. oakville public library at http://www.opl.on.ca/
4. princeton public library at http://princetonlibrary.bibliocommons.com/
5. seattle public library at http://seattle.bibliocommons.com/
6. west perth (australia) public library at http://wppl.bibliocommons.com/dashboard
7. whatcom county library system at http://wcls.bibliocommons.com/

ebsco discovery service/eds (ebsco)
1. aston university (uk) at http://www1.aston.ac.uk/library/
2. columbia college chicago library at http://www.lib.colum.edu/
3. loyalist college at http://www.loyalistlibrary.com/
4. massey university (new zealand) at http://www.massey.ac.nz/massey/research/library/library_home.cfm
5. rider university at http://www.rider.edu/library
6. santa rosa junior college at http://www.santarosa.edu/library/
7. st. edward's university at http://library.stedwards.edu/

encore (innovative interfaces)
1. adelphi university at http://libraries.adelphi.edu/
2. athens state university library at http://www.athens.edu/library/
3. california state university at http://coast.library.csulb.edu/
4. deakin university (australia) at http://www.deakin.edu.au/library/
5. indiana state university at http://timon.indstate.edu/iii/encore/home?lang=eng
6. johnson and wales university at http://library.uri.edu/
7. st. lawrence university at http://www.stlawu.edu/library/

endeca (oracle)
1. john f. kennedy presidential library and museum at http://www.jfklibrary.org/
2. north carolina state university at http://www.lib.ncsu.edu/endeca/
3. phoenix public library at http://www.phoenixpubliclibrary.org/
4. triangle research libraries network at http://search.trln.org/
5. university of technology, sydney (australia) at http://www.lib.uts.edu.au/
6. university of north carolina at http://search.lib.unc.edu/
7. university of ottawa (canada) libraries at http://www.biblio.uottawa.ca/html/index.jsp?lang=en

enterprise (sirsidynix)
1. cerritos college at http://cert.ent.sirsi.net/client/cerritos
2. maricopa county community colleges at https://mcccd.ent.sirsi.net/client/default
3. mountain state university/university of charleston at http://msul.ent.sirsi.net/client/default
4. university of mary at http://cdak.ent.sirsi.net/client/uml
5. university of the virgin islands at http://uvi.ent.sirsi.net/client/default
6. western iowa tech community college at http://wiowa2.ent.sirsi.net/client/default

primo (ex libris)
1. aberystwyth university (uk) at http://primo.aber.ac.uk/
2. coventry university (uk) at http://locate.coventry.ac.uk/
3. curtin university (australia) at http://catalogue.curtin.edu.au/
4. emory university at http://web.library.emory.edu/
5. new york university at http://library.nyu.edu/
6. university of iowa at http://www.lib.uiowa.edu/
7. vanderbilt university at http://www.library.vanderbilt.edu

visualizer (vtls)
1. blinn college at http://www.blinn.edu/library/index.htm
2. edward via virginia college of osteopathic medicine at http://vcom.vtls.com:1177/
3. george c. marshall foundation at http://gmarshall.vtls.com:6330/
4. scugog memorial public library at http://www.scugoglibrary.ca/

summon (serials solutions)
1. arizona state university at http://lib.asu.edu/
2. dartmouth college at http://dartmouth.summon.serialssolutions.com/
3. duke university at http://library.duke.edu/
4. florida state university at http://www.lib.fsu.edu/
5. liberty university at http://www.liberty.edu/index.cfm?pid=178
6. university of sydney at http://www.library.usyd.edu.au/

worldcat local (oclc)
1. boise state university at http://library.boisestate.edu/
2. bowie state university at http://www.bowiestate.edu/academics/library/
3. eastern washington university at http://www.ewu.edu/library.xml
4. louisiana state university at http://lsulibraries.worldcat.org/
5. saint john's university at http://www.csbsju.edu/libraries.htm
6. saint xavier university at http://lib.sxu.edu/home

examples of open source and free discovery tools:

blacklight (the university of virginia library)
1. columbia university at http://academiccommons.columbia.edu/
2. johns hopkins university at https://catalyst.library.jhu.edu/
3. north carolina state university at http://historicalstate.lib.ncsu.edu
4. northwestern university at http://findingaids.library.northwestern.edu/
5. stanford university at http://www-sul.stanford.edu/
6. university of hull (uk) at http://blacklight.hull.ac.uk/
7. university of virginia at http://search.lib.virginia.edu/

extensible catalog/xc (extensible catalog organization/carli/university of rochester)
1. demo at http://extensiblecatalog.org/xc/demo
2. extensible catalog library at http://xco-demo.carli.illinois.edu/dtmilestone3
3. kyushu university (japan) at http://catalog.lib.kyushu-u.ac.jp/en
4. spanish general state authority libraries (spain) at http://pcu.bage.es/
5. thailand cyber university/asia institute of technology (thailand) at http://globe.thaicyberu.go.th/

vufind (villanova university)
1. auburn university at http://www.lib.auburn.edu/
2. carnegie mellon university libraries at http://search.library.cmu.edu/vufind/search/advanced
3. colorado state university at http://lib.colostate.edu/
4. saint olaf college at http://www.stolaf.edu/library/index.cfm
5. university of michigan at http://mirlyn.lib.umich.edu
6. western michigan university at https://catalog.library.wmich.edu/vufind/
7. yale university library at http://yufind.library.yale.edu/yufind/

the following list of criteria was used for the evaluation. some criteria were based on those used in previous studies of discovery tools.17, 18, 19 the list embodied the librarians' vision for the next-generation catalog and contained some of the most desirable features for a modern opac. the authors were aware of other desirable features for a discovery layer, and the following list is by no means comprehensive, but it served the purpose of the study well.

1. one-stop search for all library resources. a discovery tool should include all library resources in its search, including the catalog with books and videos, journal articles in databases, and local archives and digital repositories. this can be accomplished by a unified index or by federated search, an essential component of a discovery tool. some discovery tools are described as web-scale because of their potential to search seamlessly across all library resources.
2. state-of-the-art web interface. a discovery tool should have a modern design similar to e-commerce sites, such as google, netflix, and amazon.
3. enriched content. discovery tools should include book cover images, reviews, and user-driven input, such as comments, descriptions, ratings, and tag clouds. the enriched content can come from library patrons, commercial sources, or both.
4. faceted navigation. discovery tools should allow users to narrow down search results by categories, also called facets. the commonly used facets include locations, publication dates, authors, formats, and more.
5. simple keyword search box with a link to advanced search on the start page. a discovery tool should start with a simple keyword search box that looks like that of google or amazon. a link to the advanced search should be present.
6. simple keyword search box on every page. the simple keyword search box should appear on every page of a discovery tool.
7. relevancy. relevancy criteria should take into consideration circulation statistics and books with multiple copies.
more frequently circulated books indicate popularity and usefulness and should be ranked nearer the top of the display. a book with multiple copies may also be an indication of importance.
8. "did you mean . . . ?" spell-checking. when an error appears in a search, the discovery tool should offer the corrected query spelling as a link so that users can simply click on it to get the search results.
9. recommendations/related materials. a discovery tool should recommend resources for readers in a manner similar to amazon or other e-commerce sites, based on transaction logs. this should take the form of "readers who borrowed this item also borrowed the following . . . " or a link to recommended readings. it would be ideal if a discovery tool could recommend the most popular articles, a service similar to ex libris' bx usage-based services.
10. user contribution. user input includes descriptions, summaries, reviews, criticism, comments, rating and ranking, and tagging or folksonomies.
11. rss feeds. a modern opac should provide rss feeds.
12. integration with social networking sites. when a discovery tool is integrated with social networking sites, patrons can share links to library items with their friends on social networks like twitter, facebook, and delicious.
13. persistent links. records in a discovery tool contain a stable url capable of being copied and pasted and serving as a permanent link to that record. these are also called permanent urls.
14. auto-completion/stemming. a discovery tool should be equipped with an algorithm that can auto-complete the search words or supply a list of previously used words or phrases for users to choose from. google has stemming algorithms.
15. mobile compatibility. there is a difference between being "mobile compatible" and having a "custom mobile website." the former indicates a website can be viewed or used on a mobile phone, and the latter denotes a different version of the user interface specially built for mobile use. in this study we counted both as "yes."
16. functional requirements for bibliographic records (frbr). the latest development of rda certainly makes a discovery tool more desirable if it can display frbr relationships. for instance, a discovery tool may display and link different versions, editions, or formats of a work, what frbr refers to as expressions and manifestations.

for record keeping and analysis, a microsoft excel file with sixteen fields based on the above criteria was created. the authors checked the discovery tools on the websites of the selected libraries and recorded those features as present or absent. rda compatibility is not used as a criterion in the study because most discovery tools allow users to add rda fields in marc. by now, all the discovery tools should be able to display, index, and search the new rda fields.

findings

one-stop searching for all library resources—this is the most desirable feature when acquiring a discovery tool. unfortunately, it also presented the biggest challenge for vendors. both librarians and vendors have been struggling with this issue for the past several years, yet no one has worked out a perfect solution.
based on the examples the authors examined, this study found that only five out of fourteen discovery tools can retrieve articles from databases along with books, videos, and digital repositories. those include ebsco discovery service, encore, primo, summon, and worldcat local. whereas encore uses an approach similar to federated search, performing live searches of databases, the other discovery tools build a single unified index. the single unified index requires libraries to send their catalog data and local information to the vendor for updates, so the discovery tool may fall behind in reflecting up-to-the-minute accuracy in local holdings; federated search does real-time searching and does not lag behind in displaying current information. both approaches are limited in what they cover. both need permission from content providers, either for inclusion in the unified index or to develop a connection to article databases for real-time searching. discovery tools that do not have their own unified index or real-time searching capability provide web-scale searching through other means. for instance, vufind has developed connectors to application programming interfaces (apis) by serials solutions or oclc to pull search results from summon and worldcat local. encore not only developed its own real-time connection to electronic databases but is enhancing its web-scale search by incorporating the unified index from other discovery tools such as the ebsco discovery service. aquabrowser is augmented by 360 federated search for the same purpose. despite those possibilities, the authors did not find article-level retrieval in the sample discovery tools other than the main five mentioned above.

comparing the coverage of each tool's web-scale index can be challenging. ebsco, summon, and worldcat local publicize their content coverage on the web, while primo and encore only share this information with their customers. this makes it hard to compare and evaluate content coverage without contacting vendors and asking for that information. at present, none of the five discovery tools (ebsco discovery service, encore, primo, summon, and worldcat local) can boast 100% coverage of all library resources. in fact, none of the internet search engines, including google or google scholar, can retrieve 100% of all resources. therefore web-scale searching is more a goal than a reality. apart from political and economic reasons, this is in part due to the nonbibliographic structure of the contents in databases such as scifinder and some others. one-stop searching is still a work in progress because discovery tools provide students with a quick and simple way to retrieve a large number, but still an incomplete list, of resources held by a library. for more in-depth research, students are still encouraged to search the catalog, discipline-specific databases, and digital repositories separately.

state-of-the-art interface—all the discovery tools are very similar in appearance to amazon.com. some are better than others. this study did not rate each discovery tool on a scale and thus did not distinguish fine degrees of difference in appearance. rather, each discovery tool is given a "yes" or "no," and the designation was based on subjective judgment. all the discovery tools received "yes" because they are very similar in appearance.
enriched content—all the discovery tools have embedded book cover images or video jacket images, but some display more, such as ratings and rankings, user-supplied or commercially available reviews, overviews, previews, comments, descriptions, title discussion, excerpts, or age suitability, just to name a few. a discovery tool may display enriched content by default out of the box, but some may need to be customized to include it. the following is a list of enriched content implemented in each discovery tool that the authors found in the sample. the number in the last column indicates how many types of enriched content were found in the discovery tool at the time of the study. bibliocommons and aquabrowser stand out from the rest and made the top two on the list based on the number of enriched content types from noncataloging sources (see figure 1). it is debatable how much nontraditional data a discovery tool should incorporate into its display, and it warrants another discussion as to how useful such data is for users.

faceted navigation—faceted navigation has become a standard feature in discovery tools over the last two years. it allows users to further divide search results into subsets based on predetermined terms. facets come from a variety of fields in marc records, and some discovery tools have more facets than others. the most commonly seen facets include location or collection, publication date, format, author, genre, and subject. faceted navigation is highly configurable, as many discovery tools allow libraries to decide on their own facets. faceted navigation has become an integral part of a discovery tool.
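as an illustration of how faceted navigation can be derived from a result set, the following minimal sketch tallies facet values across records. the record fields and values are hypothetical; real discovery tools map facets from marc and other metadata in far more elaborate ways.

```python
from collections import Counter

def facet_counts(records, facet_fields=("format", "location", "subject")):
    """tally facet values across a result set so the interface can offer
    'narrow by' links with hit counts."""
    counts = {field: Counter() for field in facet_fields}
    for record in records:
        for field in facet_fields:
            values = record.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                counts[field][value] += 1
    return counts

# hypothetical search results drawn from marc-derived metadata
results = [
    {"format": "book", "location": "main library", "subject": ["shakespeare", "drama"]},
    {"format": "dvd", "location": "media center", "subject": ["shakespeare"]},
    {"format": "book", "location": "main library", "subject": ["drama"]},
]

for field, tally in facet_counts(results).items():
    print(field, tally.most_common())
```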
simple keyword search box on the starting page with a link to advanced search—the original idea is to allow a library's user interface to resemble google by displaying a simple keyword search box with a link to advanced search on the starting page. most discovery tools provide the flexibility for libraries to choose or reject this option. however, many librarians find this approach unacceptable, as they feel it lacks precision in searching and thus may mislead users. as the keyword box is highly configurable and up to the library to decide how to present, many libraries have added a pull-down menu with options to search keywords, authors, titles, and locations. in doing so, the original intention of a google-like simple search box is lost, and only a few libraries follow the google-like box style on the starting page. most libraries altered the simple keyword search box on the starting page to include a drop-down menu or radio buttons, so the simple keyword search box is neither simple nor limited to keyword search only. nevertheless, this study gave all the discovery tools a "yes": all the systems are capable of this feature even though libraries may choose not to use it.

rank | discovery tool | enriched content | total
1 | bibliocommons | cover images, tags, similar title, private note, notices, age suitability, summary, quotes, video, comments, and rating | 11
2 | aquabrowser | cover images, previews, reviews, summary, excerpts, tags, author notes & sketches, full text from google, rating/ranking | 9
3 | enterprise | cover images, reviews, google previews, summary, excerpts | 5
4 | axiell arena | cover images, tags, reviews, and title discussion | 4
4 | vufind | cover images, tags, reviews, comments | 4
5 | primo | cover images, tags, previews | 3
5 | worldcat local | cover images, tags, reviews | 3
6 | encore | cover images, tags | 2
6 | visualizer | cover images, reviews | 2
6 | summon | cover images, reviews | 2
7 | blacklight | cover images | 1
7 | ebsco discovery service | cover images | 1
7 | endeca | cover images | 1
7 | extensible catalog | cover images | 1

figure 1. the ranked list of enriched content in discovery tools.

simple keyword search box on every page—this feature enables a user to start a new search at every step of navigation in the discovery tool. most of the discovery tools provide such a box at the top of the screen as users navigate through search results and record displays, except extensible catalog and enterprise by sirsidynix: the feature is missing from the former, while the latter has it everywhere except when displaying bib records in a pop-up box.

relevancy—traditionally, relevancy is uniformly based on a computer algorithm that calculates the frequency and relative position of a keyword (field weighting) in a record and displays the search results based on the final score. other factors have never been a part of the decision in the display of search results. in the discussion on next-generation catalogs, relevancy based on circulation statistics and other factors came up as a desirable possibility, and no discovery tool had met this challenge until now. primo by ex libris is the only one among the discovery tools under investigation that can sort the final results by popularity. "primo's popularity ranking is calculated by use. this means that the more an item record has been clicked and viewed, the more popular it is."20 even though those are not real circulation statistics, this is considered a revolutionary step and a departure from traditional relevancy. three years ago none of the discovery tools provided this option.21 to make relevancy ranking even more sophisticated, scholarrank, another service by ex libris, can work with primo to sort the search results based not only on a query match but also on an item's value score (its usage and number of citations) and a user's characteristics and information needs. this shows the possibility of more advanced relevancy ranking in discovery tools. other vendors will most likely follow in the future, incorporating more sophistication into their relevancy algorithms.
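a minimal sketch of the idea behind usage-boosted relevancy follows. the field weights, record, and usage counts are hypothetical, and vendor implementations such as primo's popularity ranking or scholarrank are considerably more elaborate.

```python
import math

# hypothetical field weights: a title hit counts more than a description hit
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "description": 1.0}

def relevancy(record, query_terms, usage_count=0, usage_weight=0.5):
    """combine a field-weighted keyword score with a damped popularity boost."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = record.get(field, "").lower()
        for term in query_terms:
            score += weight * text.count(term.lower())
    # log-damp usage so heavily viewed items do not swamp keyword relevance
    return score + usage_weight * math.log1p(usage_count)

record = {"title": "romeo and juliet", "subject": "drama",
          "description": "a tragedy by william shakespeare"}
print(relevancy(record, ["romeo", "juliet"], usage_count=120))
```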
spell checker/"did you mean . . . ?"—the most commonly observed way of correcting a misspelling in a query is "did you mean . . . ?" but there are other variations providing the same or similar services, some of them very user-friendly. the following is a list of the different responses when a user enters misspelled words (see figure 2). "xxx" represents the keyword being searched.

discovery tool | response to misspelled search words | notes
aquabrowser | did you mean to search: xxx, xxx, xxx? | the suggested words are hyperlinks that execute new searches.
axiell arena | your original search for xxx has returned no hits. the fuzzy search returned n hits. | automatically displays a list of hits based on fuzzy logic. "n" is a number.
bibliocommons | did you mean xxx (n results)? | displays the suggested word, along with the number of results, as a link.
blacklight | no records found. | no spell checker, but possible to add by a local technical team.
ebsco discovery service | results may also be available for xxx. | the suggested word is a link that executes a new search.
encore | did you mean xxx? | the suggested word is a link that executes a new search.
endeca | did you mean xxx? | the suggested word is a link that executes a new search.
enterprise | did you mean xxx? | the suggested word is a link that executes a new search.
extensible catalog | sorry, no results found for: xxx. | no spell checker, but possible to add by a local technical team.
primo | did you mean xxx? | the suggested word is a link that executes a new search.
summon | did you mean xxx? | the suggested word is a link that executes a new search.
visualizer | did you mean xxx? | the suggested word is a link that executes a new search.
vufind | 1. no results found in this category. search alternative words: xxx, xxx, xxx. 2. perhaps you should try some spelling variation: xxx, xxx, xxx. 3. your search xxx did not match any resources. what should i do now? (a list of suggestions, including checking a web dictionary.) | 1. alternative words are links that execute new searches. 2. suggested words are links that execute new searches. 3. suggestions on what to do next.
worldcat local | did you mean xxx? | the suggested word is a link that executes a new search.

figure 2. spell checker.

most of the discovery tools on the list provide this feature except blacklight and extensible catalog. open-source solutions sometimes provide a framework to which features can be added, which leaves many possibilities for local developers. for instance, a dictionary or spell checker may be easily installed even if a discovery tool does not come with one out of the box. this feature may be configurable.
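one simple way such "did you mean" suggestions can be generated is to match a failed query against vocabulary already present in the index. the sketch below uses python's standard-library difflib for fuzzy matching, with a hypothetical vocabulary; production systems typically rely on purpose-built spelling indexes.

```python
import difflib

# hypothetical vocabulary harvested from the discovery tool's index
index_terms = ["shakespeare", "sonnet", "sonata", "serials", "cervantes"]

def did_you_mean(query, vocabulary, max_suggestions=3):
    """return close matches for a query that produced zero hits."""
    return difflib.get_close_matches(query.lower(), vocabulary,
                                     n=max_suggestions, cutoff=0.7)

print(did_you_mean("shakespere", index_terms))  # ['shakespeare']
```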
recommendation—amazon has one of those search engines with a recommendation system such as "customers who bought item a also bought item b." the e-commerce recommendation algorithms analyze the activities of shoppers on the web and build a database of buyer profiles, and recommendations are made based on shopper behavior. applied to library content, this could become "readers who were interested in item a were also interested in item b." however, most discovery tools do not have such a recommendation system; instead, they have adopted different approaches. most discovery tools make recommendations from bibliographic data in marc records, such as subject headings for similar items. primo is one of the few discovery tools with a recommendation system similar to those used by amazon and other internet commercial sites. its bx article recommender service is based on usage patterns collected from its link resolver, sfx. developed by ex libris, bx is an independent service that integrates well with primo but can serve as an add-on for other discovery tools. bx is an excellent example that discovery tools can suggest new leads and directions for scholars in their research.

the authors counted all the discovery tools that provide some kind of recommendation, regardless of whether the technological approach uses marc data or algorithms. ten out of fourteen discovery tools provide this feature in various forms (see figure 3). those include axiell arena, bibliocommons, ebsco discovery service, encore, endeca, extensible catalog, primo, summon, worldcat local, and vufind. the following are some of the recommendations found in those discovery tools. the authors did not find any recommendation in the libraries that use aquabrowser, enterprise, visualizer, or blacklight.

discovery tool | language used for recommending or linking to related items
axiell arena | "see book recommendations on this topic"; "who else writes like this?"
bibliocommons | "similar titles & subject headings & lists that include this title"
ebsco discovery service | "find similar results"
encore | "other searches you may try"; "additional suggestions"
endeca | "recommended titles for . . . view all recommended titles that match your search"; "more like this"
extensible catalog | "more like this"; "searches related to . . . "
primo | "suggested new searches by this author"; "suggested new searches by this subject"; "users interested in this article also expressed an interest in the following:"
summon | "search related to . . . "
worldcat local | "more like this"; "similar items"; "related subjects"; "user lists with this item"
vufind | "more like this"; "similar items"; "suggested topics"; "related subjects"

figure 3. language used for recommendation.

some discovery tool recommendations are designed in a more user-friendly manner than others. most recommendations exist exclusively for items. ideally, a discovery tool should provide an article recommendation system like ex libris' bx usage-based service that shows users the most frequently used and most popular articles. at the time of this evaluation, no discovery tool had incorporated an article recommendation system except primo. research is needed to evaluate how patrons utilize recommendation services and whether they find recommendations in discovery tools beneficial.

user contribution—traditionally, bibliographic data has been safely guarded by cataloging librarians for quality control, and it has been unthinkable that users would be allowed to add data to library records. the internet has brought new perspectives on this issue. half of the discovery tools (seven) under evaluation provide this feature to varying degrees (see figure 4). designed primarily for public libraries, bibliocommons seems the most open to user-supplied data among all the discovery tools. most of the other discovery tools with this feature allow users to contribute tags and reviews. all the discovery tools allow librarians to censor user-supplied data before releasing it for public display. the following figure is a summary of the types of data these discovery tools allow users to enter.
ranking | discovery tool | user contribution
1 | bibliocommons | tags, similar title, private note, notices, age suitability, summary, quotes, video, comments, and ratings (10)
2 | aquabrowser | tags, reviews, and ratings/rankings (3)
2 | axiell arena | tags, reviews, and title discussions (3)
2 | vufind | tags, reviews, comments (3)
3 | primo | tags and reviews (2)
3 | worldcat local | tags and reviews (2)
4 | encore | tags (1)
5 | blacklight | (0)
5 | endeca | (0)
5 | enterprise | (0)
5 | extensible catalog | (0)
5 | summon | (0)
5 | visualizer | (0)

figure 4. discovery tools based on user contribution.

past research indicates that folksonomies or tags are highly useful.22 they complement library-controlled vocabularies, such as library of congress subject headings, and increase access to library collections. a few discovery tools allow user-entered tags to form "word clouds." the relative importance of tags in a word cloud is emphasized by font color and size. a tag list is another way to organize and display tags. in both cases, tags are hyperlinked to a relevant list of items. some tags serve as keywords to start new searches, while others narrow search results. only four discovery tools, aquabrowser, encore, primo, and worldcat local, provide both tag clouds and lists. bibliocommons provides only tag lists for the same purpose. the rest of the discovery tools have neither. one setback of user-supplied tags for subject access is their incomplete nature: they may lead users to partial retrieval of information because users add tags only to items that they have used, so the coverage is not systematic and inclusive of all collections. therefore data supplied by users in discovery tools remains controversial. it is possible to seed systems with folksonomies using services like librarything for libraries, which could reduce the impact of this issue.

rss feeds/email alerts—this feature can automatically send a list of new library resources to users based on their search criteria. it can be useful for experienced researchers or frequent library users. some discovery tools may offer email alerts as well. eight out of fourteen discovery tools in this evaluation provide rss feeds: aquabrowser, axiell arena, ebsco discovery service, endeca, enterprise, primo, summon, and vufind. an rss feed can be added as a plug-in in some discovery tools if it does not come as part of the base system.

integration with social networking sites—as most college students participate in social networking sites, this feature provides an easy way to share resources on those sites. users can capture the link to a resource by clicking on an icon in the discovery tool and share the resource with friends on facebook, twitter, delicious, and many other social networking sites. nine out of the fourteen discovery tools provide this feature, and some offer integration with many more social networking sites than others. those with this feature include aquabrowser, axiell arena, bibliocommons, ebsco discovery service, encore, endeca, primo, worldcat local, and extensible catalog. so far, the interaction between discovery tools and social networking sites is limited to sharing resources. social networking sites should be carefully evaluated for the possibility of integrating some of their popular features into discovery tools.
persistent link—this is also called a permanent link or permurl. not all the links displayed in a browser location box are persistent links; therefore some discovery tools specifically provide a link in the record for users to copy and keep. five out of fourteen discovery tools explicitly list this link in records: aquabrowser, axiell arena, blacklight, ebsco discovery service, and worldcat local. the authors marked a system as "no" when a permanent link is not prominently displayed in the discovery tool; in other words, only those discovery tools that explicitly provide a persistent link are counted as "yes." however, the url in a browser's location box during the display of a record may serve as a persistent link in some cases. for instance, vufind does not provide a permanent url in the record, but indicates on the project site that the url in the location box is a persistent link.

auto-completion/stemming—when a user types keywords into the search box, the discovery tool supplies a list of words or phrases that he or she can choose from readily. this is a highly useful feature that google excels at. stemming not only automatically completes the spelling of a keyword, but also supplies a list of phrases that point to existing items. the authors found this feature in six out of fourteen discovery tools: axiell arena, endeca, enterprise, extensible catalog, summon, and worldcat local.

mobile interface—the terms "mobile compatible" and "mobile interface" are two different concepts. a mobile interface is a simplified version of the normal browser interface of a discovery tool, optimized for use on mobile phones, and the authors only counted those discovery tools that have a separate mobile interface. a discovery tool may be mobile friendly or compatible without a separate mobile interface. many discovery tools, such as ebsco, can detect a request from a mobile phone and automatically direct it to the mobile interface. eleven out of fourteen claim to provide a separate mobile interface. blacklight, enterprise, and extensible catalog do not seem to have a separate mobile interface even though they may be mobile friendly.

frbr—frbr groupings denote the relationships between work, expression, manifestation, and item. for instance, a search will retrieve not only a title but also different editions and formats of the work. only three discovery tools can display frbr relationships: extensible catalog (open source), primo by ex libris, and worldcat local by oclc. so far, most discovery tools are not capable of displaying the manifestations and expressions of a work in a meaningful way. from the user's point of view, this feature is highly desirable. figure 5 is a screenshot from primo demonstrating a display indicating a large number of different adaptations of the work "romeo and juliet." figure 6 displays the same intellectual work in different manifestations such as dvd, vhs, books, and more.

figure 5. display of frbr relationships in primo.

figure 6. different versions of the same work in primo.

summary

the following are the summary tables of our comparison and evaluation. proprietary and open-source programs are listed separately in these tables. the total number of features the authors found in a particular discovery tool is displayed at the end of each column.
proprietary discovery tools seem to have more of the advanced characteristics of a modern discovery tool than their open-source counterparts. the open-source program blacklight displays fewer advanced features but seems flexible for users to add features. see figures 7, 8, and 9.

criterion | aquabrowser | axiell arena | bibliocommons | ebsco/eds | encore | endeca
1. single point of search | no | no | no | yes | yes | no
2. state-of-the-art interface | yes | yes | yes | yes | yes | yes
3. enriched content | yes | yes | yes | yes | yes | yes
4. faceted navigation | yes | yes | yes | yes | yes | yes
5. simple keyword search box on the starting page | yes | yes | yes | yes | yes | yes
6. simple keyword search box on every page | yes | yes | yes | yes | yes | yes
7. relevancy | no | no | no | no | no | no
8. spell checker/"did you mean . . . ?" | yes | yes | yes | yes | yes | yes
9. recommendation | no | yes | yes | yes | yes | yes
10. user contribution | yes | yes | yes | no | yes | no
11. rss | yes | yes | no | yes | no | yes
12. integration with social network sites | yes | yes | yes | yes | yes | yes
13. persistent links | yes | yes | no | yes | no | no
14. stemming/auto-complete | no | yes | no | no | no | yes
15. mobile interface | yes | yes | yes | yes | yes | yes
16. frbr | no | no | no | no | no | no
total | 11/16 | 13/16 | 10/16 | 12/16 | 11/16 | 11/16

figure 7. proprietary discovery tools.

criterion | enterprise | primo | summon | visualizer | worldcat local
1. single point of search | no | yes | yes | no | yes
2. state-of-the-art interface | yes | yes | yes | yes | yes
3. enriched content | yes | yes | yes | yes | yes
4. faceted navigation | yes | yes | yes | yes | yes
5. simple keyword search box on the starting page | yes | yes | yes | yes | yes
6. simple keyword search box on every page | no | yes | yes | yes | yes
7. relevancy | no | yes | no | no | no
8. spell checker/"did you mean . . . ?" | yes | yes | yes | yes | yes
9. recommendation | no | yes | yes | no | yes
10. user contribution | no | yes | no | no | yes
11. rss | yes | yes | yes | no | no
12. integration with social network sites | no | yes | no | no | yes
13. persistent links | no | no | no | no | yes
14. stemming/auto-complete | yes | no | yes | no | yes
15. mobile interface | no | yes | yes | yes | yes
16. frbr | no | yes | no | no | yes
total | 7/16 | 14/16 | 11/16 | 7/16 | 14/16

figure 8. proprietary discovery tools (continued).

criterion | blacklight | extensible catalog | vufind
1. single point of search | no | no | no
2. state-of-the-art interface | yes | yes | yes
3. enriched content | yes | yes | yes
4. faceted navigation | yes | yes | yes
5. simple keyword search box on the starting page | yes | yes | yes
6. simple keyword search box on every page | yes | yes | yes
7. relevancy | no | no | no
8. spell checker/"did you mean . . . ?" | no | no | yes
9. recommendation | no | yes | yes
10. user contribution | no | no | yes
11. rss | no | no | yes
12. integration with social network sites | no | yes | no
13. persistent links | yes | no | no
14. stemming/auto-complete | no | yes | no
15. mobile interface | no | no | yes
16. frbr | no | yes | no
total | 6/16 | 9/16 | 10/16

figure 9. free and open-source discovery tools.

as one-stop searching is the core of a discovery tool, this consideration placed five discovery tools above the rest: encore, ebsco discovery service, primo, summon, and worldcat local (see figure 10). these five are web-scale discovery services. all of them use their native unified index except encore, which has incorporated the ebsco unified index in its search. despite the great progress made in one-stop searching in the past three years, none of the discovery tools can truly search across all library resources—all of them have some limitations as to the coverage of content.
each unified index may cover different databases as well as overlap the others in many areas. one possible solution may lie in a hybrid approach that combines a unified index with federated search (also called real-time discovery); those old and new technologies may work well when complementing each other. it remains an open question whether libraries will ever have one-stop searching in its true sense.

discovery tool | one-stop searching
encore | yes
ebsco discovery service | yes
primo | yes
summon | yes
worldcat local | yes

figure 10. the discovery tools capable of one-stop searching.

it is also worth mentioning that one-stop searching is a vital and central piece of a discovery tool. those discovery tools without a native unified index or connectors to databases for real-time searching are at a disadvantage. therefore discovery tools that do not provide web-scale searching are investigating various possibilities to incorporate one-stop searching. some are drawing on the unified indexes of the discovery tools that have them through connectors to the application programming interfaces (apis) of those products. for instance, vufind includes connectors to the apis of a few other systems that have a unified index or vast resources, such as summon and worldcat. blacklight may provide one-stop searching through the primo api. such a practice may present other problems, such as calculating relevancy ranking across resources that do not live in the same centralized index, thus not achieving fully balanced relevancy ranking. nevertheless, discovery tool developers are working hard to achieve one-stop searching. as a unified index can be shared across discovery tools, more and more discovery services may offer one-stop searching in the next few years.

based on the count of the sixteen criteria in the checklist, we ranked primo and worldcat local as the top two discovery tools. based on our criteria, primo has two unique features that make it stand out: relevancy enhanced by usage statistics and value score, and the frbr relationship display. worldcat local and extensible catalog are the other two discovery tools that can display frbr relationships (see figure 11).

rank | discovery tools | number of advanced features
1 | primo and worldcat local | 14/16
2 | axiell arena | 13/16
3 | ebsco discovery service | 12/16
4 | aquabrowser, encore, and endeca | 11/16
5 | bibliocommons, summon, and vufind | 10/16
6 | extensible catalog | 9/16
7 | enterprise and visualizer | 7/16
8 | blacklight | 6/16

figure 11. ranked discovery tools.

limitations

as discovery tools go through new releases and improvements, what is true today may be false tomorrow. discovery tools constantly improve and evolve, and many features are not included in this evaluation, such as integration with google maps for the location of an item and user-driven acquisitions. this study only covers the most common features that the library community agreed a discovery tool should have. some open-source discovery tools may provide a skeleton of an application that leaves the code open for users to develop new features; therefore different implementations of an open-source discovery tool may encompass totally different features that are not part of the core application. for instance, the university of virginia developed virgo based on blacklight, adding many advanced features.
thus it is quite a challenge to distinguish what comes with the software and what are local developments. this study focused on the user interface of discovery tools. not included are content coverage, application administration, and the searching capability of the discovery tools; those three are important factors when choosing a discovery tool.

conclusion

search technology has evolved far beyond federated searching. the concept of a "next-generation catalog" has merged with this idea and spawned a generation of discovery tools bringing almost google-like power to library searching. the problems facing libraries now are the intelligent selection of a tool that fits their contexts, and structuring a process to adopt and refine that tool to meet the objectives of the library. our findings indicate that primo and worldcat local have better user interfaces, displaying more advanced features of a next-generation catalog than their peers. for rul, ebsco discovery service (eds) provides something approaching the ease of google searching from either a single search box or a very powerful advanced search. being aware of the limitations noted above, rider's libraries elected to continue displaying traditional search options in addition to what we've branded "library one search." another issue we discovered in this process is that, when negotiating for a vendor-hosted test, libraries must be sure that the test period begins when the configuration is complete rather than when the data load begins. all phases of the project took far more time than anticipated. the client institution's implementation coordinator or team needs to review progress on a daily basis and communicate often with the vendor-based implementation team. with the evaluative framework this study provides, libraries moving toward discovery tools should weigh the changing capabilities of the available discovery tools to make informed choices.

references

1. jason vaughan, "investigations into library web-scale discovery services," information technology & libraries 31, no. 1 (2012): 32–33, http://dx.doi.org/10.6017/ital.v31i1.1916.
2. sharon q. yang and melissa a. hofmann, "next generation or current generation? a study of the opacs of 260 academic libraries in the usa and canada," library hi tech 29, no. 2 (2011): 266–300.
3. melissa a. hofmann and sharon q. yang, "'discovering' what's changed: a revisit of the opacs of 260 academic libraries," library hi tech 30, no. 2 (2012): 253–74.
4. alexander pope, "alexander pope quotes," http://www.brainyquote.com/quotes/authors/a/alexander_pope.html.
5. f. william chickering, "linking information technologies: benefits and challenges," proceedings of the 4th international conference on new information technologies, budapest, hungary, december 1991, http://web.simmons.edu/~chen/nit/nit%2791/019-chi.htm.
6. kristin antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology & libraries 25, no. 3 (2006): 128–39, http://dx.doi.org/10.6017/ital.v25i3.3342.
7. marshall breeding, "plotting a new course for metasearch," computers in libraries 25, no. 2 (2005): 27–29.
8. judith carter, "discovery: what do you mean by that?" information technology & libraries 28, no. 4 (2009): 161–63, http://dx.doi.org/10.6017/ital.v28i4.3326.
9. priscilla caplan, "on discovery tools, opacs and the motion of library language," library hi tech 30, no. 1 (2012): 108–15.
10. carol pitts diedrichs, "discovery and delivery: making it work for users," serials librarian 56, no. 1–4 (2009): 79, http://dx.doi.org/10.1080/03615260802679127.
11. alex a. dolski, "information discovery insights gained from multipac, a prototype library discovery system," information technology & libraries 28, no. 4 (2009): 173, http://dx.doi.org/10.6017/ital.v28i4.3328.
12. marshall breeding, "the state of the art in library discovery," computers in libraries 30, no. 1 (2010): 31–34.
13. jennifer l. fabbi, "focus as impetus for organizational learning," information technology & libraries 28, no. 4 (2009): 164–71, http://dx.doi.org/10.6017/ital.v28i4.3327.
14. douglas way, "the impact of web-scale discovery on the use of a library collection," serials review 36, no. 4 (2010): 214–20, http://dx.doi.org/10.1016/j.serrev.2010.07.002.
15. marshall breeding, "library technology guides: discovery products," http://www.librarytechnology.org/discovery.pl.
16. ibid.
17. sharon q. yang and kurt wagner, "evaluating and comparing discovery tools: how close are we towards next generation catalog?" library hi tech 28, no. 4 (2010): 690–709.
18. yang and hofmann, "next generation or current generation?" 266–300.
19. melissa a. hofmann and sharon q. yang, "how next-gen r u? a review of academic opacs in the united states and canada," computers in libraries 31, no. 6 (2010): 26–29.
20. brown library of virginia western community college, "primo-frequently asked questions," http://www.virginiawestern.edu/library/primo-faq.php#popularity_ranking.
21. yang and wagner, "evaluating and comparing discovery tools," 690–709.
22. yanyi lee and sharon q. yang, "folksonomies as subject access—a survey of tagging in library online catalogs and discovery layers," paper presented at the ifla post-conference "beyond libraries-subject metadata in the digital environment and semantic web," tallinn, estonia, 18 august 2012, http://www.nlib.ee/html/yritus/ifla_jarel/papers/4-1_yan.docx.

52 journal of library automation vol. 14/1 march 1981

publishing firm. with a feeling of deja vu i listened to an explanation of how difficult it is to develop a system for the novice; one proposed solution is to allow only the first four letters of a word to be entered (one of the search methods used at the library of congress, which does suggest some cross-fertilization). whatever the trends, the reality is that librarians and information scientists are playing decreasing roles in the growth of information display technology. hardware systems analysts, advertisers, and communications specialists are the main professions with an active role to play in the information age. perhaps the answer is an immediate and radical change in the training offered by library schools today. our small role may reflect our penchant to be collectors, archivists, and guardians of the information repositories.
have we become the keepers of the system? the demand today is for service, information, and entertainment. if we librarians cannot fulfill these needs, our places are not assured. should the american library association (ala) be ensuring that libraries are a part of all ongoing tests of videotex, at least in some way, either as organizers, information providers, or in analysis? consider the force of the argument given at the ala 1980 new york annual conference that cable television should be a medium that librarians become involved with for the future. certainly involvement is an important role, but we, like the industrialists and marketers before us, must make smart decisions and choose the proper niche and the most effective way to use our limited resources if we are to serve any part of society in the future.

bibliography
1. electronic publishing review. oxford, england: learned information ltd. quarterly.
2. home video report. white plains, new york: knowledge industry publications. weekly.
3. ieee transactions on consumer electronics. new york: ieee broadcast, cable, and consumer electronics society. five times yearly.
4. international videotex/teletext news. washington, d.c.: arlen communications ltd. monthly.
5. videodisc/teletext news. westport, conn.: microform review. quarterly.
6. videoprint. norwalk, conn.: videoprint. two times monthly.
7. viewdata/videotex report. new york: link resources corp. monthly.

data processing library: a very special library

sherry cook, mercedes dumlao, and maria szabo: bechtel data processing library, san francisco, california.

the 1980s are here and with them comes the ever-broadening application of the computer. this presents a new challenge to libraries. what do we do with all these computer codes? how do we index the material? and most importantly, how do we make it accessible to our patrons or computer users? bechtel's data processing library has met these demands. the genesis of the collection was bechtel's conversion from a honeywell 6000 computer to a univac 1100 in 1974. all the programs in use at that time were converted to run on the univac system. it seemed a good time to bring the computer programs from all of the various bechtel divisions together into a controlled collection. the librarians were charged with the responsibility of enforcing standards and control of bechtel's computer programs. the major benefits derived from placing all computer programs into a controlled library were:
1. company-wide usage of the programs.
2. minimized investment in program development through common usage.
3. computer file and documentation storage by the library to safeguard the investment.
4. a central location for audits of program code and documentation.
5. centralized reporting on bechtel programs.

developing the collection involved basic cataloging techniques, greatly modified to encompass all the information that computer programs generate, including actual code, documentation, and listings. historically, this information must be kept indefinitely on an archival basis. the machine-readable codes themselves are grouped together and maintained from the library's budget. finally, a reference desk is staffed to answer questions from the entire user community. documentation for programs is strictly controlled. code changes are arranged chronologically to provide only the most current release of a program to all users.
historical information is kept and is crucial to satisfy the demands of auditors (such as the nuclear regulatory commission). additionally, the names of the people administratively connected with a program are recorded and their responsibilities defined (valuable in situations of liability for work completed yesteryear). the backbone of the operation is a standards manual that spells out and discusses the file requirements, documentation specifications, and control forms. this standard is made readily available throughout bechtel, and in-house education classes are offered on the same document. indeed, the central data processing library is the repository of computer information at bechtel. the centralization and control of computer programs eliminates the chaos that can occur if too many individuals maintain and use the same computer program.

information technology and libraries | september 2009

employing virtualization in library computing: use cases and lessons learned

arwen hutt, michael stuart, daniel suchy, and bradley d. westbrook

arwen hutt (ahutt@ucsd.edu) is metadata specialist, michael stuart (mstuart@ucsd.edu) is information technology analyst, daniel suchy (dsuchy@ucsd.edu) is public services technology analyst, and bradley d. westbrook (bradw@library.ucsd.edu) is metadata librarian and digital archivist, university of california, san diego libraries.

this paper provides a broad overview of virtualization technology and describes several examples of its use at the university of california, san diego libraries. libraries can leverage virtualization to address many long-standing library computing challenges, but careful planning is needed to determine if this technology is the right solution for a specific need. this paper outlines both technical and usability considerations, and concludes with a discussion of potential enterprise impacts on the library infrastructure.

operating system virtualization, herein referred to simply as "virtualization," is a powerful and highly adaptable solution to several library technology challenges, such as managing computer labs, automating cataloging and other procedures, and demonstrating new library services. virtualization has been used in one manner or another for decades,1 but it is only within the last few years that this technology has made significant inroads into library environments. virtualization technology is not without its drawbacks, however. libraries need to assess their needs, as well as the resources required for virtualization, before embarking on large-scale implementations. this paper provides a broad overview of virtualization technology and explains its benefits and drawbacks by describing some of the ways virtualization has been used at the university of california, san diego (ucsd) libraries.2

virtualization overview

virtualization is used to partition the physical resources (processor, hard drive, network card, etc.) of one computer to run one or more instances of concurrent, but not necessarily identical, operating systems (oss). traditionally only one instance of an operating system, such as microsoft windows, can be used at any one time. when an operating system is virtualized—creating a virtual machine (vm)—the vm communicates through virtualization middleware to the hardware or host operating system. this middleware also provides a consistent set of virtual hardware drivers that are transparent to the end user and to the physical hardware. this allows the virtual machine to be used in a variety of heterogeneous environments without the need to reconfigure or install new drivers. with the majority of hardware and compatibility requirements resolved, the computer becomes simply a physical presentation medium for a vm.
two approaches to virtualization: host-based vs. hypervisor

virtualization can be implemented using either a hosted (type 2) or a bare-metal (type 1) hypervisor architecture. a hosted, or type 2, hypervisor (figure 1), commonly referred to as "host-based virtualization," requires an os such as microsoft windows xp to host a "guest" operating system like linux or even another version of windows. in this configuration, the host os treats the vm like any other application. host-based virtualization products are often intended to be used by a single user on workstation-class hardware. in the bare-metal, or type 1, hypervisor architecture (figure 2), commonly referred to as "hypervisor-based virtualization," the virtualization middleware interacts with the computer's physical resources without the need of a host operating system. such systems are usually intended for use by multiple users, with the vms accessed over the network. realizing the full benefits of this approach requires a considerable resource commitment for both enterprise-class server hardware and information technology (it) staff.

figure 1. a host-based (type 2) hypervisor implementation

figure 2. a bare-metal (type 1) hypervisor implementation

use cases

archivists' toolkit

the archivists' toolkit (at) project is a collaboration of the ucsd libraries, the new york university libraries, and the five colleges libraries (amherst college, hampshire college, mt. holyoke college, smith college, and the university of massachusetts amherst) and is funded by the andrew w. mellon foundation. the at is an open-source archival data management system that provides broad, integrated support for the management of archives. it consists of a java client that connects to a relational database back end (mysql, mssql, or oracle). the database can be implemented on a networked server or a single workstation. since its initial release in december 2006, the at has sparked a great deal of interest and rapid uptake within the archival community. this growing interest has, in turn, created an increased demand for demonstrations of the product, workshops and training, and simpler methods for distributing the application. (of the use cases described here, the two for at distribution and the laptop classroom are exploratory, whereas the rest are in production.)

at workshops

the society of american archivists sponsors a two-day at workshop occurring on multiple dates at several locations. in addition, the at team provides one- and two-day workshops to different institutional audiences. at workshops are designed to give participants hands-on experience using the at application. accomplishing this effectively requires, at a minimum, supplying all participants with identical but separate databases so that participants can complete the same learning exercises simultaneously and independently without concern for working in each other's space. in addition, an ideal configuration would reduce the workload of the instructors, freeing them from having to set up the at instructional database onsite for each workshop.
for these workshops we needed to do the following:
- provide identical but separate databases and database content for all workshop attendees
- create an easily reproducible installation and setup for workshops by preparing and populating the at instructional database in advance

virtualization allows the at workshop instructors to predefine the workstation configuration, including the installation and population of the at databases, prior to arriving at the workshop site. to accomplish this we developed a workshop vm configuration with mysql and the at client installed within a linux ubuntu os. the workshop instructors then built the at vm with the data they require for the workshop. the at client and database are loaded on a dvd or flash drive and shipped to the classroom managers at the workshop sites, who then need only install a copy of the vm and the freely available vmplayer software (necessary to launch the at vm) onto each workstation in the classroom. the at vm, once built, can be used many times, both for multiple workstations in a classroom and for multiple workshops at different times and locations. this implementation has worked very well, saving both time and effort for the instructors and classroom support staff by reducing the time and communication necessary for deploying and reconfiguring the vm. it also reduces the chances that there will be an unexpected conflict between the application and the host workstation's configuration. but the method is not perfect. more than anything else, licensing costs motivated us to choose linux as the operating system instead of a proprietary os such as windows. this reduces the cost of using the vm, but it also requires workshop participants to use an os with which they are often unfamiliar. for some participants, unfamiliarity with linux can make the workshop more difficult than it would be if a more ubiquitous os were used.

at demonstrations

in a similar vein, members of the at team are often called upon to demonstrate the application at various professional conferences and other venues. these demonstrations require the setup and population of a demonstration database with content for illustrating all of the application's functions. one of the constraints posed by the demonstration scenario is the importance of using a local database instance rather than a networked instance, since network connections can be unreliable or outright unavailable (network connectivity being an issue we've all faced at conferences). another constraint is that portions of the demonstrations need some level of preparation (for example, knowing what search terms will return a nonempty result set), which must be customized for the unique content of a database. a final constraint is that, because portions of the demonstration (import and data merging) alter the state of the database, changes to the database must be easily reversible, or else new examples must be created before the database can be reused. building on our experience of using virtualization to implement multiple copies of an at installation, we evaluated the possibility of using the same technology to simplify the setup necessary for demonstrating the at.
as with the workshops, the use of a vm for at demonstrations allows for easy distribution of a prepopulated database, which can be used by multiple team members at disparate geographic locations and on different host oss. this significantly reduces the cost of creating (and recreating) demonstration databases. in addition, demonstration scripts can be shared between team members, creating additional time savings as well as facilitating team participation in the development and refinement of the demonstration. perhaps most important is the ability to roll back the vm to a specific state or snapshot of the database. this means the database can be quickly returned to its original state after being altered during a demonstration. overall, despite our initial anxiety about depending on the vm for presentations to large audiences, this solution has proven very useful, reliable, and cost-effective. at distribution implementing the at requires installing both the toolkit client and a database application such as mysql, instantiating an at database, and establishing the connection between database and client. for many potential customers of the at, the requirements for database creation and management can be a significant barrier due to inexperience with how such processes work and a lack of readily available it resources. many of these customers simply desire a plug-and-play version of the application that they can install and use without requiring technical assistance. it is possible to satisfy this need for a plug-and-play at by constructing a vm containing a fully installed and ready-to-use at application and database instance. this significantly reduces the number and difficulty of steps involved in setting up a functional at instance. the customer would only need to transfer the vm from a dvd or other source to their computer, download and install the vm reader, and then launch the at vm. they would then be able to begin using the at immediately. this removes the need for the user to perform database creation and management; arguably the most technically challenging portion of the setup process. users would still have the option of configuring the application (default values, lookup lists, etc.) in accord with the practices of their repository. batch processing catalog records the rapid growth of electronic resources is significantly changing the nature of library cataloging. not only are types of library materials changing and multiplying, the amount of e-resources being acquired increases each year. electronic book and music packages often contain tens of thousands of items, each requiring some level of cataloging. because of these challenges, staff are increasingly cataloging resources with specialized programs, scripts, and macros that allow for semiautomated record creation and editing. such tools make it possible to work on large sets of resources—work that would not be financially possible to perform manually item by item. however, the specialized configuration of the workstation required for using these automated procedures makes it very difficult to use the workstation for other purposes at the same time. in fact, user interaction with the workstation while the process is running can cause a job to terminate prior to completion. in either scenario, productivity is compromised. virtualization offers an excellent remedy to this problem. 
a virtual machine configured for semiautomated batch processing allows unused resources on the workstation to process the batch requests in an isolated environment while, at the same time and on the same machine, the user is able to work on other tasks. in cases where the user's machine is not an ideal candidate for virtualization, the vm can be hosted via a hypervisor-based solution, and the user can access the vm with familiar remote access tools such as remote desktop in windows xp. secure sandbox in addition to challenges posed by increasingly large quantities of acquisitions, the ucsd libraries is also encountering an increasing variety of library material types. most notable is the variety and uniqueness of digital media acquired by the library, such as specialized programs to process and view research data sets, new media formats and viewers, and application installers. cataloging some of these materials requires that media be loaded and that applications be installed and run to inspect and validate content. but running or opening these materials, which are sometimes from unknown sources, poses a security risk to both the user's workstation and to the larger pool of library resources accessible via the network. many installers require a user to have administrative privileges, which can pose a threat to network security. the virtual machine allows a user to have administrative privileges within the vm, but not outside of the vm. the user can be provided with the privileges needed for installing and validating content without modifying their privileges on the host machine. in addition, the vm can be isolated by configuring its network connection so that any potential security risks are limited to the vm instance and do not extend to either the host machine or the network. laptop classroom instructors at the ucsd libraries need a laptop classroom that meets the usual requirements for this type of service (mobility, dependability, etc.) but also allows for the variety of computing environments and applications in use throughout our several library locations. in a least-common-denominator scenario, computers are configured to meet a general standard (usually microsoft windows with a standard browser and office suite) and allow minimal customization. while this solution has its advantages and is easy to configure and maintain from the it perspective, it leaves much to be desired for an instructor who needs to use a variety of tools in the classroom, often on demand. the goal in this case is not to settle for a single generic build but instead to look for a solution that accommodates three needs: ■ the ability to switch quickly between different customized os configurations ■ the ability to add and remove applications on demand in a classroom setting ■ the ability to restore a computer modified during class to its original state of course, regardless of the approach taken, the laptops still needed to retain a high level of system security, application stability, and regular hardware maintenance. after a thorough review of the different technologies and tools already in use in the libraries, we determined that virtualization might also serve to meet the requirements of our laptop classroom. the need to support multiple users and multiple vms makes this scenario an ideal candidate for hypervisor-based virtualization.
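the isolation and restore-to-original-state needs just described map onto two ordinary hypervisor operations: disconnecting the guest's network adapter and rolling back to a saved snapshot. the sketch below is a minimal illustration only, assuming virtualbox's vboxmanage command-line tool rather than the vmware products discussed in this article; the vm and snapshot names are hypothetical.

```python
# minimal sketch, assuming virtualbox's vboxmanage cli; vm and snapshot names are hypothetical.
import subprocess

VM_NAME = "cataloging-sandbox"

def vbox(*args):
    subprocess.run(["VBoxManage", *args], check=True)

def prepare_sandbox():
    # disconnect the first network adapter so untrusted installers cannot reach the network
    vbox("modifyvm", VM_NAME, "--nic1", "null")
    # record a clean state to return to after the material has been inspected
    vbox("snapshot", VM_NAME, "take", "clean-state")
    vbox("startvm", VM_NAME, "--type", "gui")

def restore_sandbox():
    # power off and roll the vm back to the clean state; the same step resets a
    # classroom laptop or a demonstration database between sessions
    vbox("controlvm", VM_NAME, "poweroff")
    vbox("snapshot", VM_NAME, "restore", "clean-state")
```

the same two commands cover the demonstration rollback and classroom restore scenarios described earlier, which is why a single prebuilt vm image can serve several of these use cases.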
we decided to use vdi (virtual desktop infrastructure), a commercially available hypervisor product from vmware. vmware is one of the largest providers of virtualization software, and we were already familiar with several iterations of its host-based vm services. the core of our project plan consists of a base vm to be created and managed by our it department. to support a wide variety of applications and instruction styles, instructors could create a customized vm specific to their library’s instruction needs with only nominal assistance from it staff. the custom vm would then be made available on demand to the laptops from a central server (as depicted in figure 2 above). in this manner, instructors could “own” and maintain a personal instructional computing environment, while the classroom manager could still ensure the laptop classroom as a whole maintained the necessary secure software environment required by it. as an added benefit, once these vms are established, they could be accessed and used in a variety of diverse locations. n considerations for implementation before implementing any virtualization solution, in-depth analysis and testing is needed to determine which type of solution, if any, is appropriate for a specific use case in a specific environment. this analysis should include three major areas of focus: user experience, application performance in the virtualized environment, and effect on the enterprise infrastructure. in this section of this paper, we review considerations that, in hindsight, we would have found to be extremely valuable in the ucsd libraries’ various implementations of virtualization. user experience traditionally, system engineers have developed systems and tuned performance according to engineering metrics (e.g., megabytes per second and network latency). while such metrics remain valuable to most assessments of a 114 information technology and libraries | september 2009 computer application, performance assessments are being increasingly defined by usability and user experience factors. in an academic computing environment, especially in areas such as library computer labs, these newer kinds of performance measures are important indicators of how effectively an application performs and, indirectly, of how well resources are being used. virtualization can be implemented in a way that allows library users to have access to both the virtualized and host oss or to multiple virtualized oss. since virtualization essentially creates layers within the workstation, multiple os layers (either host or virtualized) can cause the users to become confused as to which os they are interacting with at a given moment. in that kind of implementation, the user can lose his or her way among the host and guest oss as well as become disoriented by differing features of the virtualized oss. for example, the user may choose to save a file to the desktop, but may not be aware that the file will be saved to the desktop of the virtualized os and not the host os. external device support can also be problematic for the end user, particularly with regard to common devices such as flash drives. the user needs to be aware of which operating system is in use, since it is usually the only one with which an external device is configured to work. authentication to a system is another example of how the relationship between the host and guest os can cause confusion. 
the introduction of a second os implicitly creates a second level of authentication and authorization that must be configured separately from that of the host os. user privileges may differ between the host and guest os for a particular vm configuration. for instance, a user might need to remember two logins or at least enter the same login credentials twice. these unexpected differences between the host and guest os produce negative effects on a user's experience. this can be a critical factor in a time-sensitive environment such as a computer lab, where the instructor needs to devote class time to teaching and not to preparing the computers for use and navigating students through applications. interface latency and responsiveness latency (meaning here the responsiveness or "sluggishness" of the software application or the os) in any interface can be a problem for usability. developers devote a significant amount of time to improving operating systems and application interfaces to specifically address this issue. however, users will often be unable to recognize when an application is running in a virtualized os and will thus expect virtualized applications to perform with the same responsiveness as applications that are not virtualized. in our experience, some vm implementations exhibit noticeable interface latency because of inherent limitations of the virtualization software. perhaps the most notable and restrictive limitation is the lack of advanced 3d video rendering capability. this is due to the lack of support for hardware-accelerated graphics, thus adding an extra layer of communication between the application and the video card and slowing down performance. in most hardware-accelerated 3d applications (e.g., google earth pro or second life), this latency is such a problem that the application becomes unusable in a virtualized environment. recent developments have begun to address and, in some cases, overcome these limitations.3 in every virtualization solution there is overhead for the virtualization software to do its job and delegate resources. in our experience, this has been found to cause an approximately 10–20 percent performance penalty. most applications will run well with little or moderate changes to configuration when virtualized, but the overhead should not be overlooked or assumed to be inconsequential. it is also valuable to point out that the combination of applications in a vm, as well as vms running together on the same host, can create further performance issues. traditional bottlenecks the bottlenecks faced in traditional library computing systems also remain in almost every virtualization implementation. general application performance is usually limited by the specifications of one or more of the following components: processor, memory, storage, and network hardware. in most cases, assuming adequate hardware resources are available, performance issues can be easily addressed by reconfiguring the resources for the vm. for example, performance problems in a vm whose application is memory-bound (i.e., limited by the memory available to the vm) can be resolved by adjusting the amount of memory allocated to the vm. a critical component of planning a successful virtualization deployment is a thorough analysis of user workflow and the ways in which the vm will be utilized. although the types of user workflows may vary widely, analysis and testing serve to predict and possibly avoid potential bottlenecks in system performance.
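as a concrete illustration of reconfiguring vm resources for a memory-bound workload, the sketch below grows a guest's memory and cpu allocation before restarting it. it again assumes virtualbox's vboxmanage tool; the vm name and sizes are hypothetical examples, not recommendations.

```python
# minimal sketch, assuming virtualbox's vboxmanage cli; the vm name and sizes are hypothetical.
import subprocess

VM_NAME = "batch-processing"

def vbox(*args):
    subprocess.run(["VBoxManage", *args], check=True)

def grow_vm(memory_mb=4096, cpus=2):
    # the hardware profile can only be changed while the vm is powered off
    vbox("controlvm", VM_NAME, "poweroff")
    vbox("modifyvm", VM_NAME, "--memory", str(memory_mb), "--cpus", str(cpus))
    # restart headless so the batch job keeps running in the background
    vbox("startvm", VM_NAME, "--type", "headless")
```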
enterprise impact when assessing the effect virtualization will have on your library infrastructure, it is important to have an accurate understanding of the resources and capabilities that will form the foundation for the virtualized infrastructure. it is a misconception that it is necessary to purchase state-of-the-art hardware to implement virtualization. not only are organizations realizing how to utilize existing hardware better with virtualization for specific projects, they are discovering that the technology can be extended to the rest of the organization and be successfully integrated into their it management practices. virtualization does, however, impose certain performance requirements for large-scale deployments that will be used in a 24/7 production environment. in such scenarios, organizations should first compare the level of performance offered by their current hardware resources with the performance of new hardware. the most compelling reasons to buy new servers include the economies of scale that can be obtained by running more vms on fewer, more robust servers, as well as the enhanced performance supplied by newer, more virtualization-aware hardware. in addition, virtualization allows for resources to be used more efficiently, resulting in lower power consumption and cooling costs. also, the network is often one of the most overlooked factors when planning a virtualization project. while a local virtualized environment (i.e., a single computer) may not necessarily require a high-performance network environment, any solution that calls for a hypervisor-based infrastructure requires considerable planning and scaling for bandwidth requirements. the current network hardware available in your infrastructure may not perform or scale adequately to meet the needs of this vm use. again, this highlights the importance of thorough user workflow analyses and testing prior to implementation. depending on the scope of your virtualization project, deployment in your library can potentially be expensive and can have many indirect costs. while the initial investment in hardware is relatively easy to calculate, other factors, such as ongoing staff training and system administration overhead, are much more difficult to determine. in addition, virtualization adds an additional layer to oftentimes already complex software licensing terms. to deal with the increased use of virtualization, software vendors are devoting increasing attention to the intricacies of licensing their products for use in such environments. while virtualization can ameliorate some licensing constraints (as noted in the at workshop use case), it can also conceal and promote licensing violations, such as multiple uses of a single-license application or access to license-restricted materials. license review is a prudent and highly recommended component of implementing a virtualization solution. finally, concerning virtualization software itself, it also should be noted that while commercial vm companies usually provide plentiful resources for aiding implementation, several worthy open-source options also exist. as with any open-source software, the total cost of operation (e.g., the costs of development, maintenance, and support) needs to be considered. ■ conclusion as our use cases illustrate, there are numerous potential applications and benefits of virtualization technology in the library environment.
while we have illustrated a number of these, many more possibilities exist, and further opportunities for its application will be discovered as virtualization technology matures and is adapted by a growing number of libraries. as with any technology, there are many factors that must be taken into account to evaluate if and when virtualization is the right tool for the job. in short, successful implementation of virtualization requires thoughtful planning. when so implemented, virtualization can provide libraries with cost-effective solutions to long-standing problems. references and notes 1. alessio gaspar et al., “the role of virtualization in computing education,” in proceedings of the 39th sigcse technical symposium on computer science education (new york: acm, 2008): 131–32; paul ghostine, “desktop virtualization: streamlining the future of university it,” information today 25, no. 2 (2008): 16; robert p. goldberg, “formal requirements for virtualizable third generation architectures,” in communications of the acm 17, no. 7 (new york: acm, 1974): 412–21; and karissa miller and mahmoud pegah, “virtualization: virtually at the desktop,” in proceedings of the 35th annual acm siguccs conference on user services (new york: acm, 2007): 255–60. 2. for other, non–ucsd use cases of virtualization, see joel c. adams and w. d. laverell, “configuring a multi-course lab for system-level projects,” sigcse bulletin 37, no. 1 (2005): 525–29; david collins, “using vmware and live cd’s to configure a secure, flexible, easy to manage computer lab environment,” journal of computing for small colleges 21, no. 4 (2006): 273–77; rance d. necaise, “using vmware for dual operating systems,” journal of computing in small colleges 17, no. 2 (2001): 294–300; and jason nieh and chris vaill, “experiences teaching operating systems using virtual platforms and linux,” sigcse bulletin 37, no 1 (2005): 520–24. 3. h. andrés lagar-cavilla, “vmgl (formerly xen-gl): opengl hardware 3d acceleration for virtual machines,” www .cs.toronto.edu/~andreslc/xen-gl/ (accessed oct. 21, 2008). editorial board thoughts: “india does not exist.” mark cyzyk information technology and libraries | june 2013 4 often, i find myself trolling online forums, searching for and praying i find a bona-fide solution to a technical problem. typically, my process begins with the annoying discovery that many others are running into the same, or very similar, difficulty. many others. once i get over my initial frustration ("why isn't this problem fixed by now?"), i proceed to read, to attempt to determine which of the often conflicting and even contradictory suggestions for fixing the problem might actually work. i thought it would be instructive to step back for a moment and examine this experience. to do so, i want to use as my example, as my straw man, not a technical question, but a more generic question, the sort of question anyone might conceivably ask. i'll ask this question, then i'll list what i think might be answers, in form and substance, from the technical forums had it been asked there: "i want to go to india. how best to get there?" why would you want to go there? you could fly. you could take a ship. why go to india? iceland is much better. i went to india once and it wasn't that great. you never specify where in india you want to go. we can't help you until you tell us where in india you want to go. i am sick and tired of these people who don't read the forums. your query has been answered before. 
the only way to get there is to fly first class on continental. you could ride a mule to india. new zealand is much better. you should go there instead. it is impossible to go to india. you can get from india to anywhere in europe very easily via india air. you should read a passage to india, i forget who wrote it. i read it as an undergraduate. it was very good. you are an idiot for wanting to go to india. india does not exist. mark cyzyk (mcyzyk@jhu.edu), a member of lita and the ital editorial board, is the scholarly communication architect in the sheridan libraries, the johns hopkins university, baltimore, maryland. i think it's safe to say that the signal-to-noise ratio here is low. if we truly want to answer a question, we don't want to add noise. pontificating, posturing, and automatically posing as a mentor in a mentor/protégé relationship will typically be construed as adding nothing but noise to the signal. in most cases, we who answer such questions are not here to educate, except insofar as we provide a clear and concise answer to a technical query issued by one of our peers. what should we assume? first off, we should assume that the person writing the question is sincere: he truly does want to go to india. we need not question his motives. the best way to think about this is that the query is a hypothetical: if he were to want to go to india, how best to do it? if you were to want to go to india, how best to do it? this requires a certain level of empathy on the part of the one answering the question, a level of empathy of which the technical forums are all but devoid. many answers on those forums are so tone-deaf to human need they may as well have been written by robots. "how best to get there" is tricky because you must make some assumptions. assumptions are fine as long as you're explicit about them. one assumption might be: he is leaving from the east coast of the united states. another assumption might be: he is going to india only for a short while, for a conference or vacation. yet another one might be: by "best" he means "quickest, most efficient, least expensive." stating these assumptions, then stating your answer to the question, is appropriate and is what is most helpful. stating your assumptions is tantamount to stating your understanding of the original question, its scope and context. this is always a helpful thing to do when attempting to communicate with another human being. now, communication and plumbing the depth of human need, at least with respect to informational and bibliographic needs, has always been a strong suit of librarians, so what i write here is not really directed at librarians. it is, though, directed at those of us who straddle both the library world and the technology world, if that distinction is not a false one and can be usefully made. i think it important for those of us split between two cultures to ensure that we fall to one side and not the other, in particular that we do not fall into the oftentimes loutish and ultimately unproductive communication mores exhibited by many of the online technical forums. whenever my wife and i hear a news story on tv or radio openly wondering why more women do not go into i.t., i blurt out something like: "you wanna know why? just go read the comments section of most posts at slashdot.com. why on earth would anyone who didn't have to put up with that kind of culture actually choose to put up with it?"
isn't "india does not exist" exactly the kind of response one would find on slashdot.com if the initial question was, "i want to go to india -how best to get there?"? with all this in mind, i hereby issue my own question, this time a technical one: information technology and libraries | june 2013 6 "i want to programmatically convert a largish set of documents from pdf to docx format. how best to do it?" i hope you don't think i'm an idiot. editorial board thoughts: requiring and demonstrating technical skills for library employment emily morton-owens information technologies and libraries | september 2016 6 recently i’ve been involved in a number of conversations about technical skills for library jobs, sparked by an ital article by monica maceli1 and a code4lib presentation by jennie rose halperin.2 maceli performed a text analysis of job postings on code4lib to reveal what skills are cooccurring and most frequent. halperin problematized the expense of the mls credential in comparison to the qualifications actually required by library technology jobs and the salaries offered for technical versus nontechnical work. this work has inspired many conversations about the shift in skills required for library work, the value placed on different kinds of labor, and how mls programs can teach library technology. during a period of hiring at my institution and through teaching a library school course in which many of the students are on the brink of graduation, my attention has been called particularly to one point in the library employment process: job postings. these advertisements are the first step in matching aspiring library staff with the real-life needs of libraries—where the rubber meets the road between employer expectations and new-grad experience. most libraries already use the practice of distinguishing between required and preferred qualifications, which is a good start, especially for technology jobs where candidates may offer strong learning proficiency yet lack a few particular tools. although there have been conflicting interpretations of the hewlett-packard research suggesting that men are more likely than women to apply to jobs when they don’t meet all the requirements,3 i observe a general tendency among graduating students to err on the side of caution because they’re not sure which qualifications they can claim. among my students, for example, constant confusion attends the years of experience required. is this library experience? general job experience? experience at the same type of library? paid or unpaid? postings are often ambiguous and students may choose to apply or not. similarly, there are questions about what extent of experience qualifies someone to know a technology: mastering it through creating new projects at a paid job, experience maintaining it, or merely basic familiarity? not knowing who has been hired, and on the basis of what kind of experience, is a gap for researchers trying to close the loop on job advertisements. even when a job posting has avoided an overlong list of required technical skills, it might still be expressing a narrow sense of what’s required to qualify. someone who understands subversion will be capable of understanding git, so we see plenty of job advertisements that ask for experience with a “a version control system (e.g. git, subversion, or mercurial).” i recently polled staff in our department and found very few of us with bachelor’s degrees in technical subjects. 
more of us had come to working in library technology through work experience or graduate programs. and yet, our job postings contained long statements that conflated education and experience, such as “bachelor’s degree in computer science, information science, or other emily morton-owens (egmowens@upenn.edu), a member of the ital editorial board, is director of digital library development and systems, university of pennsylvania libraries, philadelphia, pennsylvania. mailto:egmowens@upenn.edu editorial board thoughts | morton-owens doi: 10.6017/ital.v35i3.9527 7 relevant field and at least 3 years of experience application development in object oriented and scripting languages or equivalent combination of education and experience. master’s desirable.” i edited our statement to more clearly allow a combination of factors that would show sufficient preparation: “bachelor’s degree and a minimum of 3-5 years of experience, or an equivalent combination of education and experience, are required; a master’s degree is preferred,” followed by a separate description of technical skills needed. this increased the number and quality of our applications, so i’ll remain on the lookout for opportunities to represent what we want to require more faithfully and with an open mind. meanwhile, on the other side of the table, students and recent grads are uncertain how to demonstrate their skills. first, they’re wondering how to show clearly enough that they meet requirements like “three years of work experience” or “experience with user testing” so that their application is seriously considered. second, they ask about possibilities to formalize skills. recently, i’ve gotten questions about a certificate program in ux and whether there is any formal certification to be a systems librarian. surveying the past experience of my own network—with very diverse paths into technology jobs ranging from undergraduate or second master’s degrees to learning scripting as a technical services librarian to pre-mls work experience—doesn’t suggest any standard method for substantiating technical knowledge. once again, the truth of the situation may be that libraries will welcome a broad range of possible experience, but the postings don’t necessarily signal that. some advice from the tech industry about how to be more inviting to candidates applies to libraries too; for example, avoiding “rockstar”/ “ninja” descriptions, emphasizing the problem space over years of experience,4 and designing interview processes that encourage discussion rather than “gotcha” technical tasks. at penn libraries, for example, we’ve been asking developer candidates to spend a few hours at most on a take-home coding assignment, rather than doing whiteboard coding on the spot. this gives us concrete code to discuss in a far more realistic and relaxed context. while it may be helpful to express requirements better to encourage applicants to see more clearly whether they should respond to a posting, this is a small part of the question of preparing new mls grads for library technology jobs. the new grads who are seeking guidance on substantiating their skills are the ones who are confident they possess them. others have a sense that they should increase their comfort with technology but are not sure how to do it, especially when they’ve just completed a whole new degree and may not have the time or resources to pursue additional training. 
even if we make efforts to narrow the gap between employers and jobseekers, much remains to be discussed regarding the challenge of readying students with different interests and preparation for library employment. library school provides a relatively brief window to instill in students the fundamentals and values of the profession and it can't be repurposed as a coding academy. there persists a need to discuss how to help students interested in technology learn and demonstrate competencies rather than teaching them rapidly shifting specific technologies. references 1. monica maceli, "what technology skills do developers need? a text analysis of job listings in library and information science (lis) from jobs.code4lib.org," information technology and libraries 34, no. 3 (2015): 8–21, doi:10.6017/ital.v34i3.5893. 2. jennie rose halperin, "our $50,000 problem: why library school?" code{4}lib, http://code4lib.org/conference/2015/halperin. 3. tara sophia mohr, "why women don't apply for jobs unless they're 100% qualified," harvard business review, august 25, 2014, https://hbr.org/2014/08/why-women-dont-apply-for-jobs-unless-theyre-100-qualified. 4. erin kissane, "job listings that don't alienate," https://storify.com/kissane/job-listings-that-don-t-alienate. editorial board thoughts: metadata training in canadian library technician programs sharon farnel information technology and libraries | december 2016 the core metadata team at my institution is small but effective. in addition to myself as coordinator, we include two librarians and two full-time metadata assistants. our metadata assistant positions are considered to be similar, in some ways, to other senior assistant positions within the organization which require or at least prefer that individuals have a library technician diploma. however, neither of our metadata assistants has such a diploma. their credentials, in fact, are quite different. in part, this difference is driven by the nature of the work that our metadata assistants do. they work regularly with different metadata standards such as mods, dc, and ddi in addition to marc. they perform operations on large batches of metadata using languages such as xslt or r. this is quite different in many ways from the work of their colleagues who work with the ils, many of whom do have a library technician diploma. as we prepare for an upcoming short-term leave of one of our team members, i have been thinking a great deal about the work our metadata assistants do and whether or not we would find an individual who came through a library technician program who had the skills and knowledge we need a replacement to have. and i have also been reminded of conversations i have had with recently graduated library technicians who felt their exposure to metadata standards, practices, and tools beyond rda and marc had been lacking in their programs. this got me thinking about the presence or absence of metadata courses in library technician programs in canada.
i reached out to two colleagues from macewan university—norene erickson and lisa shamchuk—who are doing in-depth research into library technician education in canada. they kindly provided me with a list of canadian institutions that offer a library technician program so i could investigate further. now, i must begin with two caveats. one, this is very much a surface level scan rather than an indepth examination, although this is simply the first step in what i hope will be a longer term investigation. second, although several francophone institutions in canada offer library technician programs, i did not review their programs; i was concerned that my lack of fluency in the french language could lead to inadvertent misrepresentations. sharon farnel (sharon.farnel@ualberta.ca), a member of the ital editorial board, is metadata coordinator, university of alberta libraries, edmonton, alberta. editorial board thoughts | farnel https://doi.org/10.6017/ital.v35i4.9601 4 canadian institutions offering a library technician program (by province) are: alberta ● macewan university (http://www.macewan.ca/wcm/schoolsfaculties/business/programs/libraryandinforma tiontechnology/) ● southern alberta institute of technology (http://www.sait.ca/programs-and-courses/fulltime-studies/diplomas/library-information-technology) british columbia ● langara college (http://langara.ca/programs-and-courses/programs/library-informationtechnology/) ● university of the fraser valley (http://www.ufv.ca/programs/libit/) manitoba ● red river college (http://me.rrc.mb.ca/catalogue/programinfo.aspx?progcode=libifdp®ioncode=wpg) nova scotia ● nova scotia community college (http://www.nscc.ca/learning_programs/programs/plandescr.aspx?prg=lbtn&pln=libin ftech) ontario ● algonquin college (http://www.algonquincollege.com/healthandcommunity/program/library-andinformation-technician/) ● conestoga college (https://www.conestogac.on.ca/parttime/library-and-informationtechnician) ● confederation college (http://www.confederationcollege.ca/program/library-andinformation-technician) ● durham college (http://www.durhamcollege.ca/programs/library-and-informationtechnician) ● seneca college (http://www.senecacollege.ca/fulltime/lit.html) ● mohawk college (http://www.mohawkcollege.ca/ce/programs/community-services-andsupport/library-and-information-technician-diploma-800) information technologies and libraries | december 2016 5 quebec ● john abbott college (http://www.johnabbott.qc.ca/academics/careerprograms/information-library-technologies/) saskatchewan ● saskatchewan polytechnic (http://saskpolytech.ca/programs-andcourses/programs/library-and-information-technology.aspx) my method was quite simple. using the program websites listed above, i reviewed the course listings looking for ‘metadata’ either in the title or in the description when it was available. of the fourteen (14) programs examined, nine (9) had no course with metadata in the title or description. two (2) programs had courses where metadata was listed as part of the content but not the focus: langara college as part of “special topics: creating and managing digital collections” and seneca college as part of “cataloguing iii” which has a partial focus on metadata for digital collections. three (3) of the programs had a course with metadata in the title or description; all are a variation on “introduction to metadata and metadata applications”. 
(importantly, the three institutions in question, namely conestoga college, confederation college, and mohawk college, are all connected and share courses online). so, what do these very preliminary and impressionistic findings tell us? it seems that there is little opportunity for students enrolled in library technician programs in canada to be exposed to the metadata standards, practices, and tools that are increasingly necessary for positions involved in work with digital collections, research data management, digital preservation, and the like. admittedly, no program can include courses on all potentially relevant topics. in addition, formal course work is only one aspect of training and education that can prepare graduates for their career; practica and work placements and other more informal activities during a program are crucial, as are the skills and knowledge that can only be developed once hired and on the job. nevertheless, based on the investigation above, one would be justified in asking if we are disadvantaging students by not working to incorporate additional coursework focused on metadata standards, application, and tools, as well as on basic skills in manipulation of metadata in large batches.
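to give a concrete sense of the "large batches of metadata" work mentioned above, the sketch below applies a single xslt stylesheet to a folder of xml records. it is only an illustration: the stylesheet name, the folder names, and the use of python's lxml library (rather than the xslt or r workflows the metadata assistants actually use) are assumptions.

```python
# minimal sketch, assuming lxml is installed; file and folder names are hypothetical.
from pathlib import Path
from lxml import etree

transform = etree.XSLT(etree.parse("mods-to-dc.xsl"))  # hypothetical stylesheet

def convert_batch(src_dir="mods_records", out_dir="dc_records"):
    # apply the same transformation to every record in the source folder
    Path(out_dir).mkdir(exist_ok=True)
    for record in sorted(Path(src_dir).glob("*.xml")):
        result = transform(etree.parse(str(record)))
        (Path(out_dir) / record.name).write_bytes(
            etree.tostring(result, xml_declaration=True, encoding="UTF-8", pretty_print=True))

if __name__ == "__main__":
    convert_batch()
```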
a semantic model of selective dissemination of information for digital libraries j. m. morales-del-castillo, r. pedraza-jiménez, a. a. ruíz, e. peis, and e. herrera-viedma in this paper we present the theoretical and methodological foundations for the development of a multi-agent selective dissemination of information (sdi) service model that applies semantic web technologies for specialized digital libraries. these technologies make it possible to achieve more efficient information management, improve agent–user communication processes, and facilitate accurate access to relevant resources. other tools used are fuzzy linguistic modelling techniques (which ease the interaction between users and the system) and natural language processing (nlp) techniques for semiautomatic thesaurus generation. also, rss feeds are used as "current awareness bulletins" to generate personalized bibliographic alerts. nowadays, one of the main challenges faced by information systems at libraries or on the web is to efficiently manage the large number of documents they hold. information systems make it easier to give users access to relevant resources that satisfy their information needs, but a problem emerges when the user has a high degree of specialization and requires very specific resources, as in the case of researchers.1 in "traditional" physical libraries, several procedures have been proposed to try to mitigate this issue, including the selective dissemination of information (sdi) service model that makes it possible to offer users potentially interesting documents by accessing users' personal profiles kept by the library. nevertheless, the progressive incorporation of new information and communication technologies (icts) into information services, the widespread use of the internet, and the diversification of resources that can be accessed through the web have led libraries through a process of reinvention and transformation to become "digital" libraries.2 this reengineering process requires a deep revision of work techniques and methods so librarians can adapt to the new work environment and improve the services provided.
in this paper we present a recommendation and sdi model, implemented as a service of a specialized digital library (in this case, specialized in library and information science), that can increase the accuracy of accessing information and the satisfaction of users’ information needs on the web. this model is built on a multi-agent framework, similar to the one proposed by herrera-viedma, peis, and morales-del-castillo,3 that applies semantic web technologies within the specific domain of specialized digital libraries in order to achieve more efficient information management (by semantically enriching different elements of the system) and improved agent–agent and user–agent communication processes. furthermore, the model uses fuzzy linguistic modelling techniques to facilitate the user–system interaction and to allow a higher grade of automation in certain procedures. to increase improved automation, some natural language processing (nlp) techniques are used to create a system thesaurus and other auxiliary tools for the definition of formal representations of information resources. in the next section, “instrumental basis,” we briefly analyze sdi services and several techniques involved in the semantic web project, and we describe the preliminary methodological and instrumental bases that we used for developing the model, such as fuzzy linguistic modelling techniques and tools for nlp. in “semantic sdi service model for digital libraries,” the bulk of this work, the application model that we propose is presented. finally, to sum up, some conclusive data are highlighted. n instrumental basis filtering techniques for sdi services filtering and recommendation services are based on the application of different process-management techniques that are oriented toward providing the user exactly the information that meets his or her needs or can be of his or her interest. in textual domains, these services are usually developed using multi-agent systems, whose main aims are n to evaluate and filter resources normally represented in xml or html format; and n to assist people in the process of searching for and retrieving resources.4 j. m. morales-del-castillo (josemdc@ugr.es) is assistant professor of information science, library and information science department, university of granada, spain. r. pedrazajiménez (rafael.pedraza@upf.edu) is assistant professor of information science, journalism and audiovisual communication department, pompeu fabra university, barcelona, spain. a. a. ruíz (aangel@ugr.es) is full professor of information science, library and information science department, university of granada. e. peis (epeis@ugr.es) is full professor of information science, library and information science department, university of granada. e. herrera-viedma (viedma@decsai.ugr.es) is senior lecturer in computer science, computer science and artificial intelligence department, university of granada. 22 information technology and libraries | march 2009 traditionally, these systems are classified as either content-based recommendation systems or collaborative recommendation systems.5 content-based recommendation systems filter information and generate recommendations by comparing a set of keywords defined by the user with the terms used to represent the content of documents, ignoring any information given by other users. by contrast, collaborative filtering systems use the information provided by several users to recommend documents to a given user, ignoring the representation of a document’s content. 
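a content-based filter of the kind just described can be reduced to a simple overlap score between the keywords in a user profile and the descriptors of each document. the sketch below is a minimal, self-contained illustration with made-up profiles and documents; it is not the matching function used by any of the systems cited.

```python
# minimal sketch of content-based matching; the profiles and documents are made up.
def jaccard(profile_keywords, doc_terms):
    profile, doc = set(profile_keywords), set(doc_terms)
    union = profile | doc
    return len(profile & doc) / len(union) if union else 0.0

def recommend(profile_keywords, documents, threshold=0.2):
    # score every document against the profile and keep the best matches
    scored = ((jaccard(profile_keywords, terms), doc_id) for doc_id, terms in documents.items())
    return [(doc_id, round(s, 2)) for s, doc_id in sorted(scored, reverse=True) if s >= threshold]

documents = {
    "doc-1": ["metadata", "digital libraries", "rdf"],
    "doc-2": ["circulation", "ils", "acquisitions"],
}
print(recommend(["rdf", "metadata", "semantic web"], documents))
```

a collaborative filter would instead score documents by the ratings or recommendations of similar users, and the hybrid systems mentioned below combine both signals.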
it is common to group users into different categories or stereotypes that are characterized by a series of rules and preferences, defined by default, that represent the information needs and common behavioural habits of a group of related users. the current trend is to develop hybrids that make the most of content-based and collaborative recommendation systems. in the field of libraries, these services usually adopt the form of sdi services that, depending on the profile of subscribed users, periodically (or when required by the user) generate a series of information alerts that describe the resources in the library that fit a user’s interests.6 sdi services have been studied in different research areas, such as the multi-agent systems development domain,7 and, of course, the digital libraries domain.8 presently, many sdi services are implemented on web platforms based on a multi-agent architecture where there is a set of intermediate agents that compare users’ profiles with the documents, and there are input-output agents that deal with subscriptions to the service and display generated alerts to users.9 usually, the information is structured according to a certain data model, and users’ profiles are defined using a series of keywords that are compared to descriptors or the full text of the documents. despite their usefulness, these services have some deficiencies: n the communication processes between agents, and between agents and users, are hindered by the different ways in which information is represented. n this heterogeneity in the representation of information makes it impossible to reuse such information in other processes or applications. a possible solution to these deficiencies consists of enriching the information representation using a common vocabulary and data model that are understandable by humans as well as by software agents. 
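the "common vocabulary and data model" called for above is, in the model described next, rdf. as a minimal illustration of what such enriched, machine-readable statements look like, the sketch below builds two triples about a hypothetical library resource with the rdflib library and dublin core terms; the uris and literal values are invented for the example.

```python
# minimal sketch, assuming rdflib is installed; uris and literals are invented.
from rdflib import Graph, URIRef, Literal, Namespace

DCTERMS = Namespace("http://purl.org/dc/terms/")

g = Graph()
doc = URIRef("http://example.org/library/doc/42")  # hypothetical resource uri
g.add((doc, DCTERMS.title, Literal("semantic sdi services for digital libraries")))
g.add((doc, DCTERMS.subject, Literal("selective dissemination of information")))

# serialize the assertions as turtle so both people and software agents can read them
print(g.serialize(format="turtle"))
```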
the semantic web project takes this idea and provides the means to develop a universal platform for the exchange of information.10 semantic web technologies the semantic web project tries to extend the model of the present web by using a series of standard languages that enable enriching the description of web resources and make them semantically accessible.11 to do that, the project bases itself on two fundamental ideas: (1) resources should be tagged semantically so that information can be understood both by humans and computers, and (2) intelligent agents should be developed that are capable of operating at a semantic level with those resources and that infer new knowledge from them (shifting from the search of keywords in a text to the retrieval of concepts).12 the semantic backbone of the project is the resource description framework (rdf) vocabulary, which provides a data model to represent, exchange, link, add, and reuse structured metadata of distributed information sources, thereby making them directly understandable by software agents.13 rdf structures the information into individual assertions (i.e., "resource, property, property value" triples) and uniquely characterizes resources by means of uniform resource identifiers (uris), allowing agents to make inferences about them using web ontologies or other, simpler semantic structures, such as conceptual schemes or thesauri.14 even though the adoption of the semantic web and its application to systems like digital libraries is not free from trouble (because of the nature of the technologies involved in the project and because of the project's ambitious objectives,15 among other reasons), the way these technologies represent the information is a significant improvement over the quality of the resources retrieved by search engines, and it also allows the preservation of platform independence, thus favouring the exchange and reuse of contents.16 as we can see, the semantic web works with information written in natural language that is structured in a way that can be interpreted by machines. for this reason, it is usually difficult to deal with problems that require operating with linguistic information that has a certain degree of uncertainty (e.g., when quantifying the user's satisfaction in relation to a product or service). a possible solution could be the use of fuzzy linguistic modelling techniques as a tool for improving system–user communication. fuzzy linguistic modelling fuzzy linguistic modelling supplies a set of approximate techniques appropriate for dealing with qualitative aspects of problems.17 the ordinal linguistic approach is defined according to a finite set of tags (s), completely ordered and with odd cardinality (seven or nine tags): $S = \{ s_i \mid i \in H = \{0, \ldots, T\} \}$. the central term has a value of approximately 0.5, and the rest of the terms are arranged symmetrically around it. the semantics of each linguistic term is given by the ordered structure of the set of terms, considering that each linguistic term of the pair $(s_i, s_{T-i})$ is equally informative. each label $s_i$ is assigned a fuzzy value defined in the interval [0, 1] that is described by a linear trapezoidal membership function represented by the 4-tuple $(a_i, b_i, \alpha_i, \beta_i)$. (the first two parameters show the interval where the membership value is 1.0; the third and fourth parameters show the left and right limits of the distribution.)
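to make the linguistic term set more concrete, the sketch below builds a seven-label set $S = \{s_0, \ldots, s_6\}$ with evenly spaced, symmetric triangular membership functions (trapezoids whose flat top collapses to a single point) on [0, 1]. the label names and breakpoints are illustrative assumptions, not values taken from the article.

```python
# minimal sketch of an ordinal linguistic term set; labels and breakpoints are illustrative.
T = 6  # highest index, giving odd cardinality (seven labels s0..s6)
LABELS = ["none", "very_low", "low", "medium", "high", "very_high", "perfect"]

def membership(i, x):
    """membership of a value x in [0, 1] for label s_i (symmetric, evenly spaced)."""
    center = i / T      # the point where membership is 1.0
    spread = 1 / T      # left and right limits of the distribution
    if x < center - spread or x > center + spread:
        return 0.0
    return 1.0 - abs(x - center) / spread

print(LABELS[3], membership(3, 0.5))   # the central term covers values around 0.5
print(LABELS[5], membership(5, 0.7))
```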
additionally, we need to define the following properties: (1) the set is ordered: $s_i \geq s_j$ if $i \geq j$; (2) there is a negation operator: $\mathrm{neg}(s_i) = s_j$, with $j = T - i$; (3) a maximization operator: $\max(s_i, s_j) = s_i$ if $s_i \geq s_j$; and (4) a minimization operator: $\min(s_i, s_j) = s_i$ if $s_i \leq s_j$. it also is necessary to define aggregation operators, such as linguistic weighted averaging (lwa),18 capable of operating with and combining linguistic information. focusing on facilitating the interaction between users and system, the other starting objective is to achieve the development and implementation of the model proposed in the most automated way possible. to do this, we use a basic auxiliary tool, a thesaurus, that, among other tasks, assists users in the creation of their profile and enables automating the generation of alerts. that is why it is critical to define the way in which we create this tool, and in this work we propose a specific method for the semiautomatic development of thesauri using nlp techniques. nlp techniques and other automating tools nlp consists of a series of linguistic techniques, statistical approaches, and machine learning algorithms (mainly clustering techniques) that can be used, for example, to summarize texts in an automatic way, to develop automatic translators, and to create voice recognition software. another possible application of nlp would be the semiautomatic construction of thesauri using different techniques. one of them consists of determining the lexical relations between the terms of a text (mainly synonymy, hyponymy, and hyperonymy),19 and extracting terms that are more representative of the text's specific domain.20 it is possible to elicit these relations by using linguistic tools, like princeton's wordnet (http://wordnet.princeton.edu), and clustering techniques. wordnet is a powerful multilanguage lexical database where each one of its entries is defined, among other elements, by its synonyms (synsets), hyponyms, and hyperonyms.21 as a consequence, once given the most important terms of a domain, wordnet can be used to create from them a thesaurus (after leaving out all terms that have not been identified as belonging or related to the domain of interest).22 this tool can also be used with clustering techniques, for example, to group the documents of a collection in a set of nodes or clusters, depending on their similarity. each of these clusters is described by the most representative terms of its documents. these terms make up the most specific level of a thesaurus and are used to search in wordnet for their synonyms and most general terms, contributing (with the repetition of this procedure) to the bottom-up development process of the thesaurus.23 although there are many others, these are some of the most well-known techniques of semiautomatic thesaurus generation (semiautomatic because, needless to say, the supervision of experts is necessary to determine the validity of the final result). for specialized digital libraries, we propose developing, on a multi-agent platform and using all these tools, sdi services capable of generating alerts and recommendations for users according to their personal profiles.
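the wordnet lookups described above can be reproduced in a few lines of code. the sketch below collects synonyms, broader terms (hypernyms), and narrower terms (hyponyms) for a seed term as raw material for a thesaurus entry; it assumes the nltk interface to wordnet rather than the wordnet 2.1 installation used by the authors, and the seed term is just an example.

```python
# minimal sketch, assuming nltk and its wordnet corpus are installed; the seed term is an example.
from nltk.corpus import wordnet as wn

def thesaurus_entry(term):
    entry = {"synonyms": set(), "broader": set(), "narrower": set()}
    for synset in wn.synsets(term):
        entry["synonyms"].update(lemma.name() for lemma in synset.lemmas())
        for hypernym in synset.hypernyms():
            entry["broader"].update(lemma.name() for lemma in hypernym.lemmas())
        for hyponym in synset.hyponyms():
            entry["narrower"].update(lemma.name() for lemma in hyponym.lemmas())
    return entry

print(thesaurus_entry("library"))
```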
in particular, the model presented here is the result of merging several previous models, and its service is based on the definition of "current-awareness bulletins," where users can find a basic description of the resources recently acquired by the library or of those that might be of interest to them.24

the semantic sdi service model for digital libraries

the sdi service includes two agents (an interface agent and a task agent) distributed in a four-level hierarchical architecture: user level, interface level, task level, and resource level. its main components are a repository of full-text documents (which make up the stock of the digital library) and a series of elements described using different rdf-based vocabularies: one or several rss feeds that play a role similar to that of current-awareness bulletins in traditional libraries; a repository of recommendation log files that store the recommendations made by users about the resources; and a thesaurus that lists and hierarchically relates the most relevant terms of the specialization domain of the library.25 also, the semantics of each element (that is, its characteristics and the relations the element establishes with other elements in the system) are defined in a web ontology developed in web ontology language (owl).26 next, we describe these main elements as well as the different functional modules that the system uses to carry out its activity.

elements of the model

there are four basic elements that make up the system: the thesaurus, user profiles, rss feeds, and recommendation log files.

thesaurus

an essential element of this sdi service is the thesaurus, an extensible tool used in traditional libraries that enables organizing the most relevant concepts in a specific domain and defining the semantic relations established between them, such as equivalence, hierarchical, and associative relations. the functions defined for the thesaurus in our system include helping in the indexing of rss feed items and in the generation of information alerts and recommendations. to create the thesaurus, we followed the method suggested by pedraza-jiménez, valverde-albacete, and navia-vázquez.27 the learning technique used for the creation of a thesaurus includes four phases: preprocessing of documents, parameterizing the selected terms, conceptualizing their lexical stems, and generating a lattice or graph that shows the relations between the identified concepts. essentially, the aim of the preprocessing phase is to prepare the documents for parameterization by removing elements regarded as superfluous. we have developed this phase in three stages: eliminating tags (stripping), standardizing, and stemming. in the first stage, all the tags (html, xml, etc.) that can appear in the collection of documents are eliminated. the second stage is the standardization of the words in the documents in order to facilitate and improve the parameterization process. at this stage, the acronyms and n-grams (bigrams and trigrams) that appear in the documents are identified using lists created for that purpose. once the acronyms and n-grams have been detected, the rest of the text is standardized: dates and numerical quantities are substituted with a label that identifies them, all terms (except acronyms) are changed to lowercase, and punctuation marks are removed.
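a rough sketch of the stripping and standardization stages described so far (stop-word removal and stemming follow below); the regular expressions, placeholder tokens, and sample acronym and n-gram lists are illustrative assumptions, not the authors' implementation.

```python
# rough sketch of the standardization stage: protect known n-grams, replace
# dates and numbers with placeholder tokens, lowercase everything except
# recognized acronyms, and drop punctuation. all lists and patterns here are
# invented for the example.
import re

ACRONYMS = {"OWL", "RDF", "SDI"}                 # hypothetical acronym list
NGRAMS = {"digital library", "semantic web"}      # hypothetical n-gram list

def standardize(text):
    # join known n-grams so they survive tokenization as single terms
    for ngram in NGRAMS:
        pattern = re.compile(re.escape(ngram), flags=re.IGNORECASE)
        text = pattern.sub(ngram.replace(" ", "_"), text)
    # replace dates and numeric quantities with placeholder tokens
    text = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "<date>", text)
    text = re.sub(r"\b\d+([.,]\d+)?\b", "<num>", text)
    # lowercase everything except recognized acronyms, keep word-like tokens only
    tokens = []
    for token in re.findall(r"[\w<>]+", text):
        tokens.append(token if token in ACRONYMS else token.lower())
    return tokens

if __name__ == "__main__":
    sample = "The Digital Library added 120 RDF records on 14/03/2007."
    print(standardize(sample))
```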
finally, a list of function words is used to eliminate from the texts articles, determiners, auxiliary verbs, conjunctions, prepositions, pronouns, interjections, contractions, and degree adverbs. all the terms are stemmed to facilitate the search for the final terms and to improve their calculation during parameterization. to carry out this task, we have used morphy, the stemming algorithm used by wordnet. this algorithm implements a group of functions that check whether a term is an exception that does not need to be stemmed and then convert words that are not exceptions to their basic lexical form. those terms that appear in the documents but are not identified by morphy are eliminated from our experiment. the parameterization phase has minimal complexity: once identified, the final terms (roots or bases) are quantified by being assigned a weight. such a weight is obtained by applying the term frequency-inverse document frequency (tf-idf) scheme, a statistical measure that quantifies the importance of a term or n-gram in a document depending on its frequency of appearance both in the document and in the collection the document belongs to. finally, once the documents have been parameterized, the meanings associated with each term (lemma) are extracted by searching for them in wordnet (specifically, we use wordnet 2.1 for unix-like systems). thus we get the group of synsets associated with each word. the groups of hyperonyms and hyponyms are also extracted from the vocabulary of the analyzed collection of documents. the generation of our thesaurus (that is, the identification of the descriptors that best represent the content of the documents and of the underlying relations between them) is achieved using formal concept analysis techniques. this categorization technique uses the theory of lattices and ordered sets to find abstraction relations from the groups it generates. furthermore, this technique enables clustering the documents depending on the terms (and synonyms) they contain. also, a lattice graph is generated according to the underlying relations between the terms of the collection, taking into account the hyperonyms and hyponyms extracted. in that graph, each node represents a descriptor (namely, a group of synonym terms) and clusters the set of documents that contain it, linking them to those with which it has any relation (of hyponymy or hyperonymy). once the thesaurus is obtained by identifying its terms and the underlying relations between them, it is automatically represented using the simple knowledge organization system (skos) vocabulary (see figure 1).28

user profiles

user profiles can be defined as structured representations that contain the personal data, interests, and preferences of users, with which agents can operate to customize the sdi service. in the model proposed here, these profiles are basically defined with friend of a friend (foaf), a specific rdf/xml vocabulary for describing people (which favours profile interoperability, since it is a widespread vocabulary supported by an owl ontology), and another nonstandard vocabulary of our own to define fields not included in foaf (see figure 2).29 profiles are generated the moment the user is registered in the system, and they are structured in two parts: a public profile that includes data related to the user's identity and affiliation, and a private profile that includes the user's interests and preferences about the topic of the alerts he or she wishes to receive.
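as a rough illustration of what such a foaf-based profile might look like (along the lines of the sample in figure 2), the sketch below uses rdflib to combine foaf properties with a hypothetical custom namespace for the preference and satisfaction-frequency fields; the namespace uri, property names, and values are invented, not the vocabulary actually used by the model.

```python
# rough sketch of a foaf-based user profile extended with a custom vocabulary
# for preferences and satisfaction frequencies; every uri, property name, and
# value below is a hypothetical example.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

SDI = Namespace("http://example.org/sdi#")      # hypothetical custom vocabulary

g = Graph()
g.bind("foaf", FOAF)
g.bind("sdi", SDI)

user = URIRef("http://example.org/users/u001")
g.add((user, RDF.type, FOAF.Person))
g.add((user, FOAF.name, Literal("diego allione")))            # public profile
g.add((user, FOAF.mbox_sha1sum, Literal("af9fa7601df46e95566")))

pref = URIRef("http://example.org/users/u001/pref/1")          # private profile
g.add((user, SDI.hasPreference, pref))
g.add((pref, SDI.topic, Literal("library management")))
g.add((pref, SDI.satisfactionFrequency, Literal("often")))

print(g.serialize(format="turtle"))
```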
to define their preferences, users must specify keywords and concepts that best define their information needs. later, the system compares those concepts with the terms in the thesaurus, using the edit tree algorithm as a similarity measure.30 this function matches character strings and returns the term introduced (if there is an exact match) or the lexically most similar term (if not). consequently, if the suggested term satisfies the user's expectations, it will be added to the user's profile together with its synonyms (if any). in those cases where the suggested term is not satisfactory, the system must provide some tool or application that enables users to browse the thesaurus and select the terms that better describe their needs. an example of this type of application is thmanager (http://thmanager.sourceforge.net), a project of the universidad de zaragoza, spain, that enables editing, visualizing, and navigating structures defined in skos. each of the terms selected by the user to define his or her areas of interest has an associated linguistic frequency value (tagged as ) that we call the "satisfaction frequency." it represents the regularity with which a particular preference value has been used in alerts positively evaluated by the user. this frequency measures the relative importance of the preferences stated by the user and allows the interface agent to generate a ranked list of results. the range of possible values for these frequencies is defined by a group of seven labels that we get from the fuzzy linguistic variable "frequency," whose expression domain is defined by the linguistic term set s = {always, almost_always, often, occasionally, rarely, almost_never, never} ("occasionally" being the central value).

figure 1. sample entry of a skos core thesaurus
figure 2. user profile sample

rss feeds

thanks to the popularization of blogs, there has been widespread use of several vocabularies specifically designed for the syndication of contents (that is, for making the content of a website accessible to other internet users by means of hyperlink lists called "feeds"). to create our current-awareness bulletin we use rss 1.0, a vocabulary that enables managing hyperlink lists in an easy and flexible way. it utilizes the rdf/xml syntax and data model and is easily extensible because of the use of modules that extend the vocabulary without modifying its core each time new describing elements are added. in this model several modules are used: the dublin core (dc) module, to define the basic bibliographic information of the items using the elements established by the dublin core metadata initiative (http://dublincore.org); the syndication module, to help software agents synchronize and update rss feeds; and the taxonomy module, to assign topics to feed items. the structure of the feeds comprises two areas: one where the channel itself is described by a series of basic metadata like a title, a brief description of the content, and the updating frequency; and another where the descriptions of the items that make up the feed (see figure 3) are defined (including elements such as title, author, summary, hyperlink to the primary resource, date of creation, and subjects).

figure 3. rss feed item sample
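a minimal sketch of how one such feed item might be expressed with rdflib, combining the rss 1.0 and dublin core namespaces; the item uri and property values below are loose, invented approximations of the sample shown in figure 3, not the feed actually produced by the system.

```python
# minimal sketch of an rss 1.0 feed item described with dublin core elements;
# the uris, title, creator, date, and subject are invented example values.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

RSS = Namespace("http://purl.org/rss/1.0/")

g = Graph()
g.bind("rss", RSS)
g.bind("dc", DC)

item = URIRef("http://example.org/feed/item/42")
g.add((item, RDF.type, RSS.item))
g.add((item, RSS.title, Literal("broadcasting and the internet")))
g.add((item, RSS.link, URIRef("http://eprints.rclis.org/")))
g.add((item, DC.creator, Literal("escudero sánchez, manuel")))
g.add((item, DC.description, Literal("this paper is about ...")))
g.add((item, DC.date, Literal("2002")))
g.add((item, DC.subject, Literal("virtual communities")))

print(g.serialize(format="pretty-xml"))
```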
recommendation log file

each document in the repository has an associated recommendation log file in rdf that includes the listing of evaluations assigned to that resource by different users since the resource was added to the system. each entry in a recommendation log file consists of a recommendation value, a uri that identifies the user who made the recommendation, and the date of the record (see figure 4). the expression domain of the recommendations is defined by the following set of five fuzzy linguistic labels extracted from the linguistic variable "quality of the resource": q = {very_low, low, medium, high, very_high}.

figure 4. recommendation log file sample

these elements represent the raw materials for the sdi service that enable it to develop its activity through four processes or functional modules: the profiles updating process, the rss feeds generation process, the alert generation process, and the collaborative recommendation process.

system processes

profiles updating process

since the sdi service's functions are based on generating passive searches to rss feeds from the preferences stored
in a user's profile, updating the profiles becomes a critical task. user profiles are meant to store long-term preferences, but the system must be able to detect any subtle change in these preferences over time to offer accurate recommendations. in our model, user profiles are updated using a simple mechanism that enables finding users' implicit preferences by applying fuzzy linguistic techniques and taking into account the feedback users provide. users are asked about their satisfaction degree (ej) in relation to the information alert generated by the system (i.e., whether the items retrieved are interesting or not). this satisfaction degree is obtained from the linguistic variable "satisfaction," whose expression domain is the set of seven linguistic labels s' = {total, very_high, high, medium, low, very_low, null}. this mechanism updates the satisfaction frequency associated with each user preference according to the satisfaction degree ej. it requires the use of a matching function similar to those used to model threshold weights in weighted search queries.31 the function proposed here rewards the frequencies associated with the preference values present when the resources assessed are satisfactory, and it penalizes them when this assessment is negative. let ej ∈ s' be the degree of satisfaction, and let fi,l ∈ s be the frequency of property i (in this case, i = "preference") with value l; we then define the updating function g: s' × s → s.

we have provided the information below as a downloadable pdf should you wish to keep it for your records. the purpose of the study is to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding nonacademic member institutions): association of research libraries, big ten academic alliance, greater western library alliance, and/or the oberlin group. if you opt to participate, we expect that you will be in this research study for the duration of the time it takes to complete our web-based survey. you will not be paid to be in this study. whether or not you take part in this research is your choice. you can leave the research at any time and it will not be held against you. we expect about 210 people, representing their institutions, in the entire study internationally. this survey will be available over a four-week period in the spring of 2020, through friday, may 1.

** confidentiality
-----------------------------------------------------------
information obtained about you for this study will be kept confidential to the extent allowed by law.
research information that identifies you may be shared with the university of colorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 28 research protections. the information from this research may be published for scientific purposes; however, your identity will not be given out. ** questions ----------------------------------------------------------- if you have questions, concerns, or complaints, or think the research has hurt you, contact the research team at crmcdonald@colorado.edu. this research has been reviewed and approved by an irb. you may talk to them at (303) 735 3702 or irbadmin@colorado.edu if: * your questions, concerns, or complaints are not being answered by the research team. * you cannot reach the research team. * you want to talk to someone besides the research team. * you have questions about your rights as a research subject. * you want to get information or provide input about this research. thank you for your consideration, courtney mcdonald crmcdonald@colorado.edu heidi burkhardt heidisb@umich.edu ============================================================ not interested in participating? you can ** unsubscribe from this list (*|unsub|*). this email was sent to *|email|* (mailto:*|email|*) why did i get this? (*|about_list|*) unsubscribe from this list (*|unsub|*) update subscription preferences (*|update_profile|*) information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 29 recruitment email: named recipients dear library colleague, we are writing today to ask for your participation in a research project “content strategy in practice within academic libraries,” (cu boulder irb protocol #18-0670), led by co-investigators courtney mcdonald and heidi burkhardt (university of michigan). our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding non academic member institutions): association of research libraries, big ten academic alliance, greater western library alliance, and/or the oberlin group. we ask that you forward this message to the person in your organization whose role includes oversight of your public web site. we are only requesting a response from one person at each institution contacted. thank you for your assistance in routing this request. we have provided the information below as a downloadable pdf should you wish to keep it for your records. the purpose of the study is to establish an understanding of the degree of institutio nal engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. if someone within your library opts to participate, we expect that person will be in this research study for the duration of the time it takes to complete our web-based survey. the participant will not be paid to be in this study. whether or not someone in your library takes part in this research is an individual choice. the participant can leave the research at any time and it will not be held against them. we expect about 210 people, representing their institutions, in the entire study internationally. 
this survey will be available over a four-week period in the spring of 2020, through friday, may 1. ** confidentiality ----------------------------------------------------------- information obtained about you for this study will be kept confidential to the extent allowed by law. research information that identifies you may be shared with the university of co lorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human research protections. the information from this research may be published for scientific purposes; however, your identity will not be given out. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 30 ** questions ----------------------------------------------------------- if you have questions, concerns, or complaints, or think the research has hurt you, contact the research team at crmcdonald@colorado.edu. this research has been reviewed and approved by an irb. you may talk to them at (303) 735 3702 or irbadmin@colorado.edu if: * your questions, concerns, or complaints are not being answered by the research team. * you cannot reach the research team. * you want to talk to someone besides the research team. * you have questions about your rights as a research subject. * you want to get information or provide input about this research. thank you for your consideration, courtney mcdonald crmcdonald@colorado.edu heidi burkhardt heidisb@umich.edu ============================================================ not interested in participating? you can ** unsubscribe from this list (*|unsub|*). this email was sent to *|email|* (mailto:*|email|*) why did i get this? (*|about_list|*) unsubscribe from this list (*|unsub|*) update subscription preferences (*|update_profile|*) information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 31 appendix c: survey questions web content strategy methods and maturity start of block: introduction q1 web content strategy methods and maturity in academic libraries (cu boulder irb protocol #20-0581) purpose of the study the purpose of the study is to gather feedback from practitioners on the proposed content strategy maturity model for academic libraries, and to further enhance our understanding of web content strategy practice in academic libraries and the needs of its community of practice. q2 please make a selection below, in lieu of your signature, to document that you h ave read and understand the consent form, and voluntarily agree to take part in this research. o yes, i consent to take part in this research. (1) o no, i do not grant my consent to take part in this research. (2) skip to: end of survey if q2 = no, i do not grant my consent to take part in this research. 
end of block: introduction start of block: demographic information information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 32 q3 estimated total number of employees (fte) at your library organization: o less than five (12) o 5-10 (13) o 11-20 (14) o 21-99 (15) o 100-199 (16) o 200+ (17) q4 estimated number of employees with editing privileges within your primary library website: o less than five (12) o 5-10 (13) o 11-20 (14) o 21-99 (15) o 100-199 (16) o 200+ (17) q5 does your library have a documented web content strategy and / or a web content governance policy? o no (1) o yes (2) information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 33 q6 are there position(s) within your library whose primary duties are focused on creation, management, and/or editing of web content? o no (1) o yes, including myself (2) o yes, not including myself (3) end of block: demographic information start of block: web content strategy q7 please indicate the degree to which each of the five elements of content strategy are currently in practice at your library. q8 creation employ editorial workflows, consider content structure, support writing. definitely true (48) somewhat true (49) somewhat false (50) definitely false (51) this is currently in practice at my institution. (1) o o o o information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 34 q9 delivery consider findability, discoverability, and search engine optimization, plus choice of content platform or channels. definitely true (48) somewhat true (49) somewhat false (50) definitely false (51) this is currently in practice at my institution. (1) o o o o q10 governance support maintenance and lifecycle of content, as well as measurement and evaluation. definitely true (31) somewhat true (32) somewhat false (33) definitely false (34) this is currently in practice at my institution. (1) o o o o q11 planning use an intentional and strategic approach, including brand, style, and writing best practices. definitely true (31) somewhat true (32) somewhat false (33) definitely false (34) this is currently in practice at my institution. (1) o o o o information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 35 q12 user experience consider needs of the user to produce relevant, current, clear, concise, and in context. definitely true (31) somewhat true (32) somewhat false (33) definitely false (34) this is currently in practice at my institution. (1) o o o o q13 please rank the elements of content strategy (as defined above) in order of their priority based on your observations of practice in your library. • ______ creation (1) • ______ delivery (2) • ______ governance (3) • ______ planning (4) • ______ user experience (5) q14 how would you assess the content strategy maturity of your organization? o basic (1) o intermediate (2) o advanced (3) end of block: web content strategy information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 36 start of block: thank you! q15 your name: ________________________________________________________________ q16 thank you very much for your willingness to be interviewed as part of our research study. 
prior to continuing on to finalize your survey submission, please sign up for an interview time: [link] (this link will open in a new window in order to allow you to finalize and submit your survey response after scheduling an appointment) please contact courtney mcdonald, crmcdonald@colorado.edu, if you experience any difficulty in registering or if there is not a time available that works for your schedule. end of block: thank you! information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 37 appendix d: informed consent document permission to take part in a human research study page 37 of 28 title of research study: content strategy in practice within academic libraries irb protocol number: 18-0670 investigators: courtney mcdonald and heidi burkhardt purpose of the study the purpose of the study is to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding nonacademic member institutions): association of research libraries, big ten academic alliance, and/or greater western library alliance. we expect that you will be in this research study for the duration of the time it takes to complete our web-based survey. we expect about 210 people, representing their institutions, in the entire study internationally. explanation of procedures we are directly contacting each library to request that the appropriate individual(s) complete a web-based survey. this survey will be available over a four-week period in the spring of 2020. voluntary participation and withdrawal whether or not you take part in this research is your choice. you can leave the research at any time and it will not be held against you. the person in charge of the research study can remove you from the research study without your approval. possible reasons for removal include an incomplete survey submission. confidentiality information obtained about you for this study will be kept confidential to the extent allowed by law. research information that identifies you may be shared with the university of colorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human research protections. the information from this research may be published for scientific purposes; however, your identity will not be given out. payment for participation you will not be paid to be in this study. contact for future studies we would like to keep your contact information on file so we can notify you if we have future research studies we think you may be interested in. this information will be used by only th e information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 38 principal investigator of this study and only for this purpose. you can opt-in to provide your contact information at the end of the online survey. questions if you have questions, concerns, or complaints, or think the research has hurt you, contact to the research team at crmcdonald@colorado.edu this research has been reviewed and approved by an irb. 
you may talk to them at (303) 7353702 or irbadmin@colorado.edu if: • your questions, concerns, or complaints are not being answered by the research team. • you cannot reach the research team. • you want to talk to someone besides the research team. • you have questions about your rights as a research subject. • you want to get information or provide input about this research. signatures in lieu of your signature, your acknowledgement of this statement in the online survey document documents your permission to take part in this research. mailto:crmcdonald@colorado.edu mailto:irbadmin@colorado.edu information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 39 appendix e: other content management systems mentioned by respondents question #4: which of the following content management systems does your library use to manage library-authored web content? write-in responses for ‘proprietary system hosted by institution’ ● xxxxxxxxxxx • archivesspace • pressbooks • preservica • hippo cms • siteleaf • cascade • dotcms • terminal four • acquia drupal • fedora based digital collections system built in house write-in responses for ‘other” • wiki and blog • we draft content in google docs & also use gather content for auditing. • google sites • cascade • ebsco stacks • modx • islandora and online journal system • contentful • we also have some in-house-built tools such as for room booking; some of these are quite old and we would like to upgrade or improve them when we can. (very few people can make edits in these tools.) • cascade • the majority of the library website (and university website) is managed by a locally developed cms; however, the university is in the process of migrating to the acquia drupal cms. • blacklight, vivo, fedora • most pages are just non-cms for the website information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 40 appendix f: organizational responsibility for content; and position titles question 6 please explain how your organization distributes responsibility for content hosted in your content management system(s). if different parties (individuals, departments, collaborative groups) are responsible for managing content in different platforms please describe. • we have one primary website manager who oversees the management of the website, including content strategy and editing, and 2 editors who assist with small editing tasks. • we have content editors that edit content for individual libraries and collections. there is a content creator network managed by library communications. they provide trainings and guidance for content editors and act as reviewers, but not every single thing gets reviewed. • we have a team of developers and product owners who are responsible for managing web content. • we currently have a very distributed model, where virtually any library staff member or student assistant can request a drupal account and then make changes to existing content or develop new pages. we have a cross-departmental team that oversees the libraries' web interfaces and makes decisions about library homepage content, the menu navigation, overall ia, etc. we have web content guidelines to help staff as they develop new content. we have identified functional and technical owners for each of our cmss and have slightly different processes for managing content in those cmss. 
our general approach, however, is very inclusive (for better or worse ;) )-lots of staff have access to creating and editing content. we are, however, moving to a less distributed content for drupal in particular. moving forward, we'll have a small team responsible for editing and developing new content. this is to ensure that content is more consistent and user-centered. we attempted to identify funding for a full-time content manager but were unsuccessful, so this team will attempt to fill the role of a full-time content manager. • ux is the product owner and admin. if staff want content added to the website, they send a request to ux, we structure and edit content in a google doc, and then ux posts to the website. • there's no method for how or why responsibility is distributed. it ends up being something like, someone wants to add some content, they get editing access, they can now edit anything for as long as they're at the library. we are a super decentralized and informal library. • the primary content managers are the xxxxxx librarian and the xxxxxx. other individuals (primarily librarians) that are interested in editing their content have access on our development server. their edits are vetted by the xxxxxxand/or the xxxxxx librarian before being moved into production. • the xxxxxx department (6 staff) manages content and helps staff throughout the organization create and maintain content. ux staff sometimes teach others how to manage content, and sometimes do it for them. if design or content is complex, usually ux staff do the work. many staff don't maintain any content beyond their staff pages. subject specialists and instruction librarians maintain content [like] libguides-like content, but we don't use libguides. branch library staff maintain most of the content for their library pages. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 41 • in addition, the xxxxxx manages the catalog. the xxxxxx department manages special web projects. and the xxxxxx department manages social media, publications, and news. • a web content team made up of two administrators and librarians from xxxxxx and xxxxxx makes executive-level decisions about web content. • the xxxxxx team (xxxxxx) provides oversight and consulting for online user interfaces chaired by a xxxxxxposition which is new and is not yet filled. • for the public website, content editing is distributed to many groups and teams throughout the libraries. • the xxxxxxteam manages the main portions of the site including the homepage, news, maps, calendars, etc. the research librarians and subject liaisons manage the research guides. the xxxxxx provides guidance regarding overall responsibilities and style guidelines. • site structure and top-level pages for our main website resides with xxxxxx. page content is generally distributed to the departments closest to the services described by the pages. • right now editing of pages is distributed to those individuals who have the closest relationship to the pages being edited, with a significantly smaller number of people having administrative access to all of the libraries' websites. • primary website is co-managed by xxxxxx team (4 people) and xxxxxx team (3 people). xxxxxxteam creates timely content about news/events/initiatives while xxxxxx team manages content on evergreen topics. • research librarians and staff manage libguides content, which is in sore need of an inventory and pruning. 
• primarily me, plus two colleagues who serve with me as a web editorial board • one librarian manages the content and makes changes based on requests from other library staff • my role (xxxxxx) is xxxxxx. we also have a web content creator in our xxxxxx. i chair our xxxxxxgroup (xxxxxx), which has representatives from each division in the library and they are the primary stewards of supporting library authored web content. our "speciality" platforms (libguides, omeka, and wordpress for microsites) all have service leads, but content is managed by the respective stakeholders. the lead for libguides is a xxxxxx [group] member due to its scope and scale. in our primary website, we are currently structured around drupal organic groups for content management with xxxxxx [group] having broad editing access. in our new website, all content management will go through the xxxxxx, with communications for support and dynamic content (homepage, news, events) management. • management is somewhat in flux right now. we recently migrated our main web site to acquia drupal; there is a very new small committee consisting of xxxxxx, and three representatives from elsewhere in the library. for libguides, all reference, instructio n, and subject librarians can edit their own guides; the xxxxxx has tended to have final oversight but i don't know if this has ever been formally delegated. • librarians manage their own libguides subject guides; several members of xxxxxx can make administrative changes to coding, certificates, etc. on the entire site; there are individuals in different departments who control their own pages/libguides. there is a group within the library that administers wordpress for the institution. other content systems are administered by individuals within the library. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 42 • librarians are responsible for their own libguides. the xxxxxx department manages changes to most content, although some staff do manage their own wordpress content. they tend not to want to. • individuals. mainly one person authors content. the other individual has created some research guides. • individuals in different positions and departments within the library are assigned roles based on the type of content they frequently need to edit. • for instance, xxxxxx staff have the ability to create and edit exhibition content in drupal. xxxxxx staff and xxxxxx staff have the ability to create and edit equipment content. the event coordinator and librarians and staff involved in instruction are allowed to create and edit event and workshop listings. • only the communication coordinator is permitted to create news items that occupy real estate on the home page and various service point home pages. • as for general content, the primary internal stakeholders for that content typically create and edit that content, but if any staff notice a typo or factual error they are encouraged to correct them on their own, although they can also submit a request to the it department if they are not comfortable doing so. • subject specific content is hosted in libguides, and is maintained by subject liaison librarians. other content in libguides, software tutorials or information related to electronic resources for example, is created and maintained by appropriate specialists. 
• the drupal site when launched had internal stakeholders explicitly defined for each page, and only staff from the appropriate group could edit that content (e.g. if xxxxxx was tagged as the only stakeholder for a page about xxxxxx policies, then only staff from the xxxxxx department with editing privileges could edit that page). this system was abandoned after about two years as it was considered too much overhead to maintain and also the introduction of a content revisioning module that kept a history of edits alleviated fears of malicious editing. • individuals are assigned pages to keep content updated. the xxxxxx is responsible for coordinating with those staff and offers training to make sure content gets updated. • individual liaison librarians are responsible for their own libguides. i and the "xxxxxx" are the primary editors of the wordpress site, although 4 others have editing access (an employee who writes and posts news articles, the liaison librarian who spearheaded our new video tutorials, and two who work in special collections to update finding aids on that site, which is still on wordpress and i would consider under the main libraries web page, but is part of a multisite installation.) • in omeka and libguides, librarians are pretty self-sufficient and responsible for all of their own content. the three or four digital projects faculty and staff who work with omeka manage it internally alongside one of our developers. our omeka instance is relatively small-scale. • i (xxxxxx) oversee our libguides environment. while i am in the process of creating and implementing formal libguides content and structure guidelines, as of now it's a bit of a free-for-all with everyone responsible for the content pertaining to their own liaison department(s). content is made available to patrons via automatically populating legacy landing pages (we've had libguides for a decade and i've been with the institution not yet a year). information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 43 • as the xxxxxx, i am ultimately responsible for almost all of the content in our wordpress environment. that said, i try to distribute content upkeep responsibilities to the relevant department for each piece of the site. managers and committee chairs provide me with what they want on the web, and as needed (and in consultation with them) i review/rewrite it for the web (plain language), develop information architecture, design the front-end, and accessibly publish the content. there are only a few faculty and staff at my library who are comfortable with wordpress -but one of my long-term goals is to empower more folks to enact their own minor edits (e.g., updating hours, lending policies, etc.) while i oversee large-scale content creation, overall architecture, and strategy. we have a blog portion of our wordpress site which is not managed by anyone in particular, but i tend to clean it up if things go awry. • generally all of our web authors *can* publish to most parts of the site. (a very few content types (mostly featured images that display on home pages) can be edited only by admins and a small number of super-users.) however the great majority of people who can post content very rarely do (and some never do). some edit or post only to specific blogs, some only to their own guides or to very specific pages or suites of pages (e.g. liaison librarians to their own guides; thesis assistant to thesis pages). 
our small group in xxxxxx reviews new and updated pages and edits for in-house style and usability guidelines, and also trains and works collaboratively with web authors to create more usable content and reduce duplication -but given the large number of authors (with varied priorities, skills, and preferences) and pages we have trouble keeping up. we also more actively manage content on home pages. • for the main website and intranet, we have areas broken apart by unit area. we use workbench access to determine who can edit which pages. libguides is managed by committee, but most of the librarians have access. proprietary systems have separate accounts for those who need access. • for libguides, librarians can create content as they like, though there is a group that provides some (light) oversight. for main library website, most content is overseen by departments (in practice, one person each from a handful of “areas”, such as the branches, access services, etc.). • dotcms is primarily managed in systems (2 staff), with delegates from admin and outreach allowed to make limited changes to achieve their goals. libguides is used by all librarians and several staff, with six people given admin privileges. wordpress is used only in special collections. • xxxxxx dept manages major public facing platforms (drupal, wordpress, and shares libguides responsibilities with xxxxxx dept). xxxxxx manages omeka. within platforms, responsibilities are largely managed by department with individuals assigned content duties & permissions as needed. • different units maintain their content; one unit has overall management and checks for uniformity, needed updates, and broken links. • developers/communications office oversees some aspects, library management, research and collections librarians, and key staff edit other pieces. • currently, content is maintained by the xxxxxx librarian in coordination with content stakeholders from around the organization. we are in the process of migrating our site from drupal to omniupdate. once that is complete, we will develop a new model for content responsibilities. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 44 • content is provided by department/services. • 5 librarians manage the libguides question 9 titles of positions in your organization whose primary duties involve creation, management and/or editing of web content: • head of web services; developer; web designer; user experience librarian • user experience librarian, lead librarian for discovery systems, digital technologies development librarian, lead librarian for software development. and we have titles that are university system it titles that don't mean a whole lot, such as technology support specialist and business and technology applications analyst. • web content specialist • user experience strategist, user experience designer, user experience student assistants , director of marketing communications and events • sr. 
ux specialist • web support consultant; coordinator, web services & library technology • editor & content strategist in library communications • web manager • discovery & systems librarian • head of library systems and technology • web services and data librarian • communications manager • web content and user experience specialist • metadata and discovery systems librarian, systems analyst, outreach librarian • digital services librarian; manager, communication services; communication specialist • (1) web project manager and content strategist, (2) web content creator • web services librarian • web developer ii • sr. software engineer, program director for digital services • user experience librarian • digital initiatives & scholarly communication librarian; senior library associate in digital scholarship and services • web services and usability librarian • senior library specialist -web content • web developer, software development librarian information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 45 appendix g: definitions of web content strategy question 11 in your own words, please define web content strategy. • a cohesive plan to create an overall strategy for web content that includes tone, terminology, structure, and deployment to best communicate the institution's message and enable the user. for the next question, the true answer is sort of. we have the start of a style guide. we also have the university's branding policies. we also have a web governance committee that is university-wide, of which i'm a part of. however, we don't have a complete strategy and it is certainly not documented. so you pick. • planning, development, and management of web content. two particularly important parts of web content strategy for academic library websites: 1. keeping content up to date and unpublishing outdated content. 2. building consensus for the creation and maintenance of a web style guide and ensuring that content across the large website adheres to the style guide. • strategies for management of content over its entire lifecycle to ensure it is accurate, timely, usable, accessible, appropriate, findable, and well-organized. • a system of workflows, training, and governance that supports the entire lifecycle of content, including creation, maintenance, and updating of content across all communications channels (e.g. websites, social media, signage). • a comprehensive, coordinated, planned approach to content across the site including components such as style guides, accessibility, information architecture, discoverability, seo. • not terribly familiar with the concept in a formal sense but think of it related to how the institution considers the intersection of content made available by the institution, the management and governance of issues such as branding/identity, accessibility, design, marketing, etc. • intentional and coordinated vision for content on the website • content strategy is the planning for the lifecycle of content. it includes creating, editing, reviewing, and deleting content. we also use a content strategy framework to determine each of the following for the content on our websites: audience, page goal, value proposition, validation, and measurement strategy. 
• website targets the community to ensure they can find what they need • the process of creating and enacting a vision for the organization and display of web content so that it is user friendly, accurate, up-to-date, and effective in its message. web content strategy often involves considering the thoughts and needs of many stakeholders, and creating one cohesive voice to represent them all. • web content strategy is the planning, design, delivery and governance plan for a website. this responsibility is guided by the library website management working group. • a web content strategy is a cohesive approach to managing and editing online content. an effective strategy takes into account web accessibility standards and endeavors to produce and maintain consistent, reliable, user-centered content. an effective content strategy evolves to meet the needs of online users and involves regular user testing and reviews of web traffic/analytics. • web content strategy is the theory and practice of creating, managing, and publishing web content according to evidence-based best practices for usability and readability information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 46 • making sure your content aligns with both your business goals and your audience needs. • a plan to oversee the life cycle of useful, usable content from its creation through maintenance and ultimately removal. • web content strategy is the overarching strategy for how you develop and disseminate web content. ideally, it would be structured and user tested to ensure that the content you are spending time developing is meeting the needs of your library and your community. • a web content strategy guides the full lifecycle of web content, including creation, maintenance, assessment, and retirement. it also sets guiding principles, makes responsibility and authority clear, and documents workflows. • an overarching method of bringing user experience best practices together on the website including: heuristics, information architecture, and writing for the web • planning and management of online content • a defined strategy for creating and delivering effective content to a defined audience at the right time. • in the most basic sense, web content strategy is matching the content, services and functionality of web properties with the organizational strategic goals. • web content strategy can include guidelines, processes, and/or approaches to making your website(s) usable, sustainable, and findable. it's a big-picture or higher-level way of thinking about your site(s), rather than page by page or function by function. • deliberate structures and practices to plan, deliver, and evaluate web content. • producing content that will be useful to users and easy for them to access • tying content to user behavior/user experience? • web content strategy is the thoughtful planning and construction of website content to meet users' needs. • n/a • cohesive planning, development, and management of web content, to engage and support library users. • working with teams and thinking strategically and holistically about the usability, functions, services, information, etc. provided on the website to best meet the needs of the site's users, as well as incorporating the marketing/promotional perspectives offered by the website. 
• planning and managing web content • web content strategy is the idea that all written and visual information on a certain site would conform to or align with the goals for that site. • ensuring that the most accurate and appropriate words, images, and other assets are presented to patrons at the point of need, while using web assets to tell stories patrons might not know they want to know. abstract introduction background maturity models application of maturity models within user experience work in libraries assessing the maturity of content strategy practice in libraries methods findings demographic information infrastructure & organizational structure content management systems dedicated positions, position titles, and organizational workflows web content strategy practices discussion proposed maturity model content strategy maturity model for academic libraries level 1: ad hoc level 2: establishing level 3: scaling level 4: sustaining level 5: thriving conclusion endnotes can bibliographic data be put directly onto the semantic web? | yee 55 martha m. yee can bibliographic data be put directly onto the semantic web? this paper is a think piece about the possible future of bibliographic control; it provides a brief introduction to the semantic web and defines related terms, and it discusses granularity and structure issues and the lack of standards for the efficient display and indexing of bibliographic data. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of more frbrized cataloging rules than those about to be introduced to the library community (resource description and access) and in creating an rdf data model for the rules. i am now in the process of trying to model my cataloging rules in the form of an rdf model, which can also be inspected at http://myee. bol.ucla.edu/. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is sophisticated enough yet to deal with our data. this article is an attempt to identify some of those areas and explore whether or not the problems i have encountered are soluble—in other words, whether or not our data might be able to live on the semantic web. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work. t his paper is a think piece about the possible future of bibliographic control; as such, it raises more complex questions than it answers. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of frbrized descriptive and subject-cataloging rules. here my focus will be on the data model rather than on the frbrized cataloging rules for gathering data to put in the model, although i hope to have more to say about the latter in the future. the intent is not to present you with conclusions but to present some questions about data modeling that have arisen in the course of the experiment. my premise is that decisions about the data model we follow in the future should be made openly and as a community rather than in a small, closed group of insiders. if we are to move toward the creation of metadata that is more interoperable with metadata being created outside our community, as is called for by many in our profession, we will need to address these complex questions as a community following a period of deep thinking, clever experimentation, and astute political strategizing. 
n the vision the semantic web is still a bewitching midsummer night’s dream. it is the idea that we might be able to replace the existing html–based web consisting of marked-up documents—or pages—with a new rdf– based web consisting of data encoded as classes, class properties, and class relationships (semantic linkages), allowing the web to become a huge shared database. some call this web 3.0, with hyperdata replacing hypertext. embracing the semantic web might allow us to do a better job of integrating our content and services with the wider internet, thereby satisfying the desire for greater data interoperability that seems to be widespread in our field. it also might free our data from the proprietary prisons in which it is currently held and allow us to cooperate in developing open-source software to index and display the data in much better ways than we have managed to achieve so far in vendor-developed ils opacs or in giant, bureaucratic bibliographic empires such as oclc worldcat. the semantic web also holds the promise of allowing us to make our work more efficient. in this bewitching vision, we would share in the creation of uniform resource identifiers (uris) for works, expressions, manifestations, persons, corporate bodies, places, subjects, and so on. at the uri would be found all of the data about that entity, including the preferred name and the variant names, but also including much more data about the entity than we currently put into our work (name-title and title), such as personal name, corporate name, geographic, and subject authority records. if any of that data needed to be changed, it would be changed only once, and the change would be immediately accessible to all users, libraries, and library staff by means of links down to local data such as circulation, acquisitions, and binding data. each work would need to be described only once at one uri, each expression would need to be described only once at one uri, and so forth. very much up in the air is the question of what institutional structures would support the sharing of the creation of uris for entities on the semantic web. for the data to be reliable, we would need to have a way to ensure that the system would be under the control of people who had been educated about the value of clean and accurate entity definition, the value of choosing “most commonly known” preferred forms (for display in lists of multiple different entities), and the value of providing access martha m. yee (myee@ucla.edu) is cataloging supervisor at the university of california, los angeles film and television archive. 56 information technology and libraries | june 2009 under all variant forms likely to be sought. at the same time, we would need a mechanism to ensure that any interested members of the public could contribute to the effort of gathering variants or correcting entity definitions when we have had inadequate information. for example, it would be very valuable to have the input of a textual or descriptive bibliographer applied to difficult questions concerning particular editions, issues, and states of a significant literary work. it would also be very valuable to be able to solicit input from a subject expert in determining the bounds of a concept entity (subject heading) or class entity (classification). n the experiment (my project) to explore these bewitching ideas, i have been conducting an experiment. 
as part of my experiment, i designed a set of cataloging rules that are more frbrized than is rda in the sense that they more clearly differentiate between data applying to expression and data applying to manifestation. note that there is an underlying assumption in both frbr (which defines expression quite differently from manifestation) and on my part, namely that catalogers always know whether a given piece of data applies at either the expression or the manifestation level. that assumption is open to questioning in the process of the experiment as well. my rules also call for creating a more hierarchical and degressive relationship between the frbr entities work, expression, manifestation, and item, such that data pertaining to the work does not need to be repeated for every expression, data pertaining to the expression does not need to be repeated for every manifestation, and so forth. degressive is an old term used by bibliographers for bibliographies that provide great detail about first editions and less detail for editions after the first. i have adapted this term to characterize my rules, according to which the cataloger begins by describing the work; any details that pertain to all expressions and manifestations of the work are not repeated in the expression and manifestation descriptions. this paper would be entirely too long if i spent any more time describing the rules i am developing, which can be inspected at http://myee.bol.ucla .edu. here, i would like to focus on the data-modeling process and the questions about the suitability of rdf and the semantic web for encoding our data. (by the way, i don’t seriously expect anyone to adopt my rules! they are radically different than the rules currently being applied and would represent a revolution in cataloging practice that we may not be up to undertaking in the current economic climate. their value lies in their thought-experiment aspect and their ability to clarify what entities we can model and what entities we may not be able to model.) i am now in the process of trying to model my cataloging rules in the form of an rdf model (“rdf” as used in this paper should be considered from now on to encompass rdf schema [rdfs], web ontology language [owl], and simple knowledge organization system [skos] unless otherwise stated); this model can also be inspected at http://myee.bol .ucla.edu. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is yet sophisticated enough to deal with our data. this article is an attempt to outline some of those areas and explore whether the problems i have encountered are soluble, in other words, whether or not our data might be able to live on the semantic web eventually. i have already heard from rdf experts bruce d’arcus (miami university) and rob styles (developer of talis, as semantic web technology company), whom i cite later, but through this article i hope to reach a larger community. my research questions can be found later, but first some definitions. 
definition of terms

the semantic web is a way to represent knowledge; it is a knowledge-representation language that provides ways of expressing meaning that are amenable to computation; it is also a means of constructing knowledge-domain maps consisting of class and property axioms with a formal semantics.

rdf is a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats. an rdf metadata model is based on making statements about resources in the form of triples that consist of 1. the subject of the triple (e.g., “new york”); 2. the predicate of the triple, which links the subject and the object (e.g., “has the postal abbreviation”); and 3. the object of the triple (e.g., “ny”). xml is commonly used to express rdf, but it is not a necessity; it can also be expressed in notation 3, or n3, for example.1

rdfs is an extensible knowledge-representation language that provides basic elements for the description of ontologies, also known as rdf vocabularies. using rdfs, statements are made about resources in the form of 1. a class (or entity) as subject of the rdf triple (e.g., “new york”); 2. a relationship (or semantic linkage) as predicate of the rdf triple, which links the subject and the object (e.g., “has the postal abbreviation”); and 3. a property (or attribute) as object of the rdf triple (e.g., “ny”).

owl is a family of knowledge-representation languages for authoring ontologies compatible with rdf.

skos is a family of formal languages built upon rdf and designed for the representation of thesauri, classification schemes, taxonomies, and subject-heading systems.

research questions

actually, the full-blown semantic web may not be exactly what we need. remember that the fundamental definition of the semantic web is “a way to represent knowledge.” the semantic web is a direct descendant of the attempt to create artificial intelligence, that is, of the attempt to encode enough knowledge of the real world to allow a computer to reason about reality in a way indistinguishable from the way a human being reasons. one of the research questions should probably be whether the technology developed to support the semantic web can be used to represent information rather than knowledge. fortunately, we do not need to represent all of human knowledge—we simply need to describe and index resources to facilitate their retrieval. we need to encode facts about the resources and about what the resources discuss (what they are “about”), not facts about “reality.” based on our past experience, doing even this is not as simple as people think it is. the question is whether we could do what we need to do within the context of the semantic web. sometimes things that sound simple do not turn out to be so simple in the doing. my research questions are as follows: 1. is it possible for catalogers to tell in all cases whether a piece of data pertains to the frbr expression or the frbr manifestation? 2. is it possible to fit our data into rdf? given that rdf was designed to encode knowledge rather than information, perhaps it is the wrong technology to use for our purposes? 3.
if it is possible to fit our data into rdf, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (i.e., providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)? as stated previously, i am not yet ready to answer these questions. i hope to find answers in the course of developing the rules and the model. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work. n other relevant projects other relevant projects include the following: 1. frbr, functional requirements for authority data (frad), funtional requirements for subject authority records (frsar), and frbr-objectoriented (frbroo). all are attempts to create conceptual models of bibliographic entities using an entity-relationship model that is very similar to the class-property model used by rdf.2 2. various initiatives at the library of congress (lc), such as lc subject headings (lcsh) in skos,3 the lc name authority file in skos,4 the lccn permalink project to create persistent uris for bibliographic records,5 and initiatives to provide skos representations for vocabularies and data elements used in marc, premis, and mets. these all represent attempts to convert our existing bibliographic data into uris that stand for the bibliographic entities represented by bibliographic records and authority records; the uris would then be available for experiments in putting our data directly onto the semantic web. 3. the dc-rda task group project to put rda data elements into rdf.6 as noted previously and discussed further later, rda is less frbrized than my cataloging rules, but otherwise this project is very similar to mine. 4. dublin core’s (dc’s) work on an rdf schema.7 dublin core is very focused on manifestation and does not deal with expressions and works, so it is less similar to my project than is the dc-rda task groups’s project (see further discussion later). n why my project? one might legitimately ask why there is a need for a different model than the ones already provided by frbr, frad, frsar, frbroo, rda, and dc. the frbr and rda models are still tied to the model that is implicit in our current bibliographic data in which expression and manifestation are undifferentiated. this is because publishers publish and libraries acquire and shelve manifestations. in our current bibliographic practice, a new 58 information technology and libraries | june 2009 bibliographic record is made for either a new manifestation or a new expression. thus, in effect, there is no way for a computer to tell one from the other in our current data. despite the fact that frbr has good definitions of expression (change in content) and manifestation (mere change in carrier), it perpetuates the existing implicit model in its mapping of attributes to entities. for example, frbr maps the following to manifestation: edition statements (“2nd rev. ed.”); statements of responsibility that identify translators, editors, and illustrators; physical description statements that identify illustrated editions; and extent statements that differentiate expressions (the 102-minute version vs. the 89-minute version); etc. 
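to make the mapping question concrete, the hedged sketch below attaches an edition statement to an expression uri and carrier details to a manifestation uri, which is the kind of differentiation under discussion here; the property names are invented for illustration and are not drawn from frbr, rda, or the yee rules.

```python
# a hedged sketch, not the actual frbr or rda element set: data about a
# change in content hangs off the expression uri, data about a mere change
# of carrier hangs off the manifestation uri. all names here are invented.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()

work = EX["work/w1"]
expression = EX["expression/w1-2nd-rev-ed"]
manifestation = EX["manifestation/w1-1998-hardback"]

g.add((expression, EX.isExpressionOf, work))
g.add((manifestation, EX.isManifestationOf, expression))

# content-level data: the 2nd revised edition marks a new expression
g.add((expression, EX.editionStatement, Literal("2nd rev. ed.")))

# carrier-level data: pagination, size, and publisher belong to the manifestation
g.add((manifestation, EX.extent, Literal("xii, 340 p. ; 24 cm")))
g.add((manifestation, EX.publisherName, Literal("example press")))
```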
thus the frbr definition of expression recognizes that a 2nd revised edition is a new expression, but frbr maps the edition statement to manifestation. in my model, i have tried to differentiate more cleanly data applying to expressions from data applying to manifestations.8 frbr and rda tend to assume that our current bibliographic data elements map to one and only one group 1 entity or class. there are exceptions, such as title, which frbr and rda define at work, expression, and manifestation levels. however, there is a lack of recognition that, to create an accurate model of the bibliographic universe, more data elements need to be applied at the work and expression level in addition to (or even instead of) the manifestation level. in the appendix i have tried to contrast the frbr, frad, and rda models with mine. in my model, many more data elements (properties and attributes) are linked to the work and expression level. after all, if the expression entity is defined as any change in work content, the work entity needs to be associated with all content elements that might change, such as the original extent of the work, the original statement of responsibility, whether illustrations were originally present, whether color was originally present in a visual work, whether sound was originally present in an audiovisual work, the original aspect ratio of a moving image work, and so on. frbr also tends to assume that our current data elements map to one and only one entity. in working on my model, i have come to the conclusion that this is not necessarily true. in some cases, a data element pertaining to a manifestation also pertains to the expression and the work. in other cases, the same data element is specific to that manifestation, and, in other cases, the same data element is specific to its expression. this is true of most of the elements of the bibliographic description. frad, in attempting to deal with the fact that our current cataloging rules allow a single person to have several bibliographic identities (or pseudonyms), treats person, name, and controlled access point as three separate entities or classes. i have tried to keep my model simpler and more elegant by treating only person as an entity, with preferred name and variant name as attributes or properties of that entity. frbroo is focused on the creation process for works, with special attention to the creation of unique works of art and other one-off items found in museums. thus frbroo tends to neglect the collocation of the various expressions that develop in the history of a work that is reproduced and published, such as translations, abridged editions, editions with commentary, etc. dc has concentrated exclusively on the description of manifestations and has neglected expression and work altogether. one of the tenets of semantic web development is that, once an entity is defined by a community, other communities can reuse that entity without defining it themselves. the very different definitions of the work and expression entities in the different communities described above raise some serious questions about the viability of this tenet. n assumptions it should be noted that this entire experiment is based on two assumptions about the future of human intervention for information organization. 
these two assumptions are based on the even bigger assumption that, even though the internet seems to be an economy based on free intellectual labor, and, even though human intervention for information organization is expensive (and therefore at more risk than ever), human intervention for information organization is worth the expense. n assumption 1: what we need is not artificial intelligence, but a better human–machine partnership such that humans can do all of the intellectual labor and machines can do all of the repetitive clerical labor. currently, catalogers spend too much time on the latter because of the poor design of current systems for inputting data. the universal employment provided by paying humans to do the intellectual labor of building the semantic web might be just the stimulus our economy needs. n assumption 2: those who need structured and granular data—and the precise retrieval that results from it—to carry out research and scholarship may constitute an elite minority rather than most of the people of the world (sadly), but that talented and intelligent minority is an important one for the cultural and technological advancement of humanity. it is even possible that, if we did a better job of providing access to such data, we might enable the enlargement of that minority. can bibliographic data be put directly onto the semantic web? | yee 59 n granularity and structure issues as soon as one starts to create a data model, one encounters granularity or cataloger-data parsing issues. these issues have actually been with us all along as we developed the data model implicit in aacr2r and marc 21. those familiar with rda, frbr, and frad development will recognize that much of that development is directed at increasing structure and granularity in catalogerproduced data to prepare for moving it onto the semantic web. however, there are clear trade-offs in an increase in structure and granularity. more structure and more granularity make possible more powerful indexing and more sophisticated display, but more structure and more granularity are more complex and expensive to apply and less likely to be implemented in a standard fashion across all communities; that is, it is less likely that interoperable data would be produced. any switching or mapping that was employed to create interoperable data would produce the lowest common denominator (the simplest and least granular data), and once rendered interoperable, it would not be possible for that data to swim back upstream to regain its lost granularity. data with less structure and less granularity could be easier and cheaper to apply and might have the potential to be adopted in a more standard fashion across all communities, but that data would limit the degree to which powerful indexing and sophisticated display would be possible. take the example of a personal name: currently, we demarcate surname from forename by putting the surname first, followed by a comma and then the forename. even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is surname and which part is forename in a culture unfamiliar to the cataloger. in other words, the more granularity you desire in your data, the more often the people collecting the data are going to encounter ambiguous situations. 
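the trade-off can be seen in miniature with a personal name. in the sketch below (rdflib again, with invented property names), the same name is recorded once as a single literal and once split into typed parts; the second form supports better sorting and indexing, but it is exactly the form that requires the cataloger to know which part is the surname.

```python
# a sketch of the granularity trade-off for a personal name: one literal
# versus typed parts. neither form is prescribed by the models discussed
# here; the property names are invented.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
p = EX["person/p1"]

# low granularity: cheap to record, hard to index or re-order reliably
g.add((p, EX.nameAsRecorded, Literal("garcía márquez, gabriel")))

# high granularity: supports sorting and display control, but the cataloger
# must know which part is the surname, which is not always obvious across cultures
g.add((p, EX.surname, Literal("garcía márquez")))
g.add((p, EX.forename, Literal("gabriel")))
```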
another example: currently, we do not collect information about gender self-identification; if we were to increase the granularity of our data to gather that information, we would surely encounter situations in which the cataloger would not necessarily know if a given creator was self-defined as a female or a male or of some other gender identity. presently, if we are adding a birth and death date, whatever dates we use are all together in a $d subfield without any separate coding to indicate which date is the birth date and which is the death date (although an occasional “b.” or “d.” will tell us this kind of information). we could certainly provide more granularity for dates, but that would make the marc 21 format much more complex and difficult to learn. people who dislike the marc 21 format already argue that it is too granular and therefore requires too much of a learning curve before people can use it. for example, tennant claims that “there are only two kinds of people who believe themselves able to read a marc record without referring to a stack of manuals: a handful of our top catalogers and those on serious drugs.”9 how much of the granularity already in marc 21 is used either in existing records or, even if present, is used in indexing and display software? granularity costs money, and libraries and archives are already starving for resources. granularity can only be provided by people, and people are expensive. granularity and structure also exist in tension with each other. more granularity can lead to less structure (or more complexity to retain structure along with granularity). in the pursuit of more granularity of data than we have now, rda, attempting to support rdf–compliant xml encoding, has been atomizing data to make it useful to computers, but this will not necessarily make the data more useful to humans. to be useful to humans, it must be possible to group and arrange (sort) the data meaningfully, both for indexing and for display. the developers of skos refer to the “vast amounts of unstructured (i.e., human readable) information in the web,”10 yet labeling bits of data as to type and recording semantic relationships in a machine-actionable way do not necessarily provide the kind of structure necessary to make data readable by humans and therefore useful to the people the web is ultimately supposed to serve. consider the case of music instrumentation. if you have a piece of music for five guitars and one flute, and you simply code number and instrumentation without any way to link “five” with “guitars” and “one” with “flute,” you will not be able to guarantee that a person looking for music for five flutes and one guitar will not be given this piece of music in their results (see figure 1).11 the more granular the data, the less the cataloger can build order, sequencing, and linking into the data; the coding must be carefully designed to allow the desired order, sequencing, and linking for indexing and display to be possible, which might call for even more complex coding. it would be easy to lose information about order, sequencing, and linking inadvertently. actually, there are several different meanings for the term structure: 1. structure is an object of a record (structure of document?); for example, elings and waibel refer to “data fields . . . also referred to as elements . . . which are organized into a record by a data structure.”12 2. structure is the communications layer, as opposed to the display layer or content designation.13 3. 
structure is the record, field, and subfield. 4. structure is the linking of bits of data together in the form of various types of relationships. 5. structure is the display of data in a structured, ordered, and sequenced manner to facilitate human understanding. 6. data structure is a way of storing data in a computer so that it can be used efficiently (this is how computer programmers use the term). i hasten to add that i am definitely in favor of adding more structure and granularity to our data when it is necessary to carry out the fundamental objectives of our profession and of our catalogs. i argued earlier that frbr and rda are not granular enough when it comes to the distinction between data elements that apply to expression and those that apply to manifestation. if we could just agree on how to differentiate data applying to the manifestation from data applying to the expression instead of our current practice of identifying works with headings and lumping all manifestation and expression data together, we could increase the level of service we are able to provide to users a thousandfold. however, if we are not going to commit to differentiating between expression and manifestation, it would be more intellectually honest for frbr and rda to take the less granular path of mapping all existing bibliographic data to manifestation and expression undifferentiated, that is, to use our current data model unchanged and state this openly. i am not in favor of adding granularity for granularity's sake or for the sake of vague conceptions of possible future use. granularity is expensive and should be used only in support of clear and fundamental objectives.

[figure 1a. extract from the yee rdf model that illustrates one technique for modeling musical instrumentation at the expression level, using a blank node to group a repeated number and instrument type. figure 1b. example of encoding musical instrumentation at the expression level based on that model (an expression scored for 5 guitars and 1 flute).]

the goal: efficient displays and indexes

my main concern is that we model and then structure the data in a way that allows us to build the complex displays that are necessary to make catalogs appear simple to use. i am aware that the current orthodoxy is that recording data should be kept completely separate from indexing and display (“the applications layer”). because i have spent my career in a field in which catalog records are indexed and displayed badly by systems people who don't seem to understand the data contained in them, i am a skeptic. it is definitely possible to model and structure data in such a way that desired displays and indexes are impossible to construct. i have seen it happen! the lc working group report states that “it will be recognized that human users and their needs for display and discovery do not represent the only use of bibliographic metadata; instead, to an increasing degree, machine applications are their primary users.”14 my fear is that the underlying assumption here is that users need to (and can) retrieve the single perfect record. this will never be true for bibliographic metadata.
users will always need to assemble all relevant records (of all kinds) as precisely as possible and then browse through them before making a decision about which resources to obtain. this is as true in the semantic web—where “records” can be conceived of as entity or class uris—as it is in the world of marc–encoded metadata. some of the problems that have arisen in the past in trying to index bibliographic metadata for humans are connected to the fact that existing systems do not group all of the data related to a particular entity effectively, such that a user can use any variant name or any combination of variant names for an entity and do a successful search. currently, you can only look for a match among two or more keywords within the bounds of a single manifestation-based bibliographic record or within the bounds of a single heading, minus any variant terms for that entity. thus, when you do a keyword search for two keywords, for example, “clemens” and “adventures,” you will retrieve only those manifestations of mark twain’s adventures of tom sawyer that have his real name (clemens) and the title word “adventures” co-occurring within the bounded space created by a single manifestation-based bibliographic record. instead, the preferred forms and the variant forms for a given entity need to be bounded for indexing such that the keywords the user employs to search for that entity can be matched using co-occurrence rules that look for matches within a single bounded space representing the entity desired. we will return to this problem in the discussion of issue 3 in the later section “rdf problems encountered.” the most complex indexing problem has always proven to be the grouping or bounding of data related to a work, since it requires pulling in all variants for the creator(s) of that work as well. otherwise, a user who searches for a work using a variant of the author’s name and a variant of the title will continue to fail (as they do in all current opacs), even when the desired work exists in the catalog. if we could create a uri for the adventures of tom sawyer that included all variant names for the author and all variant titles for the work (including the variant title tom sawyer), the same keyword search described above (“clemens” and “adventures”) could be made to retrieve all manifestations and expressions of the adventures of tom sawyer, instead of the few isolated manifestations that it would retrieve in current catalogs. we need to make sure that we design and structure the data such that the following displays are possible: n display all works by this author in alphabetical order by title with the sorting element (title) appearing at the top of each work displayed. n display all works on this subject in alphabetical order by principal author and title (with principal author and title appearing at top of each work displayed), or title if there is no principal author (with title appearing at top of each work displayed). we must ensure that we design and structure the data in such a way that our structure allows us to create subgroups of related data, such as instrumentation for a piece of music (consisting of a number associated with each particular instrument), place and related publisher for a certain span of dates on a serial title change record, and the like. n which standards will carry out which functions? 
currently, we have a number of different standards to carry out a number of different functions; we can speculate about how those functions might be allocated in a new semantic web–based dispensation, as shown in table 1.

table 1. possible reallocation of current functions in a new semantic web–based dispensation

function | current | future?
data content, or content guidelines (rules for providing data in a particular element) | defined by aacr2r and marc 21 | defined by rda and rdf/rdfs/owl/skos
data elements | defined by isbd–based aacr2r and marc 21 | defined by rda and rdf/rdfs/owl/skos
data values | defined by lc/naco authority file, lcsh, marc 21 coded data values, etc. | defined as ontologies using rdf/rdfs/owl/skos
encoding or labeling of data elements for machine manipulation; same as data format? | defined by iso 2709–based marc 21 | defined by rdf/rdfs/xml
data structure (i.e., what a record stands for) | defined by aacr2r and marc 21; also frbr? | defined by rdf/rdfs/owl/skos
schematization (constraint on structure and content) | marc 21, mods, dcmi abstract model | defined by rdf/rdfs/owl/skos
encoding of facts about entity relationships | carried out by matching data value strings (headings found in the lc/naco authority file and lcsh, issns, and the like) | carried out by rdf/rdfs/owl/skos in the form of uri links
display rules | ils software, formerly isbd–based aacr2r | “application layer” or yee rules
indexing rules | ils software | sparql, “application layer,” or yee rules

in table 1, data structure is taken to mean what a record represents or stands for; traditionally, a record has represented an expression (in the days of hand-press books) or a manifestation (ever since reproduction mechanisms have become more sophisticated, allowing an explosion of reproductions of the same content in different formats and coming from different distributors). rda is record-neutral; rdf would allow uris to be established for any and all of the frbr levels; that is, there would be a uri for a particular work, a uri for a particular expression, a uri for a particular manifestation, and a uri for a particular item. note that i am not using data structure in the sense that a computer programmer does (as a way of storing data in a computer so that it can be used efficiently). currently, the encoding of facts about entity relationships (see table 1) is carried out by matching data-value character strings (headings or linking fields using issns and the like) that are defined by the lc/naco authority file (following aacr2r rules), lcsh (following rules in the subject cataloging manual), etc. in the future, this function might be carried out by using rdf to link the uri for a resource to the uri for a data value. display rules (see table 1) are currently defined by isbd and aacr2r but widely ignored by systems, which frequently truncate bibliographic records arbitrarily in displays, supply labels, and the like; rda abdicates responsibility, pushing display out of the cataloging rules. the general principle on the web is to divorce data from display and allow anyone to display the data any way they want. display is the heart of the objects (or goals) of cataloging: the point is to display to the user the works of an author, the editions of a work, or the works on a subject. all of these goals only can be met if complex, high-quality displays can be built from the data created according to the data model. indexing rules (see table 1) were once under the control of catalogers (in book and card catalogs) in that users had to navigate through headings and cross-references to find what they wanted; currently indexing is in the hands of system designers who prefer to provide keyword indexing of bibliographic (i.e., manifestation-based) records rather than provide users with access to the entities they are really interested in (works, authors, and subjects), all represented currently by authority records for headings and cross-references. rda abdicates responsibility, pushing indexing concerns completely out of the cataloging rules. the general principle on the web is to allow resources to be indexed by any web search engines that wish to index them. current web data is not structured at all for either indexing or display. i would argue that our interest in the semantic web should be focused on whether or not it will support more data structure—as well as more logic in that data structure—to support better indexes and better displays than we have now in manifestation-based ils opacs. crucial to better indexing than we have ever had before are the co-occurrence rules for keyword indexing, that is, the rules for when a co-occurrence of two or more keywords should produce a match. we need to be able to do a keyword search across all possible variant names for the entity of interest, and the entity of interest for the average catalog user is much more likely to be a particular work than a particular manifestation. unfortunately, catalog-use studies have only studied so-called known-item searches without investigating whether a known-item searcher was looking for a particular edition or manifestation of a work or was simply looking for a particular work in order to make a choice as to edition or manifestation once the work was found. however, common sense tells us that it is a rare user who approaches the catalog with prior knowledge about all published editions of a given work. the more common situation is surely one in which a user desires to read a particular shakespeare play or view a particular david lean film and discovers that the desired work exists in more than one expression or manifestation only after searching the catalog. we need to have the keyword(s) in our search for a particular work co-occur within a bounded space that encompasses all possible keywords that might refer to that particular work entity, including both creator and title keywords. notice in table 1 the unifying effect that rdf could potentially have; it could free us from the use of multiple standards that can easily contradict each other, or at least not live peacefully together. examples are not hard to find in the current environment. one that has cropped up in the course of rda development concerns family names. presently the rules for naming families are different depending on whether the family is the subject of a work (and established according to lcsh) or whether the family is responsible for a collection of papers (and established according to rda).
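one way to picture that bounded space is as a set of keywords hung directly on the work entity's uri. the sketch below is only an illustration of the idea, with an invented boundedKeyword property and example.org uris; a real implementation would more likely derive the bounded set from the variant names and titles already linked to the work, but the co-occurrence rule would be the same.

```python
# an illustration of the "bounded space" idea: every keyword that can refer
# to the work entity, whether from a variant creator name or a variant title,
# is bounded to the work's uri, and the co-occurrence test runs against that
# entity rather than against a single manifestation-based record.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
work = EX["work/adventures-of-tom-sawyer"]

for kw in ["twain", "mark", "clemens", "samuel",
           "adventures", "tom", "sawyer"]:
    g.add((work, EX.boundedKeyword, Literal(kw)))

q = """
PREFIX ex: <http://example.org/>
SELECT ?work WHERE {
  ?work ex:boundedKeyword "clemens", "adventures" .
}
"""
for row in g.query(q):
    print(row.work)  # found, even though no single record contains both words
```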
n types of data rda has blurred the distinctions among certain types of data, apparently because there is a perception that on the semantic web the same piece of data needs to be coded only once, and all indexing and display needs can be supported from that one piece of data. i question that assumption on the basis of my experience with bibliographic cataloging. all of the following ways of encoding the same piece of data can still have value in certain circumstances: n transcribed; in rdf terms, a literal (i.e., any data that is not a uri, a constant value). transcribed data is data copied from an item being cataloged. it is valuable for providing access to the form of the name used on a title page and is particularly useful for people who use pseudonyms, corporate bodies that change name, and so on. transcribed data is an important part of the historical record and not just for off-line materials; it can be a historical record of changing data on notoriously fluid webpages. n composed; in rdf terms, also a literal. composed data is information composed by a cataloger on the basis of observation of the item in hand; it can be valuable for historical purposes to know which data was composed. n supplied; in rdf terms, also a literal. supplied data is information supplied by a cataloger from outside sources; it can be valuable for historical purposes to know which data was supplied and from which outside sources it came. n coded; in rdf, represented by a uri. coded data would likely transform on the semantic web into links to ontologies that could provide normalized, human-readable identification strings on demand, thus causing coded and normalized data to merge into one type of data. is it not possible, though, that the coded form of normalized data might continue to provide for more efficient searching for computers as opposed to humans? coded data also has great cross-cultural value, since it is not as language-dependent as literals or normalized headings. n normalized headings (controlled headings); in rdf, represented by a uri. normalized or controlled headings are still necessary to provide users with coherent, ordered displays of thousands of entities that all match the user’s search for a particular entity (work, author, subject, etc.). the reason google displays are so hideous is that, so far, the data searched lacks any normalized display data. if variant language forms of the name for an entity 64 information technology and libraries | june 2009 are linked to an entity uri, it should be possible to supply headings in the language and script desired by a particular user. n the rdf model those who have become familiar with frbr over the years will probably not find it too difficult to transition from the frbr conceptual model to the rdf model. what frbr calls an “entity,” rdf calls a “subject” and rdfs calls a “class.” what frbr calls an “attribute,” rdf calls an “object” and rdfs calls a “property.” what frbr calls a “relationship,” rdf calls a “predicate” and rdfs calls a “relationship” or a “semantic linkage” (see table 2). the difficulty in any data-modeling exercise lies in deciding what to treat as an entity or class and what to treat as an attribute or property. the authors of frbr decided to create a class called expression to deal with any change in the content of a work. when frbr is applied to serials, which change content with every issue, the model does not work well. 
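the coexistence of these types of data is easy to sketch in rdf terms: nothing prevents recording both a transcribed literal and a link to a controlled entity uri for the same element. the fragment below does so for a publisher, with invented property names and uris; it is an illustration of the principle, not a proposal for a specific element set.

```python
# a sketch of recording one element twice: as a transcribed literal (what the
# item itself says) and as a link to a controlled entity uri. rdf allows a
# subject to carry both; the property names and uris are invented.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
manifestation = EX["manifestation/m1"]

# transcribed form, kept as part of the historical record
g.add((manifestation, EX.publisherStatementTranscribed,
       Literal("printed for j. dodsley, in pall-mall")))

# controlled form, a uri at which preferred and variant names could be found
g.add((manifestation, EX.publisher, EX["corporatebody/james-dodsley"]))
```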
in my model, i found it useful to create a new entity at the manifestation level, the serial title, to deal with the type of change that is more relevant to serials, the change in title. i also created another new entity at the manifestation level, title-manifestation, to deal with a change of title in a nonserial work that is not associated with a change in content. one hundred years ago, this entity would have been called title-edition. i am also in the process of developing an entity at the expression level—surrogate—to deal with reproductions of original artworks that need to inherit the qualities of the original artwork they reproduce without being treated as an edition of that original artwork, which ipso facto is unique. these are just examples of cases in which it is not that easy to decide on the classes or entities that are necessary to accurately model bibliographic information. see the appendix for a complete comparison of the classes and entities defined in four different models: frbr, frad, rda, and the yee cataloging rules (ycr). the appendix also shows variation among these models concerning whether a given data element is treated as a class/entity or as an attribute/property. the most notable examples are name and preferred access point, which are treated as classes/entities in frad, as attributes in frbr and ycr, and as both in rda. n rdf problems encountered my goal for this paper is to institute discussion with data modelers about which problems i observed are insoluble and which are soluble: 1. is there an assumption on the part of semantic web developers that a given data element, such as a publisher name, should be expressed as either a literal or using a uri (i.e., controlled), but never both? cataloging is rooted in humanistic practices that require careful recording of evidence. there will always be value in distinguishing and labeling the following types of data: n copied as is from an artifact (transcribed) n supplied by a cataloger n categorized by a cataloger (controlled) tim berners-lee (the father of the internet and the semantic web) emphasizes the importance of recording not just data but also its provenance for the sake of authenticity.15 for many data elements, therefore, it will be important to be able to record both a literal (transcribed or composed form or both) and a uri (controlled form). is this a problem in rdf? as a corollary, if any data that can be given a uri cannot also be represented by a literal (transcribed and composed data, or one or the other), it may not be possible to design coherent, readable displays of the data describing a particular entity. among other things, cataloging is a discursive writing skill. does rdf require that all data be represented only once, either by a literal or by a uri? or is it perhaps possible that data that has a uri could also have a transcribed or composed form as a property? perhaps it will even be possible to store multiple snapshots of online works that change over time to document variant forms of a name for works, persons, and so on. 2. will the internet ever be fast enough to assemble the equivalent of our current records from a collection of hundreds or even thousands of uris? in rdf, links are one-to-one rather than one-to-many. this leads to a great proliferation of reciprocal links. the more granularity there is in the data, the more linking is necessary to ensure that atomized data elements are linked together. 
potentially, every piece of data describing a particular entity could be represented by a uri leading out to a skos list of data values. the number of links necessary to pull together all of the data just to describe one manifestation could become astronomical, as could the number of one-to-one links necessary to create the appearance of a one-to-many link, such as the link between an author and all the works of an author. is the internet really fast enough to assemble a record from hundreds of uris in a reasonable amount of time? given the often slow network throughput typical of many of our current internet connections, is it really practical to expect all of these pieces to be pulled together efficiently to create a single display for a single user? we may yet feel nostalgia for the single manifestation-based record that already has all of the relevant data in it (no assembly required). bruce d'arcus points out, however, that

i think if you're dealing with rdf, you wouldn't necessarily be gathering these data in real-time. the uris that are the targets for those links are really just global identifiers. how you get the triples is a separate matter. so, for example, in my own personal case, i'm going to put together an rdf store that is populated with data from a variety of sources, but that data population will happen by script, and i'll still be querying a single endpoint, where the rdf is stored in a relational database.16

in other words, d'arcus essentially will put them all in one place, or in one database that "looks" from a uri perspective to be "one place" where they're already gathered.

table 2. the frbr conceptual model translated into rdf and rdfs

frbr | rdf | rdfs
entity | subject | class
attribute | object | property
relationship | predicate | relationship/semantic linkage

3. is rdf capable of dealing with works that are identified using their creators? we need to treat author as both an entity in its own right and as a property of a work, and in many cases the latter is the more important function for user service. lexical labels, or human-readable identifiers for works that are identified using both the principal author and the title, are particularly problematic in rdf given that the principal author is an entity in its own right. is rdf capable of supporting the indexing necessary to allow a user to search using any variant of the author's name and any variant of the title of a work in combination and still retrieve all expressions and manifestations of that work, given that author will have a uri of its own, linked by means of a relationship link to the work uri? is rdf capable of supporting the display of a list of one thousand works, each identified by principal author, in order first by principal author, then by title, then by publication date, given that the preferred heading for each principal author would have to be assembled from the uri for that principal author and the preferred title for each work would have to be assembled from the uri for that work? for fear that this will not, in fact, be possible, i have put a human-readable work-identifier data element into my model that consists of principal author and title when appropriate, even though that means the preferred name of the principal author may not be able to be controlled by the entity record for the principal author. any guidance from experienced data modelers in this regard would be appreciated.
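as one hedged, application-layer sketch of what the indexing in question might look like, the fragment below stores variant names on the author uri and variant titles on the work uri, then uses a sparql query (run here through rdflib) to join across the creator link, so that the combination "clemens" plus "adventures" still finds the work. the vocabulary is invented, and this is only one possible approach.

```python
# a hedged, application-layer sketch: the author is its own entity with its
# own variant names, the work has its own variant titles, and the query joins
# across the (invented) creator link to test keyword co-occurrence.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")
g = Graph()

author = EX["person/mark-twain"]
work = EX["work/adventures-of-tom-sawyer"]

g.add((author, SKOS.prefLabel, Literal("twain, mark, 1835-1910")))
g.add((author, SKOS.altLabel, Literal("clemens, samuel langhorne")))
g.add((work, SKOS.prefLabel, Literal("the adventures of tom sawyer")))
g.add((work, SKOS.altLabel, Literal("tom sawyer")))
g.add((work, EX.hasPrincipalCreator, author))

q = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.org/>
SELECT DISTINCT ?work WHERE {
  ?work ex:hasPrincipalCreator ?author .
  ?work skos:prefLabel|skos:altLabel ?title .
  ?author skos:prefLabel|skos:altLabel ?name .
  FILTER(CONTAINS(LCASE(STR(?name)), "clemens"))
  FILTER(CONTAINS(LCASE(STR(?title)), "adventures"))
}
"""
for row in g.query(q):
    print(row.work)
```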
according to bruce d’arcus, this is purely an interface or application question that does not require a solution at the data layer.17 since we have never had interfaces or applications that would do this correctly, even though the data is readily available in authority records, i am skeptical about this answer! perhaps bruce’s suggestion under item 9 of designating a sortname property for each entity is the solution here as well. my human-readable work identifier consisting of the name of the principal creator and uniform title of work could be designated the sortname poperty for the work. it would have to be changed whenever the preferred form of the name for the principal creator changed, however. 4. do all possible inverse relationships need to be expressed explicitly, or can they be inferred? my model is already quite large, and i have not yet defined the inverse of every property as i really should to have a correct rdf model. in other words, for every property there needs to be an inverse property; for example, the property iscreatorof needs to have the inverse property iscreatedby; thus “twain” has the property iscreatorof, while “adventures of tom sawyer” has the property iscreatedby. perhaps users and inputters will not actually have to see the huge, complex rdf data model that would result from creating all the inverse relationships, but those who maintain the model will have to deal with a great deal of complexity. however, since i’m not a programmer, i don’t know how the complexity of rdf compares to the complexity of existing ils software. 5. can rdf solve the problems we are having now because of the lack of transitivity or inheritance in the data models that underlie current ilses, or will rdf merely perpetuate these problems? we have problems now with the data models that underlie our current ilses because of the inability of these models to deal with hierarchical inheritance, such that whatever is true of an entity in the hierarchy is also true of every entity below that entity in the hierarchy. one example is that of cross-references to a parent corporate body that should be held to apply to all subdivisions of that corporate body but never are in existing ils systems. there is a cross-reference from “fbi” to “united states. federal bureau of investigation,” but not from “fbi counterterrorism division” to “united states. federal bureau of investigation. counterterrorism division.” for that reason, a search in any opac name index for “fbi counterterrorism division” will fail. we need systems that recognize that data about a parent corporate body is relevant to all subdivisions of that parent body. we need systems that recognize that data about a work is relevant to all expressions and manifestations of that work. rdf allows you to link a work to an expression 66 information technology and libraries | june 2009 and an expression to a manifestation, but i don’t believe it allows you to encode the information that everything that is true of the work is true of all of its expressions and manifestations. rob styles seems to confirm this: “rdf doesn’t have hierarchy. in computer science terms, it’s a graph, not a tree, which means you can connect anything to anything else in any direction.”18 of course, not all links should be this kind of transitive or inheritance link. 
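where a link is judged to be genuinely hierarchical, one query-layer workaround is to mark it with its own property and follow that property transitively at search time. the sketch below does this for the fbi example with an invented isSubdivisionOf property and a sparql property path; plain rdf does not propagate the parent body's variant names downward by itself, so the query has to ask for it.

```python
# a hedged, query-layer sketch for the hierarchy problem: mark the hierarchical
# link explicitly and follow it with a property path, so variant names attached
# to a parent corporate body also answer searches for its subdivisions.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")
g = Graph()

fbi = EX["corporatebody/us-federal-bureau-of-investigation"]
division = EX["corporatebody/us-fbi-counterterrorism-division"]

g.add((fbi, SKOS.altLabel, Literal("fbi")))
g.add((division, SKOS.prefLabel, Literal("counterterrorism division")))
g.add((division, EX.isSubdivisionOf, fbi))   # hierarchical link, made explicit

q = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.org/>
SELECT ?body WHERE {
  ?body ex:isSubdivisionOf* ?ancestor .      # zero or more steps up the hierarchy
  ?ancestor skos:prefLabel|skos:altLabel ?name .
  FILTER(CONTAINS(LCASE(STR(?name)), "fbi"))
  ?body skos:prefLabel|skos:altLabel ?own .
  FILTER(CONTAINS(LCASE(STR(?own)), "counterterrorism"))
}
"""
for row in g.query(q):
    print(row.body)   # finds the division via the parent body's variant name
```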
one expression of work a is linked to another expression of work a by links to work a, but whatever is true of one of those expressions is not necessarily true of the other; one may be illustrated, for example, while the other is not. whatever is true of one work is not necessarily true of another work related to it by related work link. it should be recognized that bibliographic data is rife with hierarchy. it is one of our major tools for expressing meaning to our users. corporate bodies have corporate subdivisions, and many things that are true for the parent body also are true for its subdivisions. subjects are expressed using main headings and subject subdivisions, and many things that are true for the main heading (such as variant names) also are true for the heading combined with one of its subdivisions. geographic areas are contained within larger geographic areas, and many things that are true of the larger geographic area also are true for smaller regions, counties, cities, etc., contained within that larger geographic area. for all these reasons, i believe that, to do effective displays and indexes for our bibliographic data, it is critical that we be able to distinguish between a hierarchical relationship and a nonhierarchical relationship. 6. to recognize the fact that the subject of a book or a film could be a work, a person, a concept, an object, an event, or a place (all classes in the model), is there any reason we cannot define subject itself as a property (a relationship) rather than a class in its own right? in my model, all subject properties are defined as having a domain of resource, meaning there is no constraint as to the class to which these subject properties apply. i’m not sure if there will be any fall-out from that modeling decision. 7. how do we distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location? sometimes a place is a jurisdiction and behaves like a corporate body (e.g., united states is the name of the government of the united states). sometimes place is a physical location in which something is located (e.g., the birds discussed in a book about the birds of the united states). to distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location, i have defined two different classes for place: place as jurisdictional corporate body and place as geographic area. will this cause problems in the model? will there be times when it prevents us from making elegant generalizations in the model about place per se? there is a similar problem with events. some events are corporate bodies (e.g., conferences that publish papers) and some are a kind of subject (e.g., an earthquake). i have defined two different classes for event: conference or other event as corporate body creator and event as subject. 8. what is the best way to model a bound-with or an issuedwith relationship, or a part–whole relationship in which the whole must be located to obtain the part? the bound-with relationship is actually between two items containing two different works, while the issued-with relationship is between two manifestations containing two different works (see figure 2). is this a work-to-work relationship? will designating it a work-to-work relationship cause problems for indicating which specific items or manifestation-items of each work are physically located in the same place? 
this question may also apply to those part–whole relationships in which the part is physically contained within the whole and both are located in the same place (sometimes known as analytics). one thing to bear in mind is that in all of these cases the relationship between two works does not hold between all instances of each work; it only holds for those particular instances that are contained in the particular manifestation or item that is bound with, issued with, or part of the whole. however, if the relationship is modeled as a work-1manifestation to work-2-manifestation relationship, or a work-1-item to work-2-item relationship,, care must be taken in the design of displays to pull in enough information about the two or more works so as not to confuse the user. 9. how do we express the arrangement of elements that have a definite order? i am having trouble imagining how to encode the ordering of data elements that make up a larger element, such as the pieces of a personal name. this is really a desire to control the display of those atomized elements so that they make sense to human beings rather than just to machines. could one define a property such as natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of a person given that the ideal order of these elements might vary from one person to another? could one define properties such as sorting element 1, sorting element 2, sorting element 3, etc., and assign them to the various pieces that will be assembled to make a particular heading for an entity, such as an lcsh heading for a historical period? (depending on the answer to the question in item 11, it may or may not be possible to assign a property to a property in this fashion.) are there standard sorting rules we need to be aware of (in unicode, for example)? are there other rdf techniques available to deal with sorting and arrangement? bruce d’arcus suggests that, instead of coding the name parts, it would be more useful to designate sortname properties;19 might it not be necessary to designate a sortname property for each variant name, as well, can bibliographic data be put directly onto the semantic web? | yee 67 for cases in which variants need to appear in sorted displays? and wouldn’t these sortname properties complicate maintenance over time as preferred and variant names changed? 10. how do we link related data elements in such a way that effective indexing and displays are possible? some examples: number and kind of instrument (e.g., music written for two oboes and three guitars); multiple publishers, frequencies, subtitles, editors, etc., with date spans for a serial title change (or will it be necessary to create a new manifestation for every single change in subtitle, publisher name, place of publication, etc?). the assumption seems to be that there will be no repeatable data elements. based on my somewhat limited experience with rdf, it appears that there are record equivalents (every data element—property or relationship—pertaining to a particular entity with a uri), but there are no field or subfield equivalents that allow the sublinking of related pieces of data about an entity. 
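one device often suggested for this missing field/subfield level, and the one used in figure 1 earlier, is a blank node that groups the related pieces. the sketch below applies it to the instrumentation example, with invented property names rather than the yee model's actual elements.

```python
# a sketch of blank-node grouping: the blank node stands in for the missing
# field/subfield level, keeping "5" attached to "guitar" and "1" attached to
# "flute" so that music for five guitars and one flute is not confused with
# music for five flutes and one guitar. the properties are invented.
from rdflib import Graph, Namespace, Literal, BNode

EX = Namespace("http://example.org/")
g = Graph()
expression = EX["expression/e1"]

for number, instrument in [(5, "guitar"), (1, "flute")]:
    part = BNode()                       # anonymous grouping node
    g.add((expression, EX.hasInstrumentation, part))
    g.add((part, EX.numberOfInstruments, Literal(number)))
    g.add((part, EX.instrumentType, Literal(instrument)))

print(g.serialize(format="turtle"))
```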
indeed, rob styles goes so far as to argue that ultimately there is no notion of a “record” in rdf.20 it is possible that blank nodes might be able to fill in for fields and subfields in some cases for grouping data, but there are dangers involved in their use.21 to a cataloger, it looks as though the plan is for rdf data to float around loose without any requirement that there be a method for pulling it together into coherent displays designed for human beings. 11. can a property have a property in rdf? as an example of where it might be useful to define a property of a property, robert maxwell suggests that date of publication is really an attribute (property) of the published by relationship (another property).22 another example: in my model, a variant title for a serial is a property. can that property itself have the property type of variant title to encompass things like spine title, key title, etc.? another example appeared in item 9, in which it is suggested that it might be desirable to assign sort-element properties to the various elements of a name property. 12. how do we document record display decisions? there is no way to record display decisions in rdf itself; it is completely display-neutral. we could not safely commit to a particular rdf–based data model until a significant amount of sample bibliographic data had been created and open-source indexing and display software had been designed and user-tested on that data. it may be that we will need to supplement rdf with some other encoding mechanism that allows us to record display decisions along with the data. current cataloging rules are about display as much as they are about content designation. isbd concerns the order in which the elements should be displayed to humans. the cataloging objectives concern display to users of such entity groups as the works of an author, the editions of a work, and the works on a subject. 13. can all bibliographic data be reduced to either a class or a property with a finite list of values? another way to put this is to ask if all that catalogers do could be reduced to a set of pull-down menus. cataloging is the art of writing discursive prose as much as it is the ability to select the correct value for a particular data element. we must deal with ambiguous data (presented by joe blow could mean that joe created the entire work, produced it, distributed it, sponsored it, or merely funded it). we must sometimes record information without knowing its exact meaning. we must deal with situations that have not been anticipated in advance. it is not possible to list every possible kind of data and every possible value for each type of figure 2. examples of part–whole relationships. how might these be best expressed in rdf? issued-with relationship a copy of charlie chaplin’s 1917 film the immigrant can be found on a videodisc compilation called charlie chaplin, the early years along with two other chaplin films. this compilation was published and collected by many different libraries and media centers. if a user wants to view this copy of the immigrant, he or she will first have to locate charlie chaplin, the early years, then look for the desired film at the beginning of the first videodisc in the set. the issued-with relationship between the immigrant and the other two films on charlie chaplin, the early years is currently expressed in the bibliographic record by means of a “with” note: first on charlie chaplin, the early years, v. 1 (62 min.) with: the count – easy street. 
figure 2. examples of part–whole relationships. how might these be best expressed in rdf?

issued-with relationship: a copy of charlie chaplin's 1917 film the immigrant can be found on a videodisc compilation called charlie chaplin, the early years along with two other chaplin films. this compilation was published and collected by many different libraries and media centers. if a user wants to view this copy of the immigrant, he or she will first have to locate charlie chaplin, the early years, then look for the desired film at the beginning of the first videodisc in the set. the issued-with relationship between the immigrant and the other two films on charlie chaplin, the early years is currently expressed in the bibliographic record by means of a “with” note: first on charlie chaplin, the early years, v. 1 (62 min.) with: the count – easy street.

bound-with relationship: the university of california, los angeles film & television archive has acquired a reel of 16 mm. film from a collector who strung five warner bros. cartoons together on a single reel of film. we can assume that no other archive, library, or media collection will have this particular compilation of cartoons, so the relationship between the five cartoons is purely local in nature. however, any user at the film & television archive who wishes to view one of these cartoons will have to request a viewing appointment for the entire reel and then find the desired cartoon among the other four on the reel. the bound-with relationship among these cartoons is currently expressed in a holdings record by means of a “with” note: fourth on reel with: daffy doodles – tweety pie – i love to singa – along flirtation walk.

what are the next steps? in a sense, this paper is a first crude attempt at locating unmapped territory that has not yet been explored. if we were to decide as a community that it would be valuable to move our shared cataloging activities onto the semantic web, we would have a lot of work ahead of us. if some of the rdf problems described above are insoluble, we may need to work with semantic web developers to create a more sophisticated version of rdf that can handle the transitivity and complex linking required by our data. we will also need to encourage a very complex existing community to evolve institutional structures that would enable a more efficient use of the internet for the sharing of cataloging and other metadata creation. this is not just a technological problem, but also a political one. in the meantime, the experiment continues. let the thinking and learning begin!

references and notes 1. “notation3, or n3 as it is more commonly known, is a shorthand non-xml serialization of resource description framework models, designed with human-readability in mind: n3 is much more compact and readable than xml rdf notation. the format is being developed by tim berners-lee and others from the semantic web community.” wikipedia, “notation 3,” http://en.wikipedia.org/wiki/notation_3 (accessed feb. 19, 2009). 2. frbr review group, www.ifla.org/vii/s13/wgfrbr/; frbr review group, franar (working group on functional requirements and numbering of authority records), www.ifla.org/vii/d4/wg-franar.htm; frbr review group, frsar (working group, functional requirements for subject authority records), www.ifla.org/vii/s29/wgfrsar.htm; frbroo, frbr review group, working group on frbr/crm dialogue, www.ifla.org/vii/s13/wgfrbr/frbr-crmdialogue_wg.htm. 3. library of congress, response to on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 24, 39, 40, www.loc.gov/bibliographic-future/news/lcwgrptresponse_dm_053008.pdf (accessed mar. 25, 2009). 4. ibid., 39. 5. ibid., 41. 6. dublin core metadata initiative, dcmi/rda task group wiki, http://www.dublincore.org/dcmirdataskgroup/ (accessed mar. 25, 2009). 7. mikael nilsson, andy powell, pete johnston, and ambjorn naeve, expressing dublin core metadata using the resource description framework (rdf), http://dublincore.org/documents/2008/01/14/dc-rdf/ (accessed mar. 25, 2009). 8.
see for example table 6.3 in frbr, which maps to manifestation every kind of data that pertains to expression change with the exception of language change. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998): 95, http://www.ifla.org/vii/s13/frbr/frbr.pdf (accessed mar. 4, 2009). 9. roy tennant, “marc must die,” library journal 127, no. 17 (oct. 15, 2002): 26. 10. w3c, skos simple knowledge organization system reference, w3c working draft 29 august 2008, http://www.w3.org/ tr/skos-reference/ (accessed mar. 25, 2009). 11. the extract in figure 1 is taken from my complete rdf model, which can be found at http://myee.bol.ucla.edu/ ycrschemardf.txt. 12. mary w. elings and gunter waibel, “metadata for all: descriptive standards and metadata sharing across libraries, archives and museums,” first monday 12, no. 3 (mar. 5, 2007), http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ article/view/1628/1543 (accessed mar. 25, 2009). 13. oclc, a holdings primer: principles and standards for local holdings records, 2nd ed. (dublin, ohio: oclc, 2008), 4, http:// www.oclc.org/us/en/support/documentation/localholdings/ primer/holdings%20primer%202008.pdf (accessed mar. 25, 2009). 14. the library of congress working group, on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 30, http:// www.loc.gov/bibliographic-future/news/lcwg-ontherecord -jan08-final.pdf (accessed mar. 25, 2009). 15. talis, sir tim berners-lee talks with talis about the semantic web: transcript of an interview recorded on 7 february 2008, http://talis-podcasts.s3.amazonaws.com/twt20080207_timbl .html (accessed mar. 25, 2009). 16. bruce d’arcus, e-mail to author, mar. 18, 2008. 17. ibid. 18. rob styles, e-mail to author, mar. 25, 2008. 19. bruce d’arcus, e-mail to author, mar. 18, 2008. 20. rob styles, e-mail to author, mar. 25, 2008. 21. w3c, “section 2.3, structured property values and blank nodes,” in rdf primer: w3c recommendation 10 february 2004, http://www.w3.org/tr/rdf-primer/#structuredproperties (accessed mar. 25, 2009). 22. robert maxwell, frbr: a guide for the perplexed (chicago: ala, 2008). can bibliographic data be put directly onto the semantic web? | yee 69 entities/classes in rda, frbr, frad compared to yee cataloging rules (ycr) rda, frbr, and frad ycr group 1: work work group 1: expression expression surrogate group 1: manifestation manifestation title-manifestation serial title group 1: item item group 2: person person fictitious character performing animal group 2: corporate body corporate body corporate subdivision place as jurisdictional corporate body conference or other event as corporate body creator jurisdictional corporate subdivision family (rda and frad only) group 3: concept concept group 3: object object group 3: event event or historical period as subject group 3: place place as geographic area discipline genre/form name identifier controlled access point rules (frad only) agency (frad only) appendix. 
entity/class and attribute/property comparisons 70 information technology and libraries | june 2009 attributes/properties in frbr compared to frad model entity frbr frad work title of the work form of work date of the work other distinguishing characteristics intended termination intended audience context for the work medium of performance (musical work) numeric designation (musical work) key (musical work) coordinates (cartographic work) equinox (cartographic work) form of work date of the work medium of performance subject of the work numeric designation key place of origin of the work original language of the work history other distinguishing characteristic expression title of the expression form of expression date of expression language of expression other distinguishing characteristics extensibility of expression revisability of expression extent of the expression summarization of content context for the expression critical response to the expression use restrictions on the expression sequencing pattern (serial) expected regularity of issue (serial) expected frequency of issue (serial) type of score (musical notation) medium of performance (musical notation or recorded sound) scale (cartographic image/object) projection (cartographic image/object) presentation technique (cartographic image/object) representation of relief (cartographic image/object) geodetic, grid, and vertical measurement (cartographic image/ object) recording technique (remote sensing image) special characteristic (remote sensing image) technique (graphic or projected image) form of expression date of expression language of expression technique other distinguishing characteristic surrogate can bibliographic data be put directly onto the semantic web? | yee 71 model entity frbr frad manifestation title of the manifestation statement of responsibility edition/issue designation place of publication/distribution publisher/distributor date of publication/distribution fabricator/manufacturer series statement form of carrier extent of the carrier physical medium capture mode dimensions of the carrier manifestation identifier source for acquisition/access authorization terms of availability access restrictions on the manifestation typeface (printed book) type size (printed book) foliation (hand-printed book) collation (hand-printed book) publication status (serial) numbering (serial) playing speed (sound recording) groove width (sound recording) kind of cutting (sound recording) tape configuration (sound recording) kind of sound (sound recording) special reproduction characteristic (sound recording) colour (image) reduction ratio (microform) polarity (microform or visual projection) generation (microform or visual projection) presentation format (visual projection) system requirements (electronic resource) file characteristics (electronic resource) mode of access (remote access electronic resource) access address (remote access electronic resource) edition/issue designation place of publication/distribution publisher/distributor date of publication/distribution form of carrier numbering title-manifestation serial title item item identifier fingerprint provenance of the item marks/inscriptions exhibition history condition of the item treatment history scheduled treatment access restrictions on the item location of item attributes/properties in frbr compared to frad (cont.) 
72 information technology and libraries | june 2009 model entity frbr frad person name of person dates of person title of person other designation associated with the person dates associated with the person title of person other designation associated with the person gender place of birth place of death country place of residence affiliation address language of person field of activity profession/occupation biography/history fictitious character performing animal corporate body name of the corporate body number associated with the corporate body place associated with the corporate body date associated with the corporate body other designation associated with the corporate body place associated with the corporate body date associated with the corporate body other designation associated with the corporate body type of corporate body language of the corporate body address field of activity history corporate subdivision place as jurisdictional corporate body conference or other event as corporate body creator jurisdictional corporate subdivision family type of family dates of family places associated with family history of family concept term for the concept type of concept object term for the object type of object date of production place of production producer/fabricator physical medium event term for the event date associated with the event place associated with the event attributes/properties in frbr compared to frad (cont.) can bibliographic data be put directly onto the semantic web? | yee 73 model entity frbr frad place term for the place coordinates other geographical information discipline genre/form name type of name scope of usage dates of usage language of name script of name transliteration scheme of name identifier type of identifier identifier string suffix controlled access point type of controlled access point status of controlled access point designated usage of controlled access point undifferentiated access point language of base access point script of base access point script of cataloguing transliteration scheme of base access point transliteration scheme of cataloguing source of controlled access point base access point addition rules citation for rules rules identifier agency name of agency agency identifier location of agency attributes/properties in frbr compared to frad (cont.) 74 information technology and libraries | june 2009 attributes/properties in rda compared to ycr model entity rda ycr work title of the work form of work date of work place of origin of work medium of performance numeric designation key signatory to a treaty, etc. 
other distinguishing characteristic of the work original language of the work history of the work identifier for the work nature of the content coverage of the content coordinates of cartographic content equinox epoch intended audience system of organization dissertation or theses information key identifier for work language-based identifier (preferred lexical label) variant language-based identifier (alternate lexical label) language-based identifier (preferred lexical label) for work language-based identifier for work (preferred lexical label) identified by principalcreator in combination with uniform title language-based identifier (preferred lexical label) for work identified by title alone (uniform title) supplied title for work variant title for work original language of work responsibility for work original publication statement of work dates associated with work original publication/release/broadcast date of work copyright date of work creation date of work date of first recording of a work date of first performance of a work finding date of naturally occurring object original publisher/distributor/broadcaster of work places associated with work original place of publication/distribution/broadcasting for work country of origin of work place of creation of work place of first recording of work place of first performance of work finding place of naturally occurring object original method of publication/distribution/broadcast of work serial or integrating work original numeric and/or alphabetic designations—beginning serial or integrating work original chronological designations— beginning serial or integrating work original numeric and/or alphabetic designations—ending serial or integrating work original chronological designations— ending encoding of content of work genre/form of content of work original instrumentation of musical work instrumentation of musical work—number of a particular instrument instrumentation of musical work—type of instrument original voice(s) of musical work voice(s) of musical work—number of a particular type of voice voice(s) of musical work—type of voice original key of musical work numeric designation of musical work coordinates of cartographic work equinox of cartographic work original physical characteristics of work original extent of work original dimensions of work mode of issuance of work can bibliographic data be put directly onto the semantic web? | yee 75 model entity rda ycr work (cont.) 
original aspect ratio of moving image work original image format of moving image work original base of work original materials applied to base of work work summary work contents list custodial history of work creation of archival collection censorship history of work note about relationship(s) to other works expression content type date of expression language of expression other distinguishing characteristic of the expression identifier for the expression summarization of the content place and date of capture language of the content form of notation accessibility content illustrative content supplementary content colour content sound content aspect ratio format of notated music medium of performance of musical content duration performer, narrator, and/or presenter artistic and/or technical credits scale projection of cartographic content other details of cartographic content awards key identifier for expression language-based identifier (preferred lexical label) for expression variant title for expression nature of modification of expression expression title expression statement of responsibility edition statement scale of cartographic expression projection of cartographic expression publication statement of expression place of publication/distribution/release/broadcasting for expression place of recording for expression publisher/distributor/releaser/broadcaster for expression publication/distribution/release/broadcast date for expression copyright date for expression date of recording for expression numeric and/or alphabetic designations for serial expressions chronological designations for serial expressions performance date for expression place of performance for expression extent of expression content of expression language of expression text language of expression captions language of expression sound track language of sung or spoken text of expression language of expression subtitles language of expression intertitles language of summary or abstract of expression instrumentation of musical expression instrumentation of musical expression—number of a particular instrument instrumentation of musical expression—type of instrument voice(s) of musical expression voice(s) of musical expression—number of a particular type of voice voice(s) of musical expression—type of voice key of musical expression appendages to the expression expression series statement mode of issuance for expression notes about expression surrogate [under development] attributes/properties in rda compared to ycr (cont.) 
76 information technology and libraries | june 2009 model entity rda ycr manifestation title statement of responsibility edition statement numbering of serials production statement publication statement distribution statement manufacture statement copyright date series statement mode of issuance frequency identifier for the manifestation note media type carrier type base material applied material mount production method generation layout book format font size polarity reduction ratio sound characteristics projection characteristics of motion picture film video characteristics digital file characteristics equipment and system requirements terms of availability key identifier for manifestation publication statement of manifestation place of publication/distribution/release/broadcast of manifestation manifestation publisher/distributor/releaser/broadcaster manifestation date of publication/distribution/release/broadcast carrier edition statement carrier piece count carrier name carrier broadcast standard carrier recording type carrier playing speed carrier configuration of playback channels process used to produce carrier carrier dimensions carrier base materials carrier generation carrier polarity materials applied to carrier carrier encoding format intermediation tool requirements system requirements serial manifestation illustration statement manifestation standard number manifestation isbn manifestation issn manifestation publisher number manifestation universal product code notes about manifestation titlemanifestation key identifier for title-manifestation variant title for title-manifestation title-manifestation title title-manifestation statement of responsibilities title-manifestation edition statement publication statement of title-manifestation place of publication/distribution/release/broadcasting of titlemanifestation publisher/distributor/releaser, broadcaster of title-manifestation date of publication/distribution/release/broadcast of titlemanifestation title-manifestation series title-manifestation mode of issuance notes about title-manifestation title-manifestation standard number attributes/properties in rda compared to ycr (cont.) can bibliographic data be put directly onto the semantic web? | yee 77 model entity rda ycr serial title key identifier for serial title variant title for serial title title of serial title serial title statement of responsibility serial title edition statement publication statement of serial title place of publication/distribution/release/broadcast of serial title publisher/distributor/releaser/broadcaster of serial title date of publication/distribution/release/broadcast of serial title serial title beginning numeric and/or alphabetic designations serial title beginning chronological designations serial title ending numeric and/or alphabetic designations serial title ending chronological designations serial title frequency serial title mode of issuance serial title illustration statement notes about serial title serial title issn-l item preferred citation custodial history immediate source of acquisition identifier for the item item-specific carrier characteristics key identifier for item item barcode item location item call number or accession number item copy number item provenance item condition item marks and inscriptions item exhibition history item treatment history item scheduled treatment item access restrictions attributes/properties in rda compared to ycr (cont.) 
78 information technology and libraries | june 2009 model entity rda ycr person name of the person preferred name for the person variant name for the person date associated with the person title of the person fuller form of name other designation associated with the person gender place of birth place of death country associated with the person place of residence address of the person affiliation language of the person field of activity of the person profession or occupation biographical information identifier for the person key identifier for person language-based identifier (preferred lexical label) for person clan name of person forename/given name/first name of person matronymic of person middle name of person nickname of person patronymic of person surname/family name of person natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of person affiliation of person biography/history of person date of birth of person date of death of person ethnicity of person field of activity of person gender of person language of person place of birth of person place of death of person place of residence of person political affiliation of person profession/occupation of person religion of person variant name for person fictitious character [under development] performing animal [under development] corporate body name of the corporate body preferred name for the corporate body variant name for the corporate body place associated with the corporate body date associated with the corporate body associated institution other designation associated with the corporate body language of the corporate body address of the corporate body field of activity of the corporate body corporate history identifier for the corporate body key identifier for corporate body language-based identifier (preferred lexical label) for corporate body dates associated with corporate body field of activity of corporate body history of corporate body language of corporate body place associated with corporate body type of corporate body variant name for corporate body corporate subdivision [under development] place as jurisdictional corporate body [under development] attributes/properties in rda compared to ycr (cont.) can bibliographic data be put directly onto the semantic web? 
| yee 79 model entity rda ycr conference or other event as corporate body creator [under development] jurisdictional corporate subdivision [under development] family name of the family preferred name for the family variant name for the family type of family date associated with the family place associated with the family prominent member of the family hereditary title family history identifier for the family concept term for the concept preferred term for the concept variant term for the concept type of concept identifier for the concept key identifier for concept language-based identifier (preferred lexical label) for concept qualifier for concept language-based identifier variant name for concept object name of the object preferred name for the object variant name for the object type of object date of production place of production producer/fabricator physical medium identifier for the object key identifier for object language-based identifier (preferred lexical label) for object qualifier for object language-based identifier variant name for object event name of the event preferred name for the event variant name for the event date associated with the event place associated with the event identifier for the event key identifier for event or historical period as subject language-based identifier (preferred lexical label) for event or historical period as subject beginning date for event or historical period as subject ending date for event or historical period as subject variant name for event or historical period as subject place name of the place preferred name for the place variant name for the place coordinates other geographical information identifier for the place key identifier for place as geographic area language-based identifier (preferred lexical label) for place as geographic area qualifier for place as geographic area variant name for place as geographic area discipline key identifier for discipline language-based identifier (preferred lexical label) (name or classification number or symbol) for discipline translation of meaning of classification number or symbol for discipline attributes/properties in rda compared to ycr (cont.) 80 information technology and libraries | june 2009 model entity rda ycr genre/form key identifier for genre/form language-based identifier (preferred lexical label) for genre/form variant name for genre/form name scope of usage date of usage identifier controlled access point rules agency note: in rda, the following attributes have not yet been assigned to a particular class or entity: extent, dimensions, terms of availability, contact information, restrictions on access, restrictions on use, uniform resource locator, status of identification, source consulted, cataloguer’s note, status of identification, and undifferentiated name indicator. name is being treated as both a class and a property. identifier and controlled access point are treated as properties rather than classes in both rda and ycr. attributes/properties in rda compared to ycr (cont.) index to volume 24 150 book reviews networks and disciplines; !proceedings of the educom fall conference, october 11-13, 1972, ann arbor, michigan. princeton: educom, 1973. 209p. $6.00. as with so many conferences, the principal beneficiaries of this one are those who attended the sessions, and not those who will read the proceedings. 
except for a few prepared papers, the text is the somewhat edited version of verbatim, ad lib summaries of a number of workshop sessions and two panels that purport to summarize common themes and consensus. since few people are profound in ad lib commentaries, the result is shallow and repetitive. the forest of themes is completely lost among a bewildering array of trees. the conference was, i am sure, exciting and thought-provoking for the participants. it was simply organized, starting with statements of networking activities in a number of disciplines, i.e., chemistry, language studies, economics, libraries, museums, and social research. the paper on economics is by far the best organized presentation of the problems and potential of computers in any of the fields considered, and perhaps the best short presentation yet published for economics. the paper on libraries was short, that on chemistry lacking in analytical quality, that on language provocative, that on social research highly personal, and that on museums a neat mixture of reporting and interpreting. much of the information is conditional, that is, it described what might or could be in the realm of the application of computers to the various subjects. the speakers all directed their papers to the concept of networks, interpreted chiefly as widespread remote access to computational facilities. the papers are followed by very brief transcripts of the summaries of workshops in which the application of computers to each of the disciplines was presumably discussed in detail. much of each summary is indicative and not really informative about the discussions. the concluding text again is the transcript of two final panels on themes and relationships among computer centers. the only description for this portion of the text is turgid. in the midst of all this is the banquet paper presented by ed parker, who as usual was thoughtful and insightful, and several presentations by national science foundation officials that must have been useful at the time to guide those relying on federal funding for computer networks in developing proposals. i can't think of another reference that touches on the potential of computers in so many different disciplines, but it is apparent from the breadth of ideas and the range of suggested or tested applications that a coherent and analytical review should be done. this volume isn't it. russell shank smithsonian institution the analysis of information systems, by charles t. meadow. second edition. los angeles: melville publishing co., 1973. a wiley-becker & hayes series book. this is a revised edition of a book first published in 1967. the earlier edition was written from the viewpoint of the programmer interested in the application of computers to information retrieval and related problems. the second edition claims to be "more of a textbook for information science graduate students and users" (although it is not clear who these "users" are) . elsewhere the author indicates that his emphasis is on "software technology of information systems" and that the book is intended "to bridge the communications gap among information users, librarians and data processors." 
the book is divided into four parts: language and communication (dealing largely with indexing techniques and the properties of index languages) , retrieval of information (including retrieval strategies and the evaluation of system performance), the organization of information (organization of records, of ffies, file sets), computer processing of information (basic file processes, data access systems, interactive information retrieval, programming languages, generalized data management systems). the second two sections are, i feel, . much better than the first. these are the areas in which the author has had the most direct experience, and the topics covered, at least in their information retrieval applications, are not discussed particularly well or particularly fully elsewhere. it is these sections of the book that make it of most value to the student of information science. i am less happy about meadow's discussion of indexing and index languages, which i find unclear, incomplete, and inaccurate in places. the distinction drawn between pre-coordinate and post-coordinate systems is inaccurate; meadow tends to refer to such systems simply as keyword systems, although it is perfectly possible to have a post-coordinate system based on, say, class numbers, which can hardly be considered keywords, while it is also possible to have keyword systems that are essentially precoordinate. in fact, meadow relates the characteristic of being post-coordinate to the number of terms an indexer may use (" ... permit their users to select several descriptors for an index, as many as are needed to describe a particular document"), but this is not an accurate distinction between the two types of system. the real difference is related to how the terms are used (not how many are used), including how they are used at the time of searching. the references to faceted classification are also confusing and a number of statements are made throughout the discussion on index languages that are completely untrue. for example, meadow states (p. 51) that "a hierarchical classification language has no syntax to combine descriptors into terms." this is not at all accurate since several hierarchical classification schemes, including udc, do have synthetic elements which allow combination of descriptors, and some of these are highly synthetic. in fact, meadow himself gives an example (p. 3839) of this synthetic feature in the udc. it is also perhaps unfortunate that the student could read all through meadow's discussion of index languages without getting any clear idea of the structure of a thesaurus for information retrieval and how this thesaurus is applied in practice. book reviews 151 moreover, meadow used medical subject headings as his example of a thesaurus (p. 33-34), although this is not at all a conventional thesaurus and does not follow the usual thesaurus structure. my other criticism is that the book is too selective in its discussion of various aspects of information retrieval. for example, the discussion on automatic indexing is by no means a complete review of techniques that have been used in this field. likewise, the discussion of interactive systems is very limited, because it is based solely on nasa's system, recon. the student who relied only on meadow's coverage of these topics would get a very incomplete and one-sided view of what exists and what has been done in the way of research. in short, i would recommend this book for those sections (p. 
183-412) that deal with the organization of records and files and with related programming considerations. the author has handled these topics well and perhaps more completely, in the information retrieval context, than anyone else. indexing and index languages, on the other hand, are subjects that have been covered more completely, clearly, and accurately by various other writers. i would not recommend the discussion on index languages to a student unless read in conjunction with other texts. f. w. lancaster university of illinois application of computer technology to librm·y processes, a syllabus, by joseph becker and josephine s. pulsifer. metuchen, n.j.: scarecrow press, 1973. 173p. $5.00. despite the large number of institutions offering courses related to library automation, including just about every library school in north america, accredited or not, there is a remarkable shortage of published material to assist in this instruction. with the publication of this small volume a light has been kindled; let us hope it will be only the first of many, for larger numbers of better educated librarians must surely result in higher standards in the field. this syllabus covers eight topics related 152 journal of library automation vol. 7/2 jtme 1974 to the use of computers in libraries, titled as follows: bridging the gap (librarians and automation); computer technology; systems analysis and implementation; marc program; library clerical processes (which encompasses acquisitions, cataloging, serials, circulation, and management information) ; reference services; related technologies; and library networks. each topic is treated as a unit of instruction, and each receives the identical treatment as follows. the units each start with an introductory paragraph, explaining what the field encompasses, and indicating the purpose of teaching that topic. the purpose of systems analysis, for example, is "to develop the sequence of steps essential to the introduction of automated systems into the library." a series of behavioral objectives are then listed, to show what the student will be able to do (after he has learned the material) that he presumably was unable to do before. for example, there are seven behavioral objectives in the unit on computer technology, of which the first four are: "1) the student will be able to discuss the two-fold requirement to represent data by codes and data structures for purposes of machine manipulation, 2) the student will be able to identify the basic components of computer systems and describe their purposes, 3) the student will be able to differentiate hardware and software and describe briefly the part that programming plays in the overall computer processing operation, 4) the student will be able to define the various modes of computer operation and indicate the utility of each in library operations." the remaining three objectives refer to the student's ability to enumerate and compare types of input, output, and storage devices. then an outline of the instructional material is presented, followed by the detailed and well-organized material for instruction. in no case can the material presented here be considered all that an instructor would need to know about the field, but a surprising amount of specific detail is included, along with a carefully organized framework within which to place other knowledge. 
the end result is to present to the instructor a series of outlines that would encompass much of the material included in a basic introductory course in library automation. every instructor would, presumably, want to add other topics of his own in addition to adding other material to the topics treated in this volume, but he has here an extremely helpful guide to a basic course, and the only work of its kind to be published to date. peter simmons school of librarianship university of british columbia the larc reports, vol. 6, issue 1. online cataloging and circulation at western kentucky university: an approach to automated instructional resources ~anagement. 1973. 78p. this is a detailed account of the design, development, and implementation of online cataloging and circulation which have been in operation at western kentucky university for several years. the library's reasons for using computers are similar to those of many college and university libraries that experienced rapid growth during the 1960s. the faculty of the division of library services first prepared a detailed proposal with appropriate feasibility studies and cost analyses to reclassify the collection from dewey decimal to library of congress classification. the proposal was approved by the administration of the university, and the decision was made to utilize campus computer facilities via online input techniques for reclassification, cataloging, and circulation. "project reclass" was accomplished during 1970-71 using ibm 2741 ats/360 terminals. a circulation file was subsequently generated from the master record file. the main library is housed in a new building and has excellent computer facilities within the library that are connected to the university computer center. cataloging information is input directly into the system via ats terminals; ibm 2260 visual display terminals are used for inquiry into the status of books and patrons; and ibm 1031/1033 data collection terminals are used to charge out and check in books. catalog cards and book catalogs in upper/lower case are produced in batch mode on regular schedule. the on-line circulation book record file is used in conjunction with the on-line student master record and payroll master record files for preparation of overdue and fine notices. apparently the communication between library staff and computer personnel has been well above average, and cooperation of the administration and other interested parties has been outstanding. the attention given to planning, scheduling, training, and implementation is impressive. what has been accomplished to date is considered very successful, and plans are book reviews 153 underway to develop on-line acquisitions ordering and receiving procedures. the report has some annoying shortcomings such as referring to the library of congress as "national library"; frequent use of the word "xeroxing," which the xerox corporation is attempting to correct; "inputing" for "inputting"; and several other misspelled words. some parts are poorly organized and unclear, but the report does provide rriany useful details for those considering a similar undertaking. lavahn overmyer school of library science case western reserve university subject access to a data base of library holdings alice s. clark: assistant dean for readers' services, university of new mexico general library, albuquerque. at the time this research was undertaken, the author was head of the undergraduate libraries at ohio state university. 
267 as more academic and public libraries have some form of bibliographic description of their complete collection available in machine-readable form, public service librarians are devising ways to use the information for better retrieval. research at the ohio state university tested user 1'esponse to paper and com output from selected areas of the shelflist. results indicated usm·s at remote locations found such lists helpful, with some indication that paper printout was more popular than microfiche. while many of the computer applications in special libraries were designed to improve subject access to the collections, the systems adopted in academic and public libraries have often been those which would handle various file operations and improve control of circulation or technical processing functions. once some of the data describing the items in the collection became available in machine-readable form, reference librarians have been tempted to find ways to use it for subject retrieval. in november 1970, the ohio state university ( osu) libraries began to use its automated circulation system using a data base representing its complete shelflist with limited information on each title: field no. field 1 call number 2 author 3 title 4 lc number-or nolc if none available 5 title number 6 publication date (if available) 7 ser-serial indicator. when present indicates the title is a serial. 8 neng-non-english indicator. when present indicates the title is non-english. 9 size-oversize indicator. when present indicates the book is an oversize book. 268 journal of library automation vol. 7 i 4 december 197 4 field field no. 10 portxx:xx-portfolio number in which book is located (main library only). 11 mono-monographic set indicator. when present indicates 12 13 14 15 16 17 18 19 20 21 22 title has been designated a monographic set. number of holdings (not displayed if copy 1, main library) reference line number volume number copy number holdings· condition code library location patron identification number of specific saves for the copy circulation status date charged in the form of year, month, day date due in the form of year, month, day the system, modified from time to time, provided access by call number, record number, or author-title with an algorithm consisting of the first four letters of the author's name plus the first five letters of the title. a title search was also possible by entering four letters of the first significant word and five letters of the second significant word or five dashes. as soon as the system was implemented, it was immediately evident that the search option was one of the most important features of the system. the circulation clerk at any location either in the main library or in any department library could search the author and title and find: ( 1) if the osu libraries had the book; ( 2) where it was regularly housed; and ( 3) its status (charged out, missing, lost, or available for circulation). all of this was possible without checking the card catalog except when problems of identifying the main entry existed. the immediate lack was, of course, the subject approach. as use of the system continued and library personnel became more sophisticated, various procedures offering some kind of subject approach were developed. the title search option is one possibility for finding subject access. 
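a minimal python sketch of how such derived search keys might be computed follows; the lowercasing, punctuation handling, and stoplist are assumptions made for illustration, not a description of the actual osu programs.

def author_title_key(author, title):
    # derived key: first four letters of the author's name plus the
    # first five letters of the title
    def letters(s, n):
        return "".join(ch for ch in s.lower() if ch.isalpha())[:n]
    return letters(author, 4) + letters(title, 5)

def title_key(title):
    # title search key: four letters of the first significant word plus
    # five letters of the second significant word, padded with dashes
    stopwords = {"a", "an", "the", "of", "on", "and"}  # hypothetical stoplist
    words = [w.strip(".,;:") for w in title.lower().split()]
    words = [w for w in words if w and w not in stopwords]
    first = (words[0][:4] if words else "").ljust(4, "-")
    second = (words[1][:5] if len(words) > 1 else "").ljust(5, "-")
    return first + second

print(author_title_key("huxley, thomas henry", "evolution and ethics"))  # huxlevolu
print(title_key("child psychology"))                                     # chilpsych
print(title_key("evolution"))                                            # evol-----

the worked terminal searches below use exactly these derived keys (tls/evol and tls/chilpsych for title searches, ats/huxlevolu for an author-title search).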
for example, to find a book on "evolution" one can enter the title search command tls/evol----and receive a report that there are 757 titles in which evolution is the first significant word. the terminal will then print out items as follows: tls/evol----page 1 757 matches 01 lan, h. j. 02 moody, paul amos. 1903 03 brosseau, george e 04 adler, irving 05 lotsy, j. p. 0 skipped evolutie (not all retrieved) introduction to evolution evolution evolution evolution 1946 1970 1967 1965 subject access/clark 269 06 smith, john maynard, 192007 miller, edward on evolution evolution evolution evolution evolution . 1972 1917 19-1924 1951 08 watson, j. a. s. 09 kellogg, v. l; 10 shull, a. franklin when the user types in pg2 or pg3, more titles will come up, and if more than thirty titles are desired, the original command can be reentered with a /skip 30 option to display others including all 757 titles if necessary. it is also possible to manipulate this option further since this first. search may tum up the name of an author recognized as an authority on the subject. in this case, when thomas huxley's evolution and ethics appears, the terminal attendant changes to an author-title search, ats/huxlevolu, and finds eight matches, four books by thomas huxley and four by julian sorell huxley on the same subject: ats/huxlevolu page 1 8 matches 01 huxley, thomas henry 02 huxley, thomas henry 03 huxley, julian 04 huxley, thomas henry 05 huxley, julian sorell 06 huxley, thomas henry 07 huxley, julian sorell 08 huxley, julian sorell 0 skipped (all retrieved in 1) evolution and ethics, and other essays evolution and ethics and other essays evolution, the modern synthesis evolution and ethics and other essays evolution as a process evolution and ethics and other essays evolution in action 1st ed evolution as a process 2d ed 1970 1916 1942 1897 1954 1896 1953 1958 to find the call number of any of these, the attendant merely enters a detailed line search dsl/1: dsl/1 hm106h91896a huxley, thomas henry evolution and ethics, and other nolc 902452 1970 1 01 001 3week und page 1 end the ability to search by a word in the title, which in the above example gives a form of kwic subject index, is even more specific if two words are used. for example, the attendant may enter tsl/chilpsych to bring up titles containing the words "child" and "psychology" as follows: tls / chilpsych page 1 52 matches 0 skipped (not all retrieved) 01 jersild, arthur thomas, 1902child psychology. 4th 1954 02 jersild, arthur thomas, 1902child psychology 5th ed 1960 270 journal of library automation vol. 7/4 december 1974 03 thompson, george greene, 191404 kanner, leo 05 curti, margaret (wooster) 06 clarke, paul a 07 greenberg, harold a 08 english, horace bidwell 09 chess, stella 10 curti, margaret (wooster) child psychology 1952 child psychiatry 3d ed 1957 child psychology 1930 child-adolescent psychology 1968 child psychiatry in the commun 1950 child psychology 1951 an introduction to child psych 1969 child psychology 2d ed 1938 the obvious subject approach is, of course, by call number. the system contains an option that permits a search on the general call number. the operator may enter either a real or an imaginary call number and receive the fifteen titles preceding and the fifteen titles subsequent to it in the shelflist. 
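before the worked sps/ example that follows, note that the browse amounts to a binary search into the sorted shelflist followed by a window of neighboring entries. a minimal python sketch, with the fifteen-before/fifteen-after window taken from the description above and a sample abbreviated from the transcript below (the shelflist is assumed to be a list of call number and title pairs already sorted by call number):

import bisect

def shelflist_browse(shelflist, call_number, before=15, after=15):
    # the call number searched for may be real or imaginary, as the text notes
    keys = [cn for cn, _ in shelflist]
    i = bisect.bisect_left(keys, call_number)
    return shelflist[max(0, i - before):i + after]

sample = [
    ("hm106g77", "graubard, man the slave and master"),
    ("hm106h3", "haycraft, darwinism and race progress"),
    ("hm106h91896", "huxley, evolution and ethics and other essays"),
    ("hm106k29", "keller, societal evolution"),
]
for cn, title in shelflist_browse(sample, "hm106h9", before=2, after=2):
    print(cn, title)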
for example, with the command sps/hm106h9, using the call number from the previous example, the following ten titles will appear with that call number as the central item: sps/hm106h9 11 hm106g77 graubard, man the slave and master 12 hm106h3 haycraft, darwinism and race progress 13 hm106h57 herter, c. biological aspects of human problems 14 hm106h6 hill, g. c. heredity and selection in sociology 15 hm106h63 hoagland, evolution and man's progress 16 °hm106h9 17 hm106h91896 huxley, evolution and ethics and other essays 18 hm106h91896a huxley, evolution and ethics and other essays 19 hm106h91897 huxley, evolution and ethics and other essays 20 hm106h91916 huxley, evolution and ethics and other essays 21 hm106k29 keller, societal evolution; a study of the evolutionary basis page 2 input:hm106h9 entering pgl will bring up the ten preceding titles and pg3 the ten sub:sequent titles. one of the best features of this system is that the patron may call in by telephone and have at least some of this information read to him; if he is at a circulation area, he may receive a printout as an instant bibliography. recently an attempt has been made to use the file of data in other ways. in an attempt to provide better access to the main campus collection for the people at the five regional campuses of the university, an experiment was tried using a computer printout of certain selected parts of the shelflist. since microfiche is less expensive and more compact to handle, there were good reasons for using this form rather than the paper printout form. this was an obvious application for computer output microfiche (com). once subject access/clark 271 a master frame has been produced by com, the cost of additional copies is negligible. in order to test acceptance of form more accurately, it was decided to provide a list in each form to test on sample populations. to cover some of the subjects taught at the agricultural and technical institute at wooster, a total of 20,672 titles were selected in the following areas: agricultural economics botany agriculture agricultural machinery wood technology woodworking hd1401-2210 qk10-942 s tj148g-1496 ts80g-937 tt13g-200 2,121 titles 1,039 titles 17,157 titles 6 titles 197 titles 152 titles these titles were printed in a hard-copy printout in the following format with a program designed by gerry guthrie of the research and development division of the osu libraries: call number = tj1496c3a3 title number = 196795 author = caterpillar tractor company title = fifty years on tracks publ. date = 1954 holdings = cool com regular lc number = 55-20529 the physical form of the resulting documents varied somewhat due to the fact that each subject area was put in one cover. this meant "agriculture" ( s) with 17,157 titles was too bulky to carry around, but "wood technology" was compact and easily carried to one's office or home for leisurely browsing. a brief questionnaire was used to test the reaction to the list. responses were received from 6 percent of the students and faculty at the agricultural and technical institute. with the usual assumption that some students are not library users, there was some validity to the sample. results tabulated from these questionnaires fell into three categories: ( 1) nature of use; ( 2) value of the list; and ( 3) response to its form and format. since some questions were left blank, the totals were often less than 100 percent. 
nature of use the responses turned out to be evenly divided between faculty and students, 46 percent for each with some leaving this question blank. the faculty indicated that two-thirds of the use was for themselves and one-third for the students. students, of course, used it totally for their own purposes. the actual purpose of the list had been envisioned as access to the main campus collection, and increases in interlibrary loans indicated that it was 272 journal of libmry automation vol. 7/4 december 1974 effective. loans during the month of october 1973 totaled four while november's loans totaled thirty-four, showing a marked difference after the delivery of this search tool on october 31. the questionnaire showed that 77 percent indicated they used the information for this purpose. it should not have been a surprise to librarians to find that 34 percent of the sample population used the information to order a duplicate copy for the wooster ati library, an indication of readers' known proclivity for wanting their material close at hand. users' evaluation the increase in interlibrary loans was probably a better reflection of the users' approval than the actual questionnaire results, although the results themselves were also highly positive. seventy-seven percent checked that they found it valuable, against 15 percent who did not. eighty-five percent said they wanted more lists. requests for additional suggestions included a request to keep it up to date and a request to limit it to just recently published items, while another person asked for all of the titles located in the agricultural engineering library. the requests indicated that several additional subject areas were wanted: communication skills, personnel management, human relations, use of airplanes in agricultural, irrigation, and drainage engineering, and environmental pollution. suitability of form and format some attempt was made to determine how people react to the admittedly inconvenient form of a computer printout. since financial considerations limited the possibilities to either this form or microfiche, those options were presented in the questionnaire. preference for the paper form was expressed by the users of the list in this form-84 percent to 8 percent who would have preferred microfiche. · the population was evenly divided as to whether or not they wished to have the list in this call number order-50 percent wanted it by straight shelflist or call number order and 50 percent wanted it alphabetically by author. the latter response may very well reflect the proportionally large number of respondents who were faculty and who supposedly would know the authors in their fields and do not use a subject approach when seeking materials. while the original purpose of the research was to provide better subject access to a remote collection, it was also important to find out more about the user's response to microfiche if he could be given an improvement in service or a service he did not previously have. microfiche would be both more compact and less expensive if lists of this type were to be provided in many subjects and continually updated. for the microfiche section of the research project the library of congress classifications covering classics and related fields were chosen, partly subject access/clark 273 on the basis that faculty in these areas had agreed to participate and encourage their students to use the list. 
included were: de1-de98 df101-df289 dg11-dg209 n563q-n5790 na20q-na335 pa-all z7001-z7005 history-the mediterranean world history-greece history-italy history of art-greek and roman architecture-history-greek and roman language and literature of greece and rome bibliographies in linguistics, roman and greek literature, teaching languages this subset produced about eleven thousand titles. the format of the com was the same as that on the paper printout, with general titles appearing at the top of each sheet or frame, e.g., shelflist-classics-greece. this took twenty-two microfiche with sixty-nine frames each listing seven or eight titles. the last frame on each fiche was an index to that fiche. a nonreduced (eyeball) character at the top listed the first call number on the fiche. it was envisioned that the user might know the general classification number, search for it by the eyeball character, then consult the index in the last frame to locate the proper frame for a specific class. in this way the user could browse through the subject area. the chief advantage of com lay in the fact that the small envelope of microfiche and a portable reader were easy to check out of the library and carry home or to an office where the user could browse through the library shelflist at a leisurely pace. since initial reaction was negative, a subject index was prepared to make the list more usable to undergraduate students. this index was made up of the appropriate entries which appeared in the library of congress classification schedules, with all entries consolidated into one alphabet. 1 using this index to find an entry-for example, "caesar, c. julius" -the student would find two areas to search: dg261-267 and pa6235-6269. he would find these areas on the microfiche with the eyeball characters, then search the index frame to find the appropriate pages. the classics list with its index and instructions was packaged in neat, loose-leaf notebook form and, together with a portable reader, presented to classics faculty at two regional campuses. a set was also available in the library. the results were completely negative. reliance upon the cooperation of too small a number of cooperating teachers may have invalidated this part of the research, but the contrast in response to the similar printed list raised serious questions about user response to microfiche in an index or reference book situation.2 it had been anticipated that a population in the humanities or social sciences would have had more need than the science group for what was essentially a book list since serial titles did not include 27 4 j oumal of library automation vol. 7 i 4 december 197 4 holdings. the complete lack of interest from the faculty in the field of classics was an unexpected disappointment but no firm conclusions could be drawn without a research strategy designed to remove any possible variables. conclusion increased use of marc cataloging through such systems as oclc and ballots will mean many more libraries will have their total holdings in machine-readable form with the capability of using their records in new ways. programs for distributing microfiche copies of library catalogs such as georgia tech's lends program provide inspiration for public service librarians to make use of the data and technology that technical services automation projects are supplying. 
8 this experiment in manipulating machine-readable library records for use in subject searching was an attempt toward better retrieval of a library's collection and indicated that such programs would be useful to extend service outside a single library location. references 1. it may soon be possible to do this in a much simpler fashion by using the combined indexes to the libl'ary of congmss classification schedttles (washington, d.c.: u.s. historical documents institute, 1974). 2. doris bole£, "computer-output microfilm," special libraries 65:169-75 (april 1974). in describing the use of com at the washington university school of medicine, bole£ said, "there is, however, an additional disadvantage, namely, the resistance of users to the use of microforms because of their inconvenience. patrons will sometimes choose not to read a publication when told it is available in some sort of microform only. it is assumed that librarians are not quite as reluctant, but it would be a mistake not to take this reluctance into consideration. this resistance by both librarians and patrons is stronger than is usually reported by com manufacturers and service bureaus" ( p.170-71). 3. the georgia tech libl'ary's complete card catalog is now available in microfiche form, brochure (atlanta: price gilbert memorial library, georgia institute of technology, 1972). 324 journal of library automation vol. 7/4 december 1974 book reviews current awareness and the chemist, a study of the use of ca con.densates by chemists, by elizabeth e. duncan. metuchen, n.j.: scarecrow press, 1972. 150p. $5.00. this book starts with a five-page foreword by allen kent entitled "kwic indexes have come a long way-or have they?" kent is always interesting but when one detects that his foreword is becoming almost an. apologia, one wonders just what is to come. the remainder of the book (apart from the index) appears already to have been presented as dr. duncan's ph.d. thesis at the university of pittsburgh. the first two chapters are the usual sort of stuff, taking us from alexandria in the third century to columbus, ohio in 1970, with undistinguished reviews of user studies and the history of the chemical abstracts service. the remaining sixty-four pages of text report and discuss a study of the use of ca condensates by quite a small sample of academic and industrial chemists in the pittsburgh area. the objective appears to have been to compare profile hits with periodical holdings and interlibrary loan requests at the client's library so that a decision model for the acquisition of periodicals could be developed. on the author's own admission, this objective was not achieved. a certain amount of data is presented but it is difficult to draw many conclusions from it, other than the fact that chemists do not appear to follow up the majority of profile hits that they receive nor do they use the current issues of chemical abstracts very frequently. it is difficult to understand why this material was published in book form. it could have been condensed to one or possibly two papers for ].chem.doc. or perhaps even left for the really diligent seeker to find on the shelves of university microfilms-but, as the old testament scribe bemoaned, "of making many books there is no end." at the bottom of page 118 a reference is made to the paper by abbott et al. in aslib proceedings (feb. 1968); at the top of page 119 the same paper's date is given as january 1968. 
other errors are less obvious, but one really questions whether the provision of a short foreword and an index makes even a good thesis worth publishing in hard covers. r. t. bottle the city university london, u.k. computer-based reference service, by m. lorrai'ne mathies and peter g. watson. chicago: american library assn., 1973. 200p. $9.95. the archetypal title and model for all works of explication is ....... without tears. lorraine mathies and peter watson have attempted the praiseworthy task of explaining computer-produced indexes to the ordinary reference librarian, but for a number of reasons, some of them probably beyond the control of the authors, the tears will remai'n, perhaps one difficulty is that this book was, in its beginnings at least, the product of a committee. back in 1968 the information retrieval committee of the reference services division of the ala wanted to present to "working reference librarians the essentials of the reference potential of computers and the machine-readable data they produce" (p.xxix). the proposal worked its way (not untouched, of course) through several other groups and eventually resulted in a preconference workshop on computer-based reference service being given at the dallas convention of 1971. the present book is based on the tutor's manual which mathies and watson prepared for that workshop but incorporates revisions suggested by the ala publishing services as well as changes initiated by the authors themselves. with so many people getting into the planning act, it is not surprising that the various parts of the book should end up by working at cross purposes to each other. unfortunately, the principal conflicts come at just those points where a volume of exposition needs to be most definite and precise: just what is the book trying to do and for whom? at the original workshop, the eric data base was chosen as a "model system" since educational terminology was more likely to be understood than that of the sciences. and because the participants were to learn by doing, they were told a great deal about eric so as to be able to "practice" on it. the trouble is that these objectives do not translate well from workshop to print. the detafls about eric, which may have been necessary as tutors' instructions, seem misplaced in book form. almost half the present book is devoted to a laborious explanation of how eric works and this is a great deal more than most workaday reference librarians will want to know about it. moreover, it is no longer clear whether mathies and watson aim to train "producers" or "consumers." the welter of detail suggests that they expect their readers to learn hereby to construct profiles and to program searches but it is highly doubtful that skills of this kind can or should be imparted on a "teach yourself" basis. once mathies and watson leave eric behind, they seem on surer ground. part ii (computer searching: principles and strategies) begins with a fairly routine chapter on binary numeration which is perhaps unnecessary since this material is easily available elsewhere. however, the section quickly moves on to an excellent explanation of boolean logic and weighting, describes their application in the formulation of search strategies, and ends with an admirably succinct and demystifying account of how one evaluates the output (principles of relevance and recall). the reader might well have been better served if the book had indeed begun with this part. 
the last section (part iii: other machine readable data bases) is also very useful, particularly for the "critical bibliography" (p.153) in which the authors describe and evaluate ten of the major bibliographic data bases. this critical bibliography is apparently a first of its kind, which makes the authors' perceptive and frank comments all the more welcome. part iii also contains chapters on marc and the 1970 census but, sh·angely enough, does not include a final resume and conclusions. it is true that in each book reviews 325 chapter there is a paragraph or so of summary but this is hardly a satisfactory substitute for the overall recapitulation one would expect. in the final analysis, indeed, one's view of the book will depend on just thatwhat one expects of it. if "working reference librarians" expect to read this book in order to be no longer "intimidated by these electronic tools" (p.ix), they are apt to be disappointed. the inordinate emphasis on eric, the rather dense language, and the fact that the main ideas are never pulled together at the end will all prevent easy enlightenment. however, if our workaday reference librarians are willing to work their way through a fairly difficult manual on computer-based indexing as in effect a substitute 'for a workshop on the subject, they will find this book a worthwhile investment of their time-and tears. samuel rothstein school of lihl'arianship university of british columbia the circulation system at the university of missouri-columbia library: an evolutionary approach. sue mccollum and charles r. sievert, issue eds. the larc reports, vol. 5, issue 2, 1972. 101p. in 1958 the university of missouri-columbia library was one of the first libraries to mechanize circulation by punching a portion of the charge slip with book and borrower and/ or loan information. in 1964 an ibm 357 data collection system utilizing a modified 026 keypunch was installed, but not until 1966 was 026 output processed on the library owned and operated ibm 1440 computer. however, budgetary constraints forced a transfer of operations in 1970 to the data processing center, which undertook rewriting of library programs in 1971. after explanation of hardware changes and an overview of the circulation department organization and data processing center operation, this report deals in depth with the major files of the circulation system-circulation master flle and location master file-and the main components of the circulation system-edit, update, overdues, fines, interlibrary loans, 326 journal of libmry automation vol. 7/4 december 1974 address file, location file, reserve book, listing of files, special requests, and utility programs. many examples of report layouts are included, particularly those accomplished by utilizing data gathered from main collection and reserve book loans. although this off-line batch processing circulation system is limited in that it does not handle any borrower reserve or lookup (tracer) routines, both of which are possible in off-line systems, the university of missouri-columbia system has merit as a pioneer system which influenced other university library circulation system designs in the 1960s. detailed reference given throughout the report to changes in the original library programs not only makes it of value as a case history for any library interested in circulation automation but also indicates the important fact that library programs do change and evolve in response to new demands and technological capabilities. lois m. 
kershnm university of pennsylvania libraries national science information systems, a guide to science information systems in bulgaria, czechoslovakia, hungary, poland, rumania, and yugoslavia, by david h. kraus, pranas zunde, and vladimir slamecka. (national science information series) cambridge, mass.: the m.i.t. press, 1972. 325p. $12.50. as indicated by the title, this volume provides a comparative description and analysis of the various organizational or political structures which have been adopted by six counb·ies of central and eastern europe in their attempts to develop effective national systems for the dissemination of scientific and technical information. for each country there is a detailed account of the national information system now existing, with a brief outline of its antecedents, a directory of information or documentation centers, a list of serials published by these centers, and a bibliography of recent papers dealing with the development of information systems in that country. this main section of the book is preceded by a brief review of the common characteristics of the six national systems and an outline of steps being taken to achieve international cooperation for the exchange of information in specific subjects. of particular interest is the description of the international center of scientific and technical information established in moscow in 1969, and which is now linked to five of these national systems. no attempt is made to describe the techniques being used to store, retrieve, and disseminate information. the authors point out that the six countries being examined "have experimented intensely with organizational variants of national science information systems." unfortunately, they do not attempt to indicate which of these organizational structures was most effective in bringing about the desired results. undoubtedly, this would have been an impossible task and probably not worth the effort, since a successful type of organization in a socialist country would not necessarily be effective in a democracy. the book will be of interest to political scientists and to those seeking the most effective ways of coordinating the information processing efforts of all types of government bodies. it will be only of academic interest to the information specialist concerned primarily with information processing techniques. jack e. brown national science library of canada ottawa information retrieval: on-line, by f. w. lancaster and e. g. fayen. los angeles: melville publishing co., 1973. 597p. lc: 73-9697. isbn: 0-471-51235-4. have you been reading the asis annual review of information science and technology year after year and wishing for a compendium of the best information and examples of the latest systems, user manuals, cost data, and other facts so that you would not have to go searching in a library for the interesting reports, journal articles, and books? well, if you have (and who hasn't), your prayers have been answered if you are interested in online bibliographic retrieval systems. the authors of the handy reference book have collected and reprinted, among other things, the complete dialog terminal users reference manual, the supars user manual, the user instructions for aim-twx, obar, and the caruso tutorial program. each of these systems, and several others (arranged alphabetically from aim-twx [medline] to to xi con [toxline]), is described and illustrated. 
features and functions of on-line systems, such as vocabulary control and indexing, cataloging, instruction of users, equipment, and file design, are all covered in a straightforward manner, simply enough for the uninformed and carefully enough so that a system operator could compare his system's features and functions with the data provided. richly illustrated with tables, charts, graphs, and figures, up-to-date bibliographies (only serious omission noticed was the afips conference proceedings edited by d. walker), and subject and author indexes, this volume will stand as another landmark in the state-of-the-art review series which the wiley-becker & hayes information science series has come to represent. emphasis has been placed on the design, evaluation, and use of on-line retrieval systems rather than the hardware or programming aspects. several of the chapters have a broader base of interest than on-line systems, covering as they do performance criteria of retrieval systems, evaluating effectiveness, human factors, and cost-performance-benefits factors. easy to use and as up to date and balanced a book as any in a rapidly changing field can be, lancaster and fayen have given students of information studies and planners and managers of information services a very valuable reference aid. pauline a. atherton school of information studies syracuse university national library of australia. australian marc specification. canberra: national library of australia, 1973. 83p. $2.50. isbn: 0-642-99014-x for those readers who are familiar with book reviews 327 the library of congress marc format, the australian marc specification will be, for the most part, self-explanatory. the intent of the document is to describe the basic format structure and to list the various content designators that are used in the format. no effort was made to include any background information or explanation of data elements. because of this, the reviewer found it necessary to refer to other documents, e.g., precis: a rotated subiect index system, by derek austin and peter butcher, in order to complete a comparative analysis of the australian format with other similar formats. perhaps the value of reviewing a descriptive document of this type lies in discovering how the format it describes compares to other existing formats developed for the same purpose. the international organization for standardization published a format for bibliographic information interchange on magnetic tape in 1973, international standard iso 2709, the australian format structure is the same throughout as the international standard. the only variance is in character positions 20 and 21 of the leader, which the australian format left undefined. a comparison of content designators cannot be made with the international standard because it specifies only the position and length of the identifiers in the structure of the format, but not the actual identifier (except for the three-digit tags 001-999 that identify the data fields). the best comparison of content designators can be made with the lc marc format, since the australian format uses many of the same tags, indicators, and subfield codes for the same purposes. the australian format has assigned to the same character positions the same fixed-length data elements as the lc format except for position 38, which is the periodical code in the australian format and the modified record code in the lc format. in the fixed-length character. 
positions for form of contents, publisher (government publication in lc marc), and literary text (fiction in lc 328 journal of library automation vol. 7/4 december 1974 marc) , the australian format assigned different codes than lc. in general, the australian format uses the same three-digit tags as lc to identify the primary access fields in their records, e.g., 100, 110, 111 for main entries; 400, 410, 411, 440, 490 for series notes; 600, 610, 611, 650, 651 for subject headings; and 700, 710, 711 for added entries. for the remaining bibliographic fields there are some variations in tagging between the two formats. the australian marc has chosen a different method of identifying uniform titles, and has identified five more note fields in the 5xx series of tags than has lc. the australians have also added some manufactured fields to their record. these fields do not contain actual data from the bibliographic record, but rather are fields consisting of data created by program for control and manipulation purposes, or from lists such as the precis subject index. the australian format has also included, as part of its record, a series of cross-reference fields identified by 9xx tags. lc has reserved the 9xx block of tags for local use. the use of indicators differs in most instances between the two formats. both allow for two indicator positions in each field as specified by the international standard format structure. however, the information conveyed by the indicators differs except where the first indicator conwhich means no intelligence carried in this position. in the australian format the indicators in the 6xx block of tags have three different patterns. inconsistency of this kind does not tend to destroy compatibility with other coding systems using the same format structure, as long as sufficient explanation and examples are given from which conversion tables may be developed by the institutions with whom one wants to exchange, or interchange, bibliographic data. an even greater degree of difference exists between the two formats in the subfield codes used to identify data elements. the australian marc has identified some data elements that lc has not, e.g., in personal name main entries, the australian record identifies first names with subfield code "h," whereas lc does not identify parts of a personal name, only the form of the name, i.e., forename form, single surname, family name, etc. in most of the fields the two formats have defined some of the same data elements, but each uses a different subfield code to represent the element. in the australian document, under each field heading, the subfield codes are listed alphabetically with a data element following each code. this arrangement causes the data elements to fall out of their normal order of occurrence in the field, i.e., name, numeration, titles, dates, relator, etc. for example: personal name main entry (tag 100) subfield code a b amtralian marc entry element ( name) relator lc marc entry element (name ) numeration c dates d e second or subsequent additions to name numeration titles ( honorary) dates relator f additions to name other than date date (of a work) veys form of name for personal and corporate name headings. within each block of tags, lc has made an effort to remain consistent in the use of indicators, e.g., in the 6xx block for subject headings, the first indicator specifies form of name where a form of name can be discerned. 
where no form of name is discernable such as in a topical subject heading (tag 650), a null indicator or blank is used the example demonstrates the need for precise definition and documentation of data elements for the purpose of conversion or translation when interchanging data with other institutions. the australian format has included the capability of identifying analytical entries by using an additional digit (called the level digit) placed between the tag and the indicators to identify the analytical entries. a subrecord directory (tag 002) is present in each record containing data for analytical entries. the australian document includes appendixes for the country of publication codes, language codes, and geographical area codes that were developed by the library of congress. their only deviabook reviews 329 tion from lc marc usage is in the country of publication codes, where the australians have added entities and codes for australian first-level administrative subdivisions. patricia e. parker marc development office library of congress mitchell multimedia will have a profound effect on libraries during the next decade. this rapidly developing technology permits the user to combine digital still images, video, animation, graphics, and audio. it can be delivered in a variety of finished formats, including streaming video on the web, video on dvd/vcd, embedded digital objects within a web page or presentation software such as powerpoint, utilized within graphic designs, or printed as hardcopy. this article examines the elements of multimedia creation, as well as requirements and recommendations for implementing a multimedia facility in the library. t he term multimedia, which some may remember being used in the early 1970s as the name for slide shows set to music, now is used to describe “a number of diverse technologies that allow visual and audio media to be combined in new ways for the purpose of communicating.”1 almost all personal computers sold today are capable of viewing multimedia; many can, with minor modifications, also create multimedia. one of the most important features of multimedia is its flexibility. multimedia creation has several distinct elements—inputs, processes performed on those inputs, and outputs (see figure 1). each element can be described as follows. � inputs—new video can be recorded, or existing video, stored on a hard disk, cd/dvd, or tape can be imported. the same is true of audio, with the added flexibility of creating soundtracks or sound effects later, during the editing process. digital still images can be used, either shot on a camera or created by scanning an existing picture. digital artwork or animated sequences created in other software also can be brought in. � processing—regardless of the source, these digital inputs are loaded into the editing software. at this stage, the user will select and arrange the images and sounds, and the software may permit special effects to be created. in addition, the editing software may compress the file so that it is easier to use than the large file sizes used in raw video and audio recording. � outputs—at this point, the user has more choices to make. the new multimedia file can be sent to a program that will encode it for a streaming video in any one of a variety of popular formats, such as windows media, realmedia, or clipstream. 
then it can be mounted on a web site (either a regular page or within courseware such as webct or blackboard), or the file could be burned onto a cd or dvd, or it could be used within presentation software such as microsoft powerpoint. or the output file from the editing process could be encoded and embedded so that it is an avatar running as part of a web page with a product such as rovion bluestream. the possibilities are nearly endless. all of this is made possible by advances in technology on a variety of fronts. one of the happy anomalies in technology is that greater performance is frequently accompanied by lower costs. this is certainly the case with much of the activity surrounding multimedia. the following factors have fostered advances in multimedia: � increase in processing power and decrease in cost of computer hardware; � quality and affordability of video equipment; � compression of multimedia files; � consumer broadband internet access; and � current multimedia editing software the first two technology factors concern the equipment involved in multimedia production. leading off is the familiar, ever-increasing speed of processors and improved memory and hard-drive space, all delivered for less money. this trend is something that many people take for granted, but a reality check is sometimes in order. the processor in the typical desktop machine on advertised special today is approximately forty-four times as fast as the first pentium processor sold ten years ago, and is equipped with sixteen times as much ram and 117 times as much hard-drive space—at 20 percent of the cost of the old machine (not even adjusted for inflation!). the second factor is the incredible quality available in consumer-market video equipment at reasonable costs. while the images produced with consumer-grade video would not play well at the local megaplex movie theater, they look very good on the small screens found on computers, televisions, and classroom projectors. the third factor is that tremendous compression of multimedia files can be achieved during the editing process. an incoming raw-video file (in the standard .avi format) can be compressed with editing, encoding, and dedicated third-party compression software to an incredible 1 to 2 percent of its original size, and it will still retain very good quality as a digital object on the web and in other desktop viewing applications. the fourth factor is extremely critical for the success of multimedia web applications. home access is shifting away from dial-up access to broadband, with its greatly increased transfer rates. half of all united states homes with internet access are already using broadband, and the 32 information technology and libraries | march 2005 gregory a. mitchell (mitchellg@utpa.edu) is assistant director, resource management at the university of texas—pan american library, edinburg, texas. distinctive expertise: multimedia, the library, and the term paper of the future gregory a. mitchell forecast is for steady increase in these numbers.2 although not all broadband is created equal, it is all significantly faster than dial-up access. the final technology factor concerns the software that is currently available to the multimedia web developer. a developer can achieve some quite professional results with even the most basic products, and then can grow into more complex software that supports increasing levels of expertise. once again, this software is being sold in the price range that typical consumers can afford. 
� small really is beautiful creating a multimedia lab in the library need not be a large, complex undertaking. in fact, it can be very low cost and as simple as a single workstation. so it is scalable, allowing the library to start small and build in complexity and cost as time, money, and human resources will permit. at the bare-bones minimum, a multimedia lab would consist of a workstation with the software necessary for acquiring, editing, and outputting the files. for practical purposes, though, the workstation should be equipped with a network connection, a cd/dvd burner, a scanner, and a webcam with microphone. another very useful option is an analog-digital bridge device, which enables the capture of analog input (such as vhs tape) into digital files for the editor. to achieve better-quality video when shooting original content, a digital-video camera, tripod, wireless microphone, and portable light kit would be recommended. since more time typically is spent at the editing station than with the camera, the lab can be expanded with additional workstations before investing in another camera. experience at the author’s institution has shown that it is possible to operate a lab with ten workstations and only three video cameras and three still cameras. finally, output from the editing process will likely be printed, so a photoquality printer is another convenient option. this illustrates that the entry into multimedia work need not be a large expense, especially if an existing workstation and any other equipment is already available. if a fairly recent workstation is available to dedicate to the project, the library’s total startup cost could range from $200 to $1,000. not many new library services can be launched for as little as that. rather than dwell on equipment specifications, as that is not the intent of this discussion, the reader may consult the excellent tutorials available from desktop video and pc magazine’s online product guide.3 finally, the creation of a studio is a worthwhile option. although some video will need to be shot on location, many times it is possible to set up and shoot in just one place. a studio is the best place in which to work because it is a controlled environment. it does not need to be large or complicated, and a quiet office or study room can be set up with little effort and expense. the studio gives the users control over the sound and the lighting, and involves minimal setup time for projects. � the research paper of the future multimedia has begun to attract attention in the library community. joe janes, chair of library and information science at the information school at the university of washington and the person responsible for developing the internet public library, recently stated he foresees a growing role for multimedia in the library. it will replace much of the traditional, text-based communication that people are accustomed to. for example, multimedia projects can become the research paper of the future for students.4 it is the media in which many library customers will be working. experience from the author’s institution with creating a multimedia lab would seem to confirm his observation. during the first year and a half of operation, use of the lab has steadily increased (see figure 2). � collaboration the multimedia lab opens the doors to collaborative opportunities with faculty and students from a variety of disciplines across campus. 
this is because multimedia, like geographic information systems (gis) or other electronic information and communication technologies, is a tool and is not discipline-specific. as important as it is to make the connection with faculty, this media is something with which the students will frequently lead the figure 1. multimedia creation process distinctive expertise: multimedia, the library, and the term paper of the future | mitchell 33 34 information technology and libraries | march 2005 way. they are, after all, the mtv generation, and multimedia has an incredible appeal to their visual orientation. faculty themselves have used it to augment their web-based courses as well as traditional classroom instruction. the author ’s library has even initiated a multimedia résumé service for graduating students. the students can record a video introduction of themselves, encode this as a rovion bluestream avatar, and post it with their résumés on the web. this creates a much stronger impression than a standard résumé, hopefully giving the students an edge in promoting themselves on the job market. even more impressive is the variety of projects that are created in the lab by the students. one might expect to see interest from students in art and communications classes, but students come from many other disciplines as well. for example, business students have effectively used multimedia in their graduate-school business-plan presentations, while biology students like to use the graphics capabilities to study close-ups of slides. education students have employed it to produce multimedia instructional aids, and a sociology student put together a presentation on underserved, low-income neighborhoods. the library supplies the facility and instruction—only the imagination of students is needed. libraries have always been involved in the students’ research and writing process, by providing content, instruction, and facilities for producing the final research product. the same is true in the multimedia environment, although implementing a multimedia lab calls for some new skills for librarians. these include familiarity with basic principles of videography, learning how to use the cameras and other equipment, and gaining some mastery of the editing and encoding software. � why put it in the library? in addition to the research-paper analogy, the author believes that librarians can point with pride to the values and value that libraries offer their communities. it is a central and neutral location—not in one department’s or college’s turf. libraries are conveniently open for many hours per week. many of the information resources that students might use to prepare the presentation are in the library. and librarians have a professional ethic that drives them to provide instruction and assistance for the services the library offers. since multimedia production does have a learning curve and most new users need help in mastering the technology, it does not fit very well with the typical 24/7 drop-in computer lab that the campus information technology (it) often operates. this is a good opportunity for librarians to recognize some of their strengths and capitalize on them. in addition, this can be a breath of fresh air for librarians. here is an opportunity to learn about something new and creative. 
most people find that they have less room for creativity as time goes by.5 with a multimedia lab in the building, it will offer the librarians the opportunity to create multimedia productions for the library, besides assisting students and faculty with their projects. � potential problems there are some obstacles to overcome, of course. they need not be seen as major, but it is best to be realistic when beginning any new venture. it is almost always a good idea to start small, with a pilot project that will yield valuable lessons before venturing into anything big. � equipment—define what specifications are needed, see what is already available to use or borrow, then figure out what you will actually need to buy. � software—check out the variety of software for editing and production; think about how you want to begin using multimedia (primarily on the web, in presentation software such as powerpoint, as standalone videos on cds and dvds). � money—if funding permits, a library can invest several thousand dollars in a high-end multimedia computer, associated peripherals such as a color printer and one or more scanners, and a software suite to meet initial anticipated demands for multimedia creation and editing. if funding is scarce, you may want to investigate what existing equipment could be used in support of a pilot project. � location—this needs some space of its own, accessible to students and monitored by staff. although the figure 2. university of texas—pan american library multimedia lab usage editing workstation could be in an area with other computers, a quiet area is needed for shooting video so that there will not be interference from noise and unwanted foot traffic through the shots. � staffing and training—a multimedia lab is not a good candidate for self-service. librarians and staff who will provide the service need to learn how to use the equipment and software. make sure that they all have an acceptable level of competence and confidence so that the library can shine with its new service, but expect that everyone will need to continue to learn and grow in their proficiency. if your library plans to produce its own multimedia sessions as well, it would be a good investment to attend a class on television or video production. � hours—how many hours per week will the new service be available? if it is the entire time the library is open, be prepared to train plenty of staff. repeat users will need less help as their skills increase (by the way, some of these students can be great workstudy employees). � instruction—plan to offer formal orientation and instruction sessions to faculty and their classes. if your lab is small, this is challenging, but it can be accomplished with some creativity. for example, a general instruction session on concepts can be done in a classroom, followed up by a series of small groups working by appointment for the appliedlearning component in the multimedia lab. the author and a colleague have even done instruction outside the library using laptops and cameras, creating a de facto mobile studio. � copyright—if there are already vcrs or photocopiers in the library, you have had to deal with this issue. the pan american library at university of texas does not allow people to use its lab to copy movies, which is a request that surely will come to you, and we post the usual copyright notices just as we do at our photocopiers. for some excellent information on copyright, visit the american library association web site (www.ala.org). 
� evaluation—plan on at least basic evaluation of the service. this can include an assessment of the effectiveness of the instruction sessions, a survey of satisfaction with the lab itself, a questionnaire on the intended uses of the multimedia projects, demographic data on the students, or other student input. logs of the number of uses and peak-demand periods are extremely useful for planning and for justifying further expenditures and staffing requests. � flexibility for the future—whatever you do in a pilot phase, always keep in mind that you want to keep an open mind—you are trying to learn from the experience so that you can make good decisions for the direction of this new service. it may not go exactly the way you originally thought, because of serendipity, or changes in technology, or very strong demand from some segments of the campus instead of others, or other environmental factors. � conclusion benefits to the library from the multimedia lab are many. one of the most important benefits is that it keeps the library involved in the process of academic communication, as the medium of the communication changes with technology. by being involved in this evolving medium at its early stages, the library is poised to pounce on opportunities to employ it to the benefit of the library in instruction and content delivery. the library also would position itself on campus as a key player in it and the leading local expert in the growing field of multimedia. since multimedia is a tool that crosses the entire range of subject disciplines on campus, it opens the doors of faculty to collaborate with librarians in exciting new ways. just as many campuses already have learning and collaborative communities that grew around their web courseware or gis endeavors, so too can one develop around multimedia. the appendix offers a list of multimedia web sites to consider. libraries are more than warehouses of books and periodicals. as more and more of our resources have been made available electronically, and indeed more of higher education has moved to electronic delivery, many libraries have been faced with declining gate counts, circulations, and reference statistics. as someone observed, we are victims of our own success. so what is the role of the library? we are intrinsically involved in the process of instruction, academic research, and communication. as kling observed, “one important strategic idea is that libraries configure their it services and activities to emphasize the distinctive expertise of their librarians rather than simply concentrate on the size and character of the documentary collection.”7 it is imperative therefore that libraries pick out the new trends that will allow them to excel by capitalizing on their traditional strengths. references 1. scala, inc. multimedia directory. accessed apr. 21, 2004, www.scala.com/multimedia/multimedia-definition.html. 2. nielsen/netratings as of june, 2004. accessed aug. 10, 2004, www.websiteoptimization.com/. 3. about.com, dvt101. accessed apr. 15, 2004, http:// desktopvideo.about.com/library/weekly/aa040703a.htm; “anatomy of a video editing workstation,” pc magazine. accessed apr. 16, 2004, www.pcmag.com/article2/0,1759,1264650 ,00.asp. distinctive expertise: multimedia, the library, and the term paper of the future | mitchell 35 36 information technology and libraries | march 2005 4. college of dupage, “joe janes and colleagues: preparing for the future of digital reference,” a satellite broadcast from the college of dupage, 16 apr. 2004. 5. 
sandra kerka, creativity in adulthood (columbus, ohio: eric clearinghouse on adult career and vocational education, eric digest no. 204, ed429186, 1999). 6. american library association, “copyright issues, primer on the digital millennium.” accessed may 10, 2004, www.ala .org/ala/washoff/woissues/copyrightb/dmca/dmcprimer.pdf. 7. rob kling, “the internet and the strategic reconfiguration of libraries,” library administration & management 15, no. 3 (summer 2001): 144–51. appendix. for further reading: a multimedia web-site tour the following is a sampling of some of the most popular and interesting multimedia software, with examples of completed productions. this is not an official endorsement of any one product over another, whether listed here or not. a look at these sites will, however, give the reader an idea about the power and possibilities of multimedia communications. adobe (www.adobe.com) the well-known makers of some of the most powerful and popular editing software packages for graphics and video. camtasia (www.camtasia.com) easy to use, this is a good example of the type of software that does screen capture and recording, which is handy for producing online tutorials. clipstream (www.clipstream.com) an excellent example of the type of newer encoding software that achieves incredible compression of video and delivers it over the web with no viewer or plug-ins required for the user. finalcut pro (www.apple.com/finalcutpro) a perennial favorite among the mac crowd, this software is relatively easy to learn and lets the developer achieve dramatic results. flashants (www.flashants.com) a handy program that converts flash animation into .avi video format so that you can integrate animated sequences into a video production. macromedia (www.macromedia.com) the makers of flash and director, which are some of the most popular graphics, animation, and mulitimedia editing tools in the business. pinnacle (www.pinnaclesys.com) what finalcut pro is to the mac, this package is for the pc environment. easy to use, yet sophisticated in the results achieved. rovion (www.rovion.com) rovion bluestream is an encoder that enables the creation of avatar characters to appear live on your web page. a plugin is required for the user, but this approach definitely gets attention. serious magic (www.seriousmagic.com) an award-winning software package that allows you to turn a workstation into a studio, complete with teleprompter capability, sound effects, graphics, and editing. university of texas—pan american library (www.lib.panam.edu/libinfo/media.asp) links to multimedia projects at the author’s institution, including productions made by staff and students. reproduced with permission of the copyright owner. further reproduction prohibited without permission. is this a geolibrary? a case of the idaho geospatial data center jankowska, maria anna;jankowski, piotr information technology and libraries; mar 2000; 19, 1; proquest pg. 4 is this a geolibrary? a case of the idaho geospatial data center maria anna jankowska and piotr jankowski the article presents the idaho geospatial data center (igdc), a digital library of public-domain geographic data for the state of idaho. the design and implementation of igdc are introduced as part of the larger context of a geolibrary model. the article presents methodology and tools used to build igdc with the focus on a geolibrary map browser. the use of igdc is evaluated from the perspective of access and demand for geographic data. 
finally, the article offers recommendations for future development of geospatial data centers. i n the era of integrated transnational economies, demand for fast and easy access to information has become one of the great challenges faced by the traditional repositories of information-libraries. globalization and the growth of market-based economies have brought about, faster than ever before, acquisition and dissemination of data, and the increasing demand for open access to information, unrestricted by time and location. these demands are mobilizing libraries to adopt digital information technologies and create new methods of cataloging, storing, and disseminating information in digital formats. libraries encounter new challenges constantly. participation in the global information infrastructure requires them to support public demand for new information services, to help the society in the process of selfeducation, and to promote the internet as a tool for sharing information. these tasks are becoming easier to accomplish thanks to the growing number of digital libraries. since 1994, when the digital library initiative originated as part of the national information infrastructure program, the internet has accommodated many digital libraries with spatial data content. for example, the electronic environmental library project at the university of california, berkeley (http:/ /elib.cs. berkeley.edu/) provides botanical and geographic data; the university of michigan digital library teaching and learning project (www.si.umich.edu/umdl/) focuses on earth and space sciences; the carnegie mellon's informedia digital video library (www.informedia. cs.cmu.edu) distributes digital video, audio, and images maria anna jankowska (majanko@uidaho.edu) is associate network resources librarian, university of idaho library, and piotr jankowski (piotrj@uidaho.edu) is associate professor, department of geography, university of idaho, moscow, idaho. 4 information technology and libraries i march 2000 with text; and the alexandria digital library at santa barbara (http:/ /alexandria.sdc.ucsb.edu/) provides geographically referenced information. the alexandria digital library is of special interest in this article because it implements a model of a geolibrary. a geolibrary stores georeferenced information searchable by geographic location in addition to traditional searching methods such as by author, title, and subject. the purpose of this article is to present the idaho geospatial data center (igdc) in the larger context of a geolibrary model. igdc is a digital library of publicdomain geographic and statistical data for the state of idaho. the article discusses methodology and tools used to build igdc and contrast its capabilities with a geolibrary model. the usage of igdc is evaluated from the perspective of access and demand for geographic data. finally, the article offers recommendations for future development of geospatial data centers. i geographic information systems for public services terms such as digital, electronic, virtual, or image libraries have existed long enough to inspire diverse interpretations. 
the broad definition by covi and king concentrates on the main objective of digital libraries, which is the collection of electronic resources and services for the delivery of materials in different formats.1 the common motivation for initiatives leading to the development of digital libraries is to allow conventional libraries to move beyond their traditional roles of gathering, selecting, organizing, accessing, and preserving information. digital libraries provide new tools allowing their users not only to access the existing data but also to create new information. the creation of new information using the existing data sources is essential to the very idea of the digital library. since the information in a digital library exists in virtual form, it can be manipulated instantaneously by computer-based information processing tools. this is not possible using traditional information media (e.g., paper, microfilm) where the information must first be transferred from non-digital into digital format. since late 1994, when the u.s. national science foundation founded the alexandria digital library project, the number of internet sites devoted to spatially referenced information has grown dramatically. today, it would require a serious expenditure of time and effort to visit all geographic data sites created by state agencies, universities, and commercial organizations. in 1997 karl musser wrote, "there are now more than 140 sites featuring interactive maps, most of which have been created in the last two years." 2 this incredible boom in publishing reproduced with permission of the copyright owner. further reproduction prohibited without permission. spatial data is possible thanks to geographic information system (gis) technology and data development efforts brought about by the rapidly increasing use of gis. this new technology provides its users with capabilities to automate, search, query, manage, and analyze geographic data using the methods of spatial analysis supported by data visualization. traditionally, geographic data were presented on maps considered as public assets. according to a norwegian survey, the aggregate benefit accrued from using maps was three times the total cost of their production, even though maps provided only static information.3 today, the conventional distribution of geographic data on printed maps has become less efficient than distributing them in the digital format through wide area data networks. this happened largely due to gis's ability to separate data storage from data presentation. as a result, data can be presented in a dynamic way, according to users' needs. often gis is termed "data mixing system" because it can process data from different sources and formats such as vector-format maps with full topological and attribute information, digital images of scanned maps and photos, satellite data, video data, text data, tabular data, and databases. 4 all of these data types provide a rich informational infrastructure about locations and properties of entities and phenomena distributed in terrestrial and subterrestrial space. the definition of gis changes according to the discipline using it. gis can be used as a map-making machine, a 3-d visualization tool, and as an analytical, planning, collaboration, and business information management tool. today, it is hard to find a planning agency, city engineering department, or utility company (not to mention individual internet users) that has not used digital maps. 
this is why the number of users seeking spatial data in digital format has increased so dramatically. data discovery can be for gis users the most time-consuming part of using the technology. 5 as a result, libraries are faced with the growing demand for services that help discover, retrieve, and manipulate spatial data. the web greatly improved the availability and accessibility of spatial data but, at the same time, stimulated public interest in using geographic information. the continuing migration to popular operating systems (i.e., microsoft windows family) and the adoption of their common functionality has brought gis software to many desktops. tools such as arcview gis from environmental systems research institute, inc. (esri, www.esri.com) or maplnfo from maplnfo corporation (maplnfo, www.mapinfo.com) have become popular gis desktop systems. new software tools such as arcexplorer, released by esri, are focused on making gis more accessible, simpler, and available for use by the public. by taking advantage of the popularity of the web, attempts are being made to gain a wider acceptance of gis. in the wake of the simplification of gis tools and improved access to spatial data, a new exciting area of gis use has recently emerged-public participation gis.6 public participation gis by definition is a pluralistic, inclusive, and nondiscriminatory tool that focuses on the possibility of reducing the marginalization of societies by means of introducing geographic information operable on a local level.7 it promotes an understanding of spatial problems by those who are most likely to be affected by the implementation of problem solutions, and encourages transfer of control and knowledge to these parties. this approach leads to a broader use of gis tools and spatial data and creates new challenges for libraries storing and serving geographic data in digital formats. broadening the use of data and gis tools requires attention to data access. traditional libraries have often fulfilled the crucial role of being an impartial information provider for all parties involved in public decision-making processes. will they be capable of serving the society in this capacity in the digital age? i geolibrary as a repository of georeferenced information according to brandon plewe, the user of spatial data can choose among seven types of distributed geographic information services available on the intemet. 8 they range from raw data download, through static map display, metadata search, dynamic map browsing, data processing, web-based gis query and analysis, to net-savvy gis software. yet, another important new category of geographic data service that can be added to this list is geolibrary. goodchild defines a geolibrary as a library filled with georeferenced information where the primary basis of representation and retrieval are spatial footprints that determine the location by geographic coordinates. "the footprints can be precise, when they refer to areas with precise boundaries, or they can be fuzzy when the limits of the area are unclear." 9 according to buttenfield, "the value of a geolibrary is that catalogs and other indexing tools can be used to attach explicit locational information to implicit or fuzzy requests, and once accomplished, can provide links to specific books, maps, photographs, and other materials." 10 a geolibrary is distinguished from a traditional library in being fully electronic, with digital tools to access digital catalogs and indexes. 
it is anticipated that most of the information is archived in digital form. the value of a geolibrary is that it can be more than a traditional, physical library in electronic form.11 is this a geolibrary? i jankowska and jankowski 5 reproduced with permission of the copyright owner. further reproduction prohibited without permission. since its introduction, the concept of a geolibrary has been synonymous with the alexandria digital library (aol) project. once aol was defined as the internetbased archive providing comprehensive browsing and retrieval services for maps, images, and spatial information.12 a more recent definition characterizes aol as a geolibrary where a primary attribute of collection objects is their location on earth, represented by geographic footprints. a footprint is the latitude and longitude values that represent a point, a bounding box, a linear feature, or a complete polygonal boundary.13 according to goodchild (1998) a geolibrary' s components include: • the browser-a specialized software application running on the user's computer and providing access to geolibrary via a computer network. • the basemap-a geographic frame of reference for the browser's searches. a basemap provides the image of an area corresponding to the geographical extent of geolibrary collection. for the worldwide collection this would be the image of the earth. for the statewide collection this could be the image of a state. the basemap may be potentially large, in which case it is more advantageous to include it in the browser then to download it from a geolibrary server each time a geolibrary is accessed. • the gazetteer-the index that links place names to a map. the gazetteer allows geographic searches by place name instead of by area. • server catalogs-collection catalogs maintained on distributed computer servers. the servers can be accessed over a network with the browser, utilizing basic server-client architecture. the value of a geolibrary lies in providing open access to a multitude of information with geographic footprints regardless of the storage media. because all information in a digital library is stored using the same digital medium, traditional problems of physical storage, accessibility, portability, and concurrent use (e.g., many patrons wanting to view the one and only copy of a map) do not exist. i idaho geospatial data center in 1996, inspired by the aol project, a team of geographers, geologists, and librarians started to work on a digital library of public-domain geographic data for the state of idaho. the main goal of the project was the development of a geographic digital data repository accessible through a flexible browsing tool. the project 6 information technology and libraries i march 2000 was funded by a grant from the idaho board of education's technology incentive program. the project resulted in the creation of the idaho geospatial data center (igdc, http://geolibrary.uidaho.edu). the first in the state of idaho, this digital library is comprised of a database containing geospatial datasets, and geolibrary software that facilitates access, browsing, and retrieval of data in popular gis data formats including digital line graph (dlg), digital raster graphics (drg), usgs digital elevation model (dem), and u.s. bureau of census tiger boundary files for the state of idaho. the site also provides an interactive visual analysis of selected demographic/economic data for idaho counties. 
the site additionally provides interactive links to other idaho and national spatial data repositories. the key component of the library is the geolibrary software. the name "geolibrary" is not synonymous with the model of a geolibrary defined by goodchild (1998); it was rather adopted as a reference to the geolibrary browser, one of the components of a geolibrary. the geolibrary browser (gl) supports online retrieval of spatial information related to the state of idaho. it was implemented using microsoft visual basic 5.0/6.0 and esri mapobjects technology. the software allows users to query an area of interest using a search based on map selection, as well as selection by area name (based on the usgs 7.5-minute quad naming convention). queries return gis data including dems, dlgs, drgs, and tiger files. queries are intended both for professionals seeking gis-format data and nonprofessionals seeking topographic reference maps in the drg format. the interface of gl consists of three panels resembling the microsoft outlook user interface. our intent in designing the interface was to have panels that would be used in the following order. first, the map panel is used to explore the geographic coverage of the geolibrary and to select the area of interest. next, the query panel is used to execute a query, and finally the result panel allows the user to analyze results and to download spatial data. users can use a shortcut to go directly to the query panel and type their query. both approaches result in the output being displayed as the list of files available for download from participating servers.

the map panel (figure 1) includes a navigable map of idaho, a vertical command toolbar, and a map finder tool. the command toolbar allows the user to zoom in, zoom out, pan the map, identify by name the entities visible on the map canvas, and select a geographic area of interest. geographic entity name identification was implemented as a dynamic feature whereby the name of the entity changes as the user moves the mouse over the map. spatial selection provides a tool to select a rectangular area of interest directly on the map canvas. the map finder provides additional means to simplify the exploration of the map: the user can select a county or a quad name and zoom in on the selected geographic unit.

figure 1. map panel. the vertical toolbar provides zooming, panning, as well as labeling and simple feature querying capabilities. the map finder allows finding and selecting an area by county or usgs quad name. the screen copy here presents the selection of latah county in idaho.

the query panel (figure 2) allows the user to perform a query, based either on the selection made on the map or a new selection using one of the available query tools (figure 3). in the latter case, the user can enter geographic coordinates (in decimal degrees) defining the area of interest. this approach is equivalent to selecting a rectangular area directly on the map, and will return all data files that spatially intersect with the selected area. optionally, the user can handpick quads of interest from the list. finally, a name can be entered to execute a more flexible query. for instance, a search containing the word "moscow" returns spatial data related to three quads containing "moscow" within their names. the query is executed when the user presses the query button. after the results are received, the application automatically switches to the results panel.

figure 2. query panel. the interface was set to query spatial selection from the map panel.

figure 3. query panel. the query is based on the selection of usgs quads. optionally, the user can enter geographic coordinates of the area or a text to search.

the results panel shows the outcome of the query and includes important information about the data files: their size, type, projection, scale, the name of the server providing the data, as well as the access path (figure 4). based on this information, the user has the option of manually connecting to the server, using the ftp protocol, and retrieving the selected files. a much more convenient approach, however, is to rely on the gl software to automatically retrieve the files through the software interface. as an option, the result of the query can also be exported to a plain html document that contains links to all listed files. this feature can be very useful in the case of multiple files selected by the user and slow or limited-time internet access. this way the user can open the saved list of files in a web browser and download individual files as needed, without having to download all the files at once and tie up the internet connection for a long period of time. the result panel provides a flexible way to review and organize the outcomes of queries before commencing the download. one can sort files by name, size, scale, projection, and server name. this feature may be useful if the user decides to retrieve data of only one type (e.g., dems), of one scale, or when the user prefers to connect only to a specific server. in addition, individual records as well as entire file types can be deselected to prevent files from being downloaded. the user can also remove selected files to scale down the set of data in the list.

figure 4. the results panel. results of a query can be sorted; individual items can be removed from the list or can be deselected to prevent them from being downloaded.

one of the most important assets of the gl browser is that all of the user activities described up to this point, with the exception of file download, take place entirely on the client side without any network traffic. in fact, area/file selection as well as queries do not require an active internet connection. map exploration is based on vector-format maps contained in the gl software, and queries are run against the local database. such an approach limits bandwidth consumption and unnecessary network traffic. an internet connection is only necessary to perform retrieval of selected files. the vulnerability of the client-side approach to data query is that the user may be left with a potentially outdated local database. in order to prevent this problem from happening, gl is equipped with a database synchronization mechanism that allows users to keep up with the server database updates. the client-side database contained in the gl software, which mirrors the schema of the server database, can be synchronized automatically or at the user's request. in either case, the gl client contacts the database synchronizer on the server side and handles all necessary processes. since the synchronization is limited to database record updates, the network traffic is kept low, making gl suitable for limited internet connections. igdc is an open solution. new local datasets can be added or removed, making the collection easily adaptable to different geographical areas.
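as a rough illustration of the client-side query behavior described above, where a rectangular area of interest in decimal degrees is intersected with the footprints of locally cataloged data files, the sketch below shows one way such a check could work. the catalog entries, footprints, and function names are invented for the example and are not taken from the gl software.

# hypothetical local catalog: each record carries a bounding-box footprint
# given as (west, south, east, north) in decimal degrees; entries are illustrative only.
CATALOG = [
    {"file": "moscow_east.dem", "type": "dem", "bbox": (-117.000, 46.625, -116.875, 46.750)},
    {"file": "moscow_west.drg", "type": "drg", "bbox": (-117.125, 46.625, -117.000, 46.750)},
    {"file": "bovill.dlg",      "type": "dlg", "bbox": (-116.500, 46.750, -116.375, 46.875)},
]

def intersects(a, b):
    # true when two (west, south, east, north) boxes overlap
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def query(area_of_interest, catalog=CATALOG):
    # return every record whose footprint intersects the area of interest
    return [rec for rec in catalog if intersects(rec["bbox"], area_of_interest)]

# the whole query runs against the local catalog; no network connection is needed
# until the user decides to download the selected files
for rec in query((-117.05, 46.60, -116.90, 46.80)):
    print(rec["file"], rec["type"])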
datasets can also physically reside on multiple servers, taking full advantage of the internet's distributed nature.

evaluation of igdc use

geospatial information is among the most common public information needs; almost 80 percent of all information is geographic in nature. published research reflecting those needs and the role of libraries in resolving them is not extensive. the efforts of federal, state, and local agencies collecting digital geospatial data and the growth of gis created an interest in the role of libraries as repositories of geospatial data.14 the main obstacle to providing access to digital spatial information is its complexity. this is why a user-friendly interface is critical for presenting spatially referenced information.15 the igdc has been a first attempt at creating a user-friendly interface in the form of a map-based data browser allowing users to access and retrieve geographic datasets about idaho. in order to track and evaluate the use of geospatial data, webtrends software was installed on the igdc server. the webtrends software produces customized web log statistics and allows tracking information on traffic and dataset dissemination. during a one-year timeframe the number of successful hits was more than twenty-five thousand. almost 40 percent of users came from the .com domain, 35 percent were .net domain users, 15 percent were .org, and 10 percent were .edu users (figure 5). tracking the geographic origin of users by state, the biggest number of users came from virginia, followed by washington, california, ohio, and idaho. the high number of users from virginia can be explained by the linking of the igdc site to one of the most popular geospatial data sites in the country, the united states geological survey (usgs) site. eighty-four percent of user sessions were from the united states; the rest originated from sweden, canada, and germany. the average number of hits per day on weekdays was around one hundred.

figure 5. distribution of igdc users in percent by origin domain.

since the opening of igdc for public use (april 1998), the geolibrary map browser was downloaded 1,352 times. the software proved to be relatively easy to use by the public. out of forty-four bug reports/user questions submitted to igdc, most were concerned with filling out the software registration form and not with software failure. the igdc project spurred an interest in geographic information among students, faculty, and librarians at the university of idaho. in a direct response to this interest, the university of idaho library installed a new dedicated computer at the reference desk with geolibrary software to access, view, and retrieve igdc data.
the most popular retrievable information was digital raster graphics (drg) data, which present scanned images of usgs standard series topographic maps at 1:24,000 scale. digital elevation models (dem) and digital line graphs (dlg) were less popular. the tiger boundary files for the state of idaho were in small demand. the popularity of drg-format maps, and the fact that most of the users accessed igdc via the usgs web site, make plausible a speculation that most of the users were non-gis specialists interested in general reference geographic information about idaho, including topography and basic land use information.

conclusion

the idaho geospatial data center is the first geospatial digital library for the state of idaho. it does not fulfill all requirements of the geolibrary model proposed by goodchild and others: the igdc has only two components of the geolibrary model, the geolibrary map browser and the basemap. the main difference between the geolibrary map browser and the web-based browser solution adopted by other spatial repositories is a client-side solution to geospatial data query and selection. spatial data query is done locally on the user's machine, using the library database schema contained in the geolibrary map browser. this saves time by eliminating client-server communication delays during data searches, gives the user an experience of almost instantaneous response to queries, and reduces the network communication to the data download time. in comparison with the geolibrary model, igdc is missing the gazetteer. this component can help improve the ease of user navigation through a geospatial data collection. the other useful component includes online mapping and spatial data visualization services. the idea of such services is to provide the user with a simple-to-operate mapping tool for visualizing and exploring the results of user-run queries. one such service, currently under implementation at igdc, includes thematic mapping of economic and demographic variables for idaho using descartes software.16 descartes is a knowledge-based system supporting users in the design and utilization of thematic maps. the knowledge base incorporates domain-independent visualization rules determining which map presentation technique to employ in response to the user selection of variables. an intelligent map generator such as descartes can enhance the utility of a geolibrary by providing tools to transform georeferenced data into information.

references and notes

1. l. covi and r. king, "organizational dimensions of effective digital library use: closed rational and open natural systems models," journal of the american society for information science 47, no. 9 (1996): 697.
2. k. musser, "interactive mapping on the world wide web" (1997), accessed march 6, 2000, www.min.net/~boggan/mapping/thesis.htm.
3. t. bernhardsen, geographic information systems (arendal, norway: viak it and norwegian mapping authority, 1992), 2.
4. ibid., 4.
5. j. stone, "stocking your gis data library," issues in science and technology librarianship (winter 1999), accessed march 6, 2000, www.library.ucsb.edu/istl/99-winter/article1.html.
6. p. schroeder, "gis in public participation settings" (1997), accessed june 2, 1999, www.spatial.maine.edu/ucgis/testproc/schroeder/ucgisdft.htm.
7. w. j.
craig and others, "empowerment, marginalization, and public participation gis," report of a specialist meeting held under the auspices of the varenius project, santa barbara, california, oct. 15–17, 1998, ncgia, uc santa barbara.
8. b. plewe, gis online: information retrieval, mapping, and the internet (santa fe, n.m.: onword press, 1997), 71–91.
9. m. f. goodchild, "the geolibrary," in innovations in gis 5: selected papers from the fifth national conference on gis research uk (gisruk), ed. s. carver (london: taylor and francis, 1998), 59, accessed march 6, 2000, www.geog.ucsb.edu/~good/geolibrary.html.
10. b. p. buttenfield, "making the case for distributed geolibraries" (1998), accessed march 6, 2000, www.nap.edu/html/geolibraries/app_b.html.
11. ibid.
12. m. rock, "monitoring user navigation through the alexandria digital library" (master's thesis abstract, 1998), accessed march 6, 2000, http://greenwich.colorado.edu/projects/rockm.htm.
13. l. l. hill and others, "geographic names: the implementation of a gazetteer in a georeferenced digital library," d-lib magazine 5, no. 1 (1999), accessed march 6, 2000, www.dlib.org/dlib/january99/hill/01hill.html.
14. m. gluck and others, "public librarians' views of the public's geospatial information needs," library quarterly 66, no. 4 (1996): 409.
15. b. p. buttenfield, "user evaluation for the alexandria digital library project" (1995), accessed march 6, 2000, http://edfu.lis.uiuc.edu/allerton/95/s2/buttenfield.html.
16. g. andrienko and others, "thematic mapping in the internet: exploring census data with descartes," in proceedings of telegeo '99, first international workshop on telegeoprocessing, lyon, may 6–7, r. laurini, ed. (seiten, france: claude bernard univ. of lyon, 1999), 138–45.

hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries
annie wu, santi thompson, rachel vacek, sean watkins, and andrew weidner
annie wu (awu@uh.edu) is head of metadata and digitization services, santi thompson (sathompson3@uh.edu) is head of repository services, rachel vacek (evacek@uh.edu) is head of web services, sean watkins (slwatkins@uh.edu) is web projects manager, and andrew weidner (ajweidner@uh.edu) is metadata services coordinator, university of houston libraries.
information technology and libraries | june 2016

abstract

since 2009, tens of thousands of rare and unique items have been made available online for research through the university of houston (uh) digital library. six years later, the uh libraries' new digital initiatives call for a more dynamic digital repository infrastructure that is extensible, scalable, and interoperable. the uh libraries' mission and the mandate of its strategic directions drive the pursuit of seamless access and expanded digital collections. to answer the calls for technological change, the uh libraries administration appointed a digital asset management system (dams) implementation task force to explore, evaluate, test, recommend, and implement a more robust digital asset management system. this article focuses on the task force's dams selection activities: needs assessment, systems evaluation, and systems testing. the authors also describe the task force's dams recommendation based on the evaluation and testing data analysis, a comparison of the advantages and disadvantages of each system, and system cost. finally, the authors outline their dams implementation strategy comprised of a phased rollout with the following stages: system installation, data migration, and interface development.

introduction

since the launch of the university of houston digital library (uhdl) in 2009, the uh libraries have made tens of thousands of rare and unique items available online for research using contentdm.
as we began to explore and expand into new digital initiatives, we realized that the uh libraries' digital aspirations require a more dynamic, flexible, scalable, and interoperable digital asset management system that can manage larger amounts of materials in a variety of formats. we plan to implement a new digital repository infrastructure that accommodates creative workflows and allows for the configuration of additional functionalities such as digital exhibits, data mining, cross-linking, geospatial visualization, and multimedia presentation. the new system will be designed with linked data in mind and will allow us to publish our digital collections as linked open data within the larger semantic web environment. the uh libraries strategic directions set forth a mandate for us to “work assiduously to expand our unique and comprehensive collections that support curricula and spotlight research. we will pursue seamless access and expand digital collections to increase national recognition.”1 to fulfill the uh libraries' mission and the mandate of our strategic directions, the uh libraries administration appointed a digital asset management system (dams) implementation task force to explore, evaluate, test, recommend, and implement a more robust digital asset management system that would provide multiple modes of access to the uh libraries' unique collections and accommodate digital object production at a larger scale. the collaborative task force comprises librarians from four departments: metadata and digitization services (mds), web services, digital repository services, and special collections. the core charge of the task force is to:
• perform a needs assessment and build criteria and policies based on evaluation of the current system and requirements for the new dams
• research and explore dams on the market and identify the top three systems for beta testing in a development environment
• generate preliminary recommendations from stakeholders' comments and feedback
• coordinate installation of the new dams and finish data migration
• communicate the task force work to uh libraries colleagues

literature review

libraries have maintained dams for the publication of digitized surrogates of rare and unique materials for over two decades. during that time, information professionals have developed evaluation strategies for testing, comparing, and evaluating library dams software. reviewing these models and associated case studies provided insight into common practices for selecting systems and informed how the uh libraries dams implementation task force conducted its evaluation process.
one of the first publications of its kind, “a checklist for evaluating open source digital library software” by dion hoe-lian goh et al., presents a comprehensive list of criteria for library dams evaluation.2 the researchers developed twelve broad categories for testing (e.g., content management, metadata, and preservation) and generated a scoring system based on the assignment of a weight and a numeric value to each criterion.3 while the checklist was created to assist with the evaluation process, the authors note that an institution’s selection decision should be guided primarily by defining the scope of their digital library, the content being curated using the software, and the uses of the material.4 through their efforts, the authors created a rubric that can be utilized by other organizations when selecting a dams. subsequent research projects have expanded upon the checklist evaluation model. in “choosing software for a digital library,” jody deridder outlines major issues that librarians should address when choosing dams software, including many of the hardware, technological, and metadata concerns that goh et al. identified.5 additionally, she emphasizes the need to account for personnel and service requirements with a variety of activities: usability testing and estimating associated costs; conducting a formal needs assessment to guide the evaluation process; and a tiered-testing approach, which calls upon evaluators to winnow the number of systems.6 by considering stakeholder needs, from users to library administrators, deridder’s contributions inform a more comprehensive dams evaluation process. in addition to creating evaluation criteria, the literature on dams selection has also produced case studies that reflect real-world scenarios and identify use cases that help determine user needs and desires. in “evaluation of digital repository software at the national library of medicine,” jennifer l. marill and edward c. luczak discuss the process that the national library of medicine (nlm) used to compare ten dams, both proprietary and open-source.7 echoing goh et al. and deridder, marill and luczak created broad categories for testing and developed a scoring system for comparing dams.8 additionally, marill and luczak enriched the evaluation process by implementing two testing phases: “initial testing of ten systems” and “in-depth testing of three systems.”9 this method allowed nlm to conduct extensive research on the most promising systems for their needs before selecting a dams to implement. the tiered approach appealed to the task force, and influenced how it conducted the evaluation process, because it balances efficiency and comprehensiveness. in another case study, dora wagner and kent gerber describe the collaborative process of selecting a dams across a consortium. in their article “building a shared digital collection: the experience of the cooperating libraries in consortium,”10 the authors emphasize additional criteria that are important for collaborating institutions: the ability to brand consortial products for local audiences; the flexibility to incorporate differing workflows for local administrators; and the shared responsibility of system maintenance and costs.11 while the uh libraries will not be managing a shared repository dams, the task force appreciated the article’s emphasis on maximizing customizations to improve the user experience.
in “evaluation and usage scenarios of open source digital library and collection management tools,” georgios gkoumas and fotis lazarinis describe how they tested multiple open-source systems against typical library functions—such as acquisitions, cataloging, digital libraries, and digital preservation—to identify typical use cases for libraries.12 some of the use cases formulated by the researchers address digital platforms, including features related to supporting a diverse array of metadata schema and using a simple web interface for the management of digital assets.13 these use cases mirror local feature and functionality requests incorporated into the uh libraries’ evaluation criteria. in “digital libraries: comparison of 10 software,” mathieu andro, emmanuelle asselin, and marc maisonneuve discuss a rubric they developed to compare six open-source platforms (invenio, greenstone, omeka, eprints, ori-oai, and dspace) and four proprietary platforms (mnesys, digitool, yoolib, and contentdm) around six core areas: document management, metadata, engine, interoperability, user management, and web 2.0.14 the authors note that each solution is “of good quality” and that institutions should consider a variety of factors when selecting a dams, including the “type of documents you will want to upload” and the “political criteria (open source or proprietary software)” desired by the institution.15 this article provided the uh libraries with additional factors to include in their evaluation criteria. finally, heather gilbert and tyler mobley’s article “breaking up with contentdm: why and how one institution took the leap to open source” provides a case study for a new trend: selecting a dams for migration from an existing system to a new one.16 the researchers cite several reasons for their need to select a new dams, primarily their current system’s limitations with searching and displaying content in the digital library.17 they evaluated alternatives and selected a suite of open-source tools, including fedora, drupal, and blacklight, which combine to make up their new dams.18 gilbert and mobley also reflect on the migration process and identify several hurdles they had to overcome, such as customizing the open-source tools to meet their localized needs and confronting inconsistent metadata quality.19 gilbert and mobley’s article most closely matches the scenario faced by the uh libraries. our study adds to the limited literature on evaluating and selecting dams for migration in several ways. it demonstrates another model that other institutions can adapt to meet their specific needs. it identifies new factors for other institutions to take into account before or during their own migration process. finally, it adds to the body of evidence for a growing movement of libraries migrating from proprietary to open-source dams.

dams evaluation and analysis methodology

needs assessment

the dams implementation task force fulfilled the first part of its charge by conducting a needs assessment. the goal of the needs assessment was to collect the key requirements of stakeholders, identify future features of the new dams, and gather data in order to craft criteria for evaluation and testing in the next phase of its work.
the task force employed several techniques for information gathering during the needs assessment phase:
• identified stakeholders and held internal focus group interviews to identify system requirement needs and gaps
• reviewed scholarly literature on dams evaluation and migration
• researched peer/aspirational institutions
• reviewed national standards around dams
• determined both the current use of uhdl as well as its projected use
• identified uhdl materials and users
task force members took detailed notes during each focus group interview session. the literature research on dams evaluation helped the task force to find articles with comprehensive dams evaluation criteria. the niso criteria for core types of entities in digital library collections were also listed and applied to the evaluation after reviewing the niso framework of guidance for building good digital collections.20 more than forty peer and aspirational institutions’ digital repositories were benchmarked to identify web site names, platform architecture, documentation, and user and system features. the task force analyzed the rich data gathered from needs assessment activities and built the dams evaluation criteria that prepared the task force for the next phase of evaluation.

evaluation, testing, and recommendation

the task force began its evaluation process by identifying twelve potential dams for consideration that were ultimately narrowed down to three systems for in-depth testing. using data from focus group interviews, literature reviews, and dams best practices, the group generated a list of benchmark criteria. these broad evaluation criteria covered features in categories of system functionality, content management, metadata, user interface, and search support. members of the task force researched dams documentation, product information, and related literature to score each system against the evaluation criteria. table 1 contains the scores of the initial evaluation. from this process, five systems emerged with the highest scores:
● fedora (and, closely associated, fedora/hydra and fedora/islandora)
● collective access
● dspace
● rosetta
● contentdm
the task force eliminated collective access from the final systems for testing because of its limited functionality: it is based around archival content only, and is not widely deployed. the task force decided not to test contentdm because of the system’s known functionalities that we identified through firsthand experience. after the initial elimination process, fedora (including fedora/hydra and fedora/islandora), dspace, and rosetta remained for in-depth testing.

table 1. evaluation scores of twelve dams using broad evaluation criteria (total possible score: 29)
fedora: 27
fedora/hydra: 26
fedora/islandora: 26
collective access: 24
dspace: 24
rosetta: 20
contentdm: 20
trinity (ibase): 19
preservica: 16
luna imaging: 15
roda: 6 (removed from evaluation because the system does not support dublin core metadata)
invenio: 5 (removed from evaluation because the system does not support dublin core metadata)

the task force then created detailed evaluation and testing criteria by drawing from the same sources used previously: focus groups, literature review, and best practices.
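as a hypothetical illustration of the kind of tallying behind the broad evaluation in table 1, the sketch below scores each candidate system against a handful of criteria and ranks the totals to narrow the field. the criteria names are those listed above, but every score here is invented for the example; this is not the task force's data or code.

# hypothetical broad-criteria scoring: each system gets a 0/1 score per criterion;
# the scores below are illustrative only, not the task force's actual evaluation data.
CRITERIA = ["system functionality", "content management", "metadata",
            "user interface", "search support"]

scores = {
    "fedora":            [1, 1, 1, 0, 1],
    "dspace":            [1, 1, 1, 1, 0],
    "collective access": [0, 1, 1, 1, 0],
    "rosetta":           [1, 0, 1, 0, 1],
}

def rank(score_table):
    # tally each system's criterion scores and sort from highest to lowest total
    totals = {name: sum(vals) for name, vals in score_table.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# keep the top three systems for in-depth testing, mirroring the narrowing step above
for name, total in rank(scores)[:3]:
    print(f"{name}: {total}")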
while the broad evaluation focused on high-level functions, the detailed evaluation and testing criteria for the final three systems closely analyzed the specific features of each dams in eight categories:
● system environment and function
● administrative access
● content ingest and management
● metadata
● content access
● discoverability
● report and inquiry capabilities
● system support
prior to the in-depth testing of the final three systems, the task force researched timelines for system setup. rosetta’s timeline for system setup proved to be prohibitive. consequently, the task force eliminated rosetta from the testing pool and moved forward with fedora and dspace. to conduct the detailed evaluation, the task force scored the specific features under each category utilizing systems testing and documentation. a score range from zero to three (0 = none, 1 = low, 2 = moderate, 3 = high) was assigned for each feature evaluated. after evaluating all features, the score was tallied for each category. our testing revealed that fedora outperformed dspace in over half of the testing sections: content ingest and management, metadata, content access, discoverability, and report and inquiry capabilities. see table 2 for the tallied scores in each testing section.

table 2. scores of top two dams from testing using detailed evaluation criteria (dspace score / fedora score / possible score)
system environment and testing: 21 / 21 / 36
administrative access: 15 / 12 / 18
content ingest and management: 59 / 96 / 123
metadata: 32 / 43 / 51
content access: 14 / 18 / 18
discoverability: 46 / 84 / 114
report and inquiry capabilities: 6 / 15 / 21
system support: 12 / 11 / 12
total score: 205 / 300 / 393

after review of the testing results, the task force conducted a facilitated activity to summarize the advantages and disadvantages of each system. based on this comparison, the dams task force recommended that the uh libraries implement a fedora/hydra repository architecture with the following course of action:
● adapt the uhdl user interface to fedora and re-evaluate it for possible improvements
● develop an administrative content management interface with the hydra framework
● migrate all uhdl content to a fedora repository

table 3. fedora/hydra advantages and disadvantages
advantages: open source; large development community; linked data ready; modular design through api; scalable, sustainable, and extensible; batch import/export of metadata; handles any file format
disadvantages: steep learning curve; long setup time; requires additional tools for discovery; no standard model for multi-file objects

the primary advantages of a dams based on fedora/hydra are: a large and active development community; a scalable and modular system that can grow quickly to accommodate large-scale digitization; and a repository architecture based on linked data technologies. this last advantage, in particular, is unique among all systems evaluated, and will give the uh libraries the ability to publish our collections as linked open data.
fedora 4 conforms to the world wide web consortium (w3c) recommendation for linked data platforms.21 the main disadvantage of a fedora/hydra system is the steep learning curve associated with designing metadata models and developing a customized software suite, which translates to a longer implementation time compared to off-the-shelf products. the uh libraries must allocate an appropriate amount of time and resources for planning, implementation, and staff training. the long-term return on investment for this path will be a highly skilled technical staff with the ability to maintain and customize an open-source, standards-based repository architecture that can be expanded to support other uh libraries content such as geospatial data, research data, and institutional repository materials.

table 4. dspace advantages and disadvantages
advantages: open source; easy installation / ready out of the box; existing familiarity through texas digital library; user group / profile controls; metadata quality module; batch import of objects
disadvantages: flat file and metadata structure; limited reporting capabilities; limited metadata features; does not support linked data; limited api; not scalable / extensible; poor user interface

the main advantages of dspace are ease of installation, familiarity of workflows, and additional functionality not found in contentdm.22 installation and migration to a dspace system would be relatively fast, and staff could quickly transition to new workflows because they are similar to contentdm. dspace also supports authentication and user roles that could be used to limit content to the uh community only. commercial add-on modules, although expensive, could be purchased to provide more sophisticated content management tools than are currently available with contentdm. the disadvantages of a dspace system are the same long-term, systemic problems as with the current contentdm repository. dspace uses a flat metadata structure, has a limited api, does not scale well, and is not customizable to the uh libraries’ needs. consultations with peers indicated that both contentdm and dspace institutions are exploring the more robust capabilities of fedora-based systems. migration of the digital collections in contentdm to a dspace repository would provide few, if any, long-term benefits to the uh libraries. of all the systems considered, implementation of a fedora/hydra repository aligns most clearly with the uh libraries strategic directions of attaining national recognition and improving access to our unique collections. the fedora and hydra communities are very active, with project management overseen by duraspace and hydra respectively.23,24 over the long term, a repository based on fedora/hydra will give the uh libraries a low-cost, scalable, flexible, and interoperable platform for providing online access to our unique collections.
cost considerations

to balance the current digital collections production schedule with the demands of a timely implementation and migration, the task force identified the following investments as cost effective for fedora/hydra and dspace, respectively:

table 5. start-up costs associated with fedora/hydra and dspace
fedora/hydra: a metadata librarian (annual salary), who manages daily metadata unit operations during implementation and streamlines the migration process
dspace: a metadata librarian (annual salary), who manages daily metadata unit operations during implementation and streamlines the migration process, plus @mire modules at $41,500 (content delivery (3): $13,500; metadata quality: $10,000; image conversion suite: $9,000; content & usage analysis: $9,000); these modules require one-time fees to @mire that recur when upgrading to a new version of dspace

the task force determined that an investment in one librarian’s salary is the most cost-effective course of action. the new metadata librarian will manage daily operations of the metadata unit in metadata & digitization services while the metadata services coordinator, in close collaboration with the web projects manager, leads the dams implementation process. in contrast to fedora, migration to dspace would require a substantial investment in third party software modules from @mire to deliver the best possible content management environment and user experience.

implementation strategies

the implementation of the new dams will occur in a phased rollout comprised of the following stages: system installation, data migration, and interface development. mds and web services will perform the majority of the work, in consultation with key stakeholders from special collections and other units. throughout this process, the dams implementation task force will consult with the digital preservation task force* to coordinate the preservation and access systems.

table 6. overview of dams phased implementation
phase one, system installation: set up production and server environment; rewrite uhdl front-end application for fedora/solr; create metadata models; coordinate workflows with digital preservation task force; begin development of administrative hydra head for content management
phase two, data migration: formulate content migration strategy and schedule; migrate test collections and document exceptions; conduct the data migration; create preservation metadata for migrated data; continue development of the hydra administrative interface
phase three, interface development: reevaluate front-end user interface; rewrite uhdl front end as a hydra head or update current front end; establish interdepartmental production workflows; refine administrative hydra head for content management

phase one: system installation

during the first phase of dams implementation, web services and mds will work closely together to install an open-source repository software stack based on fedora, rewrite the current php front-end interface to provide public access to the data in the new system, and create metadata content models for the uhdl based on the portland common data model,25 in consultation with the coordinator of digital projects from special collections and other key stakeholders. the dams task force will consult with the digital preservation task force† to determine how closely the preservation and access systems will be integrated and at what points.
the two groups will also jointly outline a dams migration strategy that aligns with the preservation system. web services and mds will collaborate on research and development of an administrative interface, based on the hydra framework, for day-to-day management of uhdl content.
* an appointed task force to create a digital preservation policy and identify strategies, actions, and tools needed to sustain long-term access to digital assets maintained by uh libraries.
† a working team at uh libraries that enforces the digital preservation policy and maintains the digital preservation system.

phase two: data migration

in the second phase, mds will migrate legacy content from contentdm to the new system and work with web services, special collections, and the architecture and art library to resolve any technical, metadata, or content problems that arise. the second phase will begin with the development of a strategy for completing the work in a timely fashion, followed by migration of representative sample collections to the new system to test and refine its capabilities. after testing is complete, all legacy content will be migrated from contentdm to fedora, and preservation metadata for migrated collections will be created and archived. development work on the hydra administrative interface will also continue. after the data migration is complete, all new collections will be ingested into fedora/hydra, and the current contentdm installation will be retired.

phase three: interface development

in the final phase, web services will reevaluate the current front-end user interface (ui) for the uhdl by conducting user tests to better understand how and why users are visiting the uhdl. web services will also analyze web and system analytics and gather feedback from special collections and other stakeholders. depending on the outcome of this research, web services may create a new ui based on the hydra framework or choose to update the current front-end application with modifications or new features. web services and mds will also continue to develop or adopt tools for the management of uhdl content and work with special collections and the branch libraries to establish production workflows in the new system. continued development work on the front-end and administrative interfaces, for the life of the new digital asset management system, is both expected and desirable as we maintain and improve the uhdl infrastructure and contribute to the open source software community in line with the uh libraries strategic directions.

ongoing: assessment, enhancement, training, and documenting

throughout the transition process mds and web services will undergo extensive training in workshops and conferences to develop the skills necessary for developing and maintaining the new system. they will also establish and document workflows to ensure the long-term viability of the system. regular consultation with special collections, the branch libraries, and other stakeholders will be conducted to ensure that the new system satisfies the requirements of colleagues and patrons. ongoing activities will include:
● assessing service impact of new system
● user testing on ui
● regular system enhancements
● establishing new workflows
● creating and maintaining documentation
● training: conferences, webinars, workshops, etc.
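returning to the phase-two data migration described above, the export-and-ingest work amounts to pulling each record and its metadata out of the legacy system and writing it into the new repository. the sketch below is a generic illustration of that loop, not the uh libraries' migration code: the export function, field names, metadata mapping, and endpoint url are all assumptions made for the example, and the ingest step simply posts each record as turtle to an ldp-style container of the kind fedora 4 exposes.

# a generic, illustrative migration loop: export records from the legacy system,
# map their descriptive metadata to simple dublin core, and post each one to the
# new repository. export_legacy_records(), the field names, and the url are hypothetical.
import requests

FEDORA_ENDPOINT = "http://localhost:8080/rest/uhdl/"  # assumed ldp container url

def export_legacy_records():
    # placeholder for an export from the legacy system; returns dicts of metadata
    return [
        {"identifier": "item-0001", "title": "sample photograph", "creator": "unknown"},
    ]

def to_turtle(record):
    # serialize one record as simple dublin core turtle (an illustrative mapping only)
    return (
        '@prefix dc: <http://purl.org/dc/elements/1.1/> .\n'
        '<> dc:identifier "{identifier}" ;\n'
        '   dc:title "{title}" ;\n'
        '   dc:creator "{creator}" .\n'
    ).format(**record)

for rec in export_legacy_records():
    resp = requests.post(
        FEDORA_ENDPOINT,
        data=to_turtle(rec).encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
    )
    print(rec["identifier"], resp.status_code)

in practice a migration script of this kind would also carry over files, validate the metadata mapping, and record preservation metadata for each migrated object, as the phases above describe.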
conclusion

transitioning from contentdm to a fedora/hydra repository will place the uh libraries in a position to sustainably grow the amount of content in the uh digital library and customize the uhdl interfaces for a better user experience. publishing our data in a linked data platform will give the uh libraries the ability to more easily publish our data for the semantic web. in addition, the fedora/hydra architecture can be adapted to support a wide range of uh libraries projects, including a geospatial data portal, a research data repository, and a self-deposit institutional repository. over the long term, the return on investment for implementing an open-source repository architecture based on industry standard software will be: improved visibility of our unique collections on the web; expanded opportunities for aggregating our collections with high-profile repositories such as the digital public library of america; and increased national recognition for our digital projects and staff expertise.

references

1. “the university of houston libraries strategic directions, 2013–2016,” accessed july 22, 2015, http://info.lib.uh.edu/sites/default/files/docs/strategic-directions/2013-2016-libraries-strategic-directions-final.pdf.
2. dion hoe-lian goh et al., “a checklist for evaluating open source digital library software,” online information review 30, no. 4 (july 13, 2006): 360–79, doi:10.1108/14684520610686283.
3. ibid., 366.
4. ibid., 364.
5. jody l. deridder, “choosing software for a digital library,” library hi tech news 24, no. 9 (2007): 19–21, doi:10.1108/07419050710874223.
6. ibid., 21.
7. jennifer l. marill and edward c. luczak, “evaluation of digital repository software at the national library of medicine,” d-lib magazine 15, no. 5/6 (may 2009), doi:10.1045/may2009-marill.
8. ibid.
9. ibid.
10. dora wagner and kent gerber, “building a shared digital collection: the experience of the cooperating libraries in consortium,” college & undergraduate libraries 18, no. 2–3 (2011): 272–90, doi:10.1080/10691316.2011.577680.
11. ibid., 280–84.
12. georgios gkoumas and fotis lazarinis, “evaluation and usage scenarios of open source digital library and collection management tools,” program: electronic library and information systems 49, no. 3 (2015): 226–41, doi:10.1108/prog-09-2014-0070.
13. ibid., 238–39.
14. mathieu andro, emmanuelle asselin, and marc maisonneuve, “digital libraries: comparison of 10 software,” library collections, acquisitions, & technical services 36, no. 3–4 (2012): 79–83, doi:10.1016/j.lcats.2012.05.002.
15. ibid., 82.
16. heather gilbert and tyler mobley, “breaking up with contentdm: why and how one institution took the leap to open source,” code4lib journal, no. 20 (2013), http://journal.code4lib.org/articles/8327.
17. ibid.
18. ibid.
19. ibid.
20.
niso framework working group with support from the institute of museum and library services, a framework of guidance for building good digital collections (baltimore, md: national information standards organization (niso), 2007).
21. “linked data platform 1.0,” w3c, accessed july 22, 2015, http://www.w3.org/tr/ldp/.
22. “dspace,” accessed july 22, 2015, http://www.dspace.org/.
23. “fedora repository home,” accessed july 22, 2015, https://wiki.duraspace.org/display/ff/fedora+repository+home.
24. “hydra project,” accessed july 22, 2015, http://projecthydra.org/.

bridging the gap: using linked data to improve discoverability and diversity in digital collections
jason boczar, bonita pollock, xiying mi, and amanda yeslibas
information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.13063
jason boczar (jboczar@usf.edu) is digital scholarship and publishing librarian, university of south florida. bonita pollock (pollockb1@usf.edu) is associate director of collections and discovery, university of south florida. xiying mi (xmi@usf.edu) is digital initiative metadata librarian, university of south florida. amanda yeslibas (ayesilbas@usf.edu) is e-resource librarian, university of south florida. © 2021.

abstract

the year of covid-19, 2020, brought unique experiences to everyone in their daily as well as their professional life. facing many challenges of division in all aspects (social distancing, political and social divisions, remote work environments), the university of south florida libraries took the lead in exploring how to overcome these various separations by providing access to its high-quality information sources to its local community and beyond. this paper shares insights from using linked data technology to provide easy access to digital cultural heritage collections, not only for scholarly communities but also for underrepresented user groups. the authors present the challenges of this particular moment in history, discuss possible solutions, and propose future work to further the effort.

introduction

we are living in a time of division. many of us are adjusting to a new reality of working separated from our colleagues and the institutions that formerly brought us together physically and socially due to covid-19. even if we can work in the same physical locale, we are careful and distant with each other. our expressions are covered by masks, and we take pains with hygiene that might formerly have felt offensive. but the largest divisions and challenges being faced in the united states go beyond our physical separation. the nation has been rocked and confronted by racial inequality in the form of black lives matter, a divisive presidential campaign, income inequality exacerbated by covid-19, the continued reckoning with the #metoo movement, and the wildfires burning the west coast.
it feels like we are burning both literally and metaphorically as a country. adding fuel to this fire is the consumption of unreliable information. ironically, even as our divisions become more extreme, we are increasingly more connected and tuned into news via the internet. sadly, fact checking and sources are few and far between on social media platforms, where many are getting their information. the pew foundation report the future of truth and misinformation online warns that we are on the verge of a very serious threat to the democratic process due to the prevalence of false information. lee rainie, director of the pew research center’s internet and technology project, warns, “a key tactic of the new anti-truthers is not so much to get people to believe in false information. it’s to create enough doubt that people will give up trying to find the truth, and distrust the institutions trying to give them the truth.”1 libraries and other cultural institutions have moved very quickly to address and educate their populations and the community at large, trying to give a voice to the oppressed and provide reliable sources of information. the university of south florida (usf) libraries reacted by expanding antiracism holdings. usf’s purchases were informed by work at other institutions, such as the university of minnesota’s antiracism reading lists, which have in turn grown into a rich resource that includes other valuable resources like the mapping prejudice project and a link to the umbra search.2 the triad black lives matter protest collection at the university of north carolina greensboro is another example of a cultural institution reacting swiftly to document, preserve, and educate.3 these new pages and lists being generated by libraries and cultural institutions seem to be curated by hand using tools that require human intervention to make them and keep them up to date. this is also a challenge the usf libraries faced when constructing its new african american experience in florida portal, a resource that leverages already existing digital collections at usf to promote social justice. another key challenge is linking new digital collections and tools to already established collections and holdings. beyond the new content being created in reaction to current movements, there is already a wealth of information established in rich archives of material, especially regarding african american history. digital collections need to be discoverable by a wide audience to achieve resource sharing and educational purposes. this is a challenge many digital collections struggle with, because they are often siloed from library and archival holdings even within their own institutions. all the good information in the world is not useful if it is not findable. an example of a powerful discovery tool that is difficult to find and use is the umbra search (https://www.umbrasearch.org/), linked to the university of minnesota’s anti-racism reading list. umbra search is a tool that aggregates content from more than 1,000 libraries, archives, and museums.4 it is also supported by high-profile grants from the institute of museum and library services, the doris duke charitable foundation, and the council on library and information resources. however, the website is difficult to find in a web search. umbra search was named after society of umbra, a collective of black poets from the 1960s.
the terms umbra and society of umbra do not return useful results for finding the portal, nor do broader searches of african american history; the portal is difficult to find through basic web searches. one of the few chances for a user to find the site is if they came upon the human-made link in the university of minnesota anti-racism reading list. despite enthusiasm from libraries and other cultural institutions, new purchases and curated content are not going to reach the world as fully as hoped. until libraries adopt open data formats instead of locking away content in closed records like marc, library and digital content will remain siloed from the internet. the library catalog and digital platforms are even siloed from each other. we make records and enter metadata that is fit for library use but not shareable to the web. as karen coyle asked in her lita keynote address a decade ago, the question is how can libraries move from being “on the web” to being “of the web”?5 the suggested answer, and the answer the usf libraries are researching, is linked data.

literature review

the literature on linked data for libraries and cultural heritage resources reflects an implementation that is “gradual and uneven.” as national libraries across the world and the library of congress develop standards and practices, academic libraries are still trying to understand their role in implementation and identify their expertise.6 in 2006 tim berners-lee, the creator of the semantic web concept, outlined four rules of linked data:
1. use uris as names for things.
2. use http uris so that people can look up those names.
3. when someone looks up a uri, provide useful information, using the standards (rdf, sparql).
4. include links to other uris so that they can discover more things.7
it was not too long after this that large national libraries began exploring linked data and experimenting with uses. in 2010 the british library presented its prototype of linked data. this move was made in accordance with the uk government’s commitment to transparency and accountability, along with the user’s expectation that the library would keep up with cutting-edge trends.8 today the british library has released the british national bibliography as linked data instead of the entire catalog, because it is authoritative and better maintained than the catalog.9 the national libraries of europe, spurred on by government edicts and europeana (https://www.europeana.eu/en), are leading the progress in implementation of linked data. national libraries are uniquely suited to the development and promotion of new technologies because of their place under the government and proximity to policy making, bridging communication between interested parties and the ability to make projects into sustainable services.10 a 2018 survey of all european national libraries found that 15 had implemented linked data, two had taken steps for implementation, and three intended to implement it. even national libraries that were unable to implement linked data were contributing to the linked data open cloud by providing their data in datasets to the world.11 part of the difficulty with earlier implementation of linked data by libraries and cultural heritage institutions was the lack of a “killer example” that libraries could emulate.12 the relatively recent success of european national libraries might provide those examples.
many other factors have slowed the implementation of linked data. a survey of norwegian libraries in 2009 found a considerable gap in the semantic web literature between the research undertaken in the technological field and the research in the socio-technical field. implementing linked data requires reorganization of the staff, commitment of resources, education throughout the library, and buy-in from the leadership to make it strategically important.13 the survey of european national libraries cited the exact same factors as limitations in 2018.14 outside of european national libraries the implementation of linked data has been much slower. many academic institutions have taken on projects that tend to languish in a prototype or proof-of-concept phase.15 the library-centric talis group of the united kingdom “embraced a vision of developing an infrastructure based on semantic web technologies” in 2006, but abandoned semantic web-related business activities in 2012.16 it has been suggested that it is premature to wholly commit to linked data, but that it should be used for spin-off projects in an organization for experimentation and skill development.17 linked data is also still proving to be technologically challenging for implementation by cultural heritage aggregators. if many human resources are needed to facilitate linked data, it will remain an obstacle for cultural heritage aggregators. a study has shown that automated interpretation of ontologies is hampered by a lack of inter-ontology relations; cross-domain applications will not be able to use these ontologies without human intervention.18 aiding in the development and awareness of linked data practices for libraries is the creation and implementation of bibframe by the library of congress. the library of congress’s announcement in july 2018 that bibframe would be the replacement for marc definitively shows that the future of library records is focused on linking out and integrating into the web.19 the new rda (resource description and access) cataloging standards made it clear that marc is no longer the best encoding language for making library resources available on the web.20 while rda has adapted the cataloging rules to meet a variety of new library environments, the marc encoding language makes it difficult for computers to interpret and apply logic algorithms to the marc format. in response, the library of congress commissioned the consulting agency zepheira to create a framework that would integrate with the web and be flexible enough to work with various open formats and technologies, as well as be able to adapt to change. using the principles and technologies of the open web, the bibframe vocabulary is made of “resource description framework (rdf) properties, classes, and relationships between and among them.”21 eric miller, the ceo of zepheira, says bibframe “works as a bridge between the description component and open web discovery. it is agnostic with regards to which web discovery tool is employed” and though we cannot predict every technology and application, bibframe can “rely on the ubiquity and understanding of uris and the simple descriptive power of rdf.”22 the implementation of linked data in the cultural heritage sphere has been erratic but seems to be moving forward. 
it is important to pursue, though, because bringing local data out of the “deep web” and making it open and universally accessible “means offering minority cultures a democratic opportunity for visibility.”23 linked data linked data is one way to increase the access and discoverability of critical digital cultural heritage collections. also referred to as semantic web technologies, linked data follows the w3c resource description framework (rdf) standards.24 according to tim berners-lee, the semantic web will bring structure and well-defined meaning to web content, allowing computers to perform more automated processes.25 by providing structure and meaning to digital content, information can be more readily and easily shared between institutions. this provides an opportunity for digital cultural heritage collections of underrepresented populations to get more exposure on the web. following is a brief overview of linked data to illustrate how semantic web technologies function. linked data is created by forming semantic triples. each rdf triple contains uniform resource identifiers, or uris. these identifiers allow computers (machines) to “understand” and interpret the metadata. each rdf triple consists of three parts: a subject, a predicate, and an object. the subject defines what the metadata rdf triple is about, while the object contains information about the subject, which is further defined by the relationship link in the predicate. figure 1. example of a linked data rdf triple describing william shakespeare’s authorship of hamlet. for example, in figure 1, “william shakespeare wrote hamlet” is a triple. the subject and predicate of the triple are written as a uri containing the identifier information and the object of the triple is a literal piece of information. the subject of the triple, william shakespeare, has an identifier which in this example links to the library of congress name authority file for william shakespeare. the predicate of the rdf triple describes the relationship between the subject and object. the predicate also typically defines the metadata schema being used. in this example, dublin core is the metadata schema being used, so “wrote” would be identified by the dublin core creator field. the object of this semantic triple, hamlet, is a literal. literals are text that are not linked because they do not have a uri. subjects and predicates always have uris to allow the computer to make links. the object may have a uri or be a literal. together these uris, along with the literal, tell the computer everything it needs to know about this piece of metadata, making it self-contained. rdf triples with their uris are stored in a triple-store graph-style database, which functions differently from a typical relational database. relational databases rely on table headers to define the metadata stored inside. moving data between relational databases can be complex because tables must be redefined every time data is moved. graph databases don’t need tables since all the defining information is already stored in each triple. this allows for bidirectional flow of information between pieces of metadata and makes transferring data simpler and more efficient.26 information in a triple-store database is then retrieved using sparql, a query language developed for linked data. 
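to make the shakespeare example concrete, the following is a minimal sketch in python using the rdflib library. the library of congress name authority identifier shown and the choice of the dublin core terms creator property are illustrative assumptions that follow the description of figure 1, not a prescription of how any particular institution models its data.

```python
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCTERMS

g = Graph()

# subject: a library of congress name authority uri for william shakespeare
# (identifier shown for illustration; confirm the exact record on id.loc.gov)
shakespeare = URIRef("http://id.loc.gov/authorities/names/n78095332")

# predicate: the article maps "wrote" to the dublin core creator property;
# object: the literal "hamlet", which carries no uri of its own
g.add((shakespeare, DCTERMS.creator, Literal("hamlet")))

# the same in-memory graph can already be queried with sparql,
# previewing how retrieval works against a full triple store
for row in g.query(
    "SELECT ?who ?what WHERE { ?who <http://purl.org/dc/terms/creator> ?what }"
):
    print(row.who, row.what)
```

the small query at the end is the same sparql that would later run against a dedicated triple store; only the storage layer changes.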
because linked data is stored as self-contained triples, machines have all the information needed to process the data and perform advanced reasoning and logic programming. this leads to better search functionality and lends itself well to artificial intelligence (ai) technologies. many of today’s modern websites make use of these technologies to enhance their displays and provide greater functionality for their users. the internet is an excellent avenue for libraries to un-silo their collections and make them globally accessible. once library collections are on the web, advanced keyword search functionalities and artificial intelligence machine learning algorithms can be developed to automate metadata creation workflows and enhance search and information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 6 retrieval of library resources. the use of linked data metadata in these machine-learning functions will add a layer of semantic understanding to the data being processed and analyzed for patron discovery. ai technology can also be used to create advanced graphical displays making connections for patrons between various resources on a research topic. sharing digital cultural heritage data with other institutions often involves transferring data and is considered one of the greatest difficulties in sharing digital collections. for example, if one institutional repository uses dublin core to store its metadata for a certain cultural heritage collection and another repository uses mods/mets to store digital collections, there must first be a data conversion before the two repositories could share information. dublin core and mods/mets are two completely different schemas with different fields and metadata standards. these two schemas are incompatible with each other and must be crosswalked into a common schema. this typically results in some data loss during the transformation process. this makes combining two collections from different institutions into one shared web portal difficult. linked data allows institutions to share collections more easily. because linked data triples are self-contained, there is no need to crosswalk metadata stored in triples from one schema into another when transferring data. the uris contained in the rdf triples allow the computer to identify the metadata schema and process the metadata. rdf triples can be harvested from one linked data system and easily placed into another repository or web portal. a variety of schemas can all be stored together in one graph database. storing metadata in this way increases the interoperability of digital cultural heritage collections. collections stored in triple-store databases have sparql endpoints that make harvesting the metadata in a collection more efficient. libraries can easily share metadata on important collections increasing the exposure and providing greater access for a wider audience. philip schreur, author of “bridging the worlds of marc and linked data,” sums this concept up nicely: “the shift to the web has become an inextricable part of our day-to-day lives. 
by moving our carefully curated metadata to the web, libraries can offer a much-needed source of stable and reliable data to the rapidly growing world of web discovery.”27 linked data also makes it easier to harvest metadata and import collections into larger cultural heritage repositories like the digital public library of america (dpla), which uses linked data to “empower people to learn, grow, and contribute to a diverse and better-functioning society by maximizing access to our shared history, culture, and knowledge.”28 europeana, the european cultural heritage database, uses semantic web technologies to support its mission, which is to “empower the cultural heritage sector in its digital transformation.”29 using linked data to transfer data into these national repositories is more efficient and there is less loss of data because the triples do not have to be transformed into another schema. this increases access to many cultural heritage collections that might not otherwise be seen. one of the big advantages to linked data is the ability to create connections between other cultural heritage collections worldwide via the web. incorporating triples harvested from other collections into the local datasets enables libraries to display a vast amount of information about cultural heritage collections in their web portals. libraries thus can provide a much richer display and allow users access to a greater variety of resources. linked data also allows web developers to use uris to implement advanced search technologies, creating a multifaceted search environment for patrons. current research points to the fact that using semantic web technologies makes the creation of advanced logic and reasoning functionalities possible. according to liyang yu in the book introduction to the semantic web and semantic web services, “the semantic web is an extension of the current web. it is constructed by linking current web pages to a structured data set that indicates the semantics of this linked page. a smart agent, which is able to understand this structure data set, will then be able to conduct intelligent actions and make educated decisions on a global scale.”30 many digital cultural heritage collections in libraries live in siloed resources and are therefore only accessible to a small population of users. linked data helps to break down traditional library silos in these collections. by using linked data, an institution can expand the interoperability of the collection and make it more easily accessible. many institutions are starting to incorporate linked data technologies into digital collections, thereby increasing the ability for institutions to share collections. this allows for a greater audience to have access to critical cultural heritage collections for underrepresented populations. in the article “bridging the worlds of marc and linked data,” the author states, “the shift to linked data within this closed world of library resources will bring tremendous advantages to discovery both within a single resource … as well as across all the resources in your collections, and even across all of our collective collections. but there are other advantages to moving to linked data. 
through the use of linked data, we can connect to other trusted sources on the web.… we can also take advantage of a truly international web environment and reuse metadata created by other national libraries.”31 university of south florida libraries practice university of south florida libraries digital collections house a rich collection varying from cultural heritage objects to natural science and environment history materials to collections related to underrepresented populations. most of the collections are unique to usf and have significant research and educational value. the library is eager to share the collections as widely as possible and hopes the collections can be used at both document and data level. linked data creates a “web of data” instead of a “web of documents,” which is the key to bringing structure and meaning to web content, allowing computers to better understand the data. however, collections are mostly born at the document level. therefore, the first problem librarians need to solve is how to transform the documents to data. for example, there is a beautiful natural history collection called audubon florida tavernier research papers in usf libraries digital collections. the audubon florida tavernier research papers is an image collection which includes rookeries, birds, people, bodies of water, and man-made structures. the varied images come from decades of research and are a testament to the interconnectedness of bird population health and human interaction with the environment. the images reflect the focus of audubon’s work in research and conservation efforts both to wildlife and to the habitat that supports the wildlife.32 this was selected to be the first collection the authors experimented with to implement linked data at usf libraries. the lessons learned from working with this collection are applied to later work. when the collection was received to be housed in the digital platform, it was carefully analyzed to determine how to pull the data out of all the documents as much as possible. the authors designed a metadata schema of the combination of mods and darwin core (darwin core, abbreviated to dwc, is an extension of dublin core for biodiversity informatics) to pull out and properly store the data. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 8 figure 2. american kestrel. figure 3. american kestrel metadata. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 9 figure 2 is one of the documents in the collection, which is a photo of an american kestrel. figure 3 shows the data collected from the document and the placement of the data in the metadata schema. the authors put the description of the image in free text in the abstract field. this field is indexed and searchable through the digital collections platform. location information is put in the hierarchical spatial field. the subject heading fields describe the “aboutness” of the image, that is, what is in the image. all the detailed information about the bird is placed in darwin core fields. thus, the document is dissembled into a few pieces of data which are properly placed into metadata fields where they can be indexed and searched. having data alone is not sufficient to meet linked data requirements. 
the first of the four rules of linked data is to name things using uris.33 to add uris to the data, the authors needed to standardize the data and reconcile it against widely-used authorities such as library of congress subject headings, wikidata, and the getty thesaurus of geographic names. standardized data tremendously increases the percentage of data reconciliation, which will lead to more links with related data once published. figure 4. amenaprkitch khachkar. figure 4 shows an example from the armenia heritage and social memory program. this is a visual materials collection with photos and 3d digital models. it was created by the digital heritage and humanities collection team at the library. the collection brings together comprehensive information and interactive 3d visualization of the fundamentals of armenian identity, such as their architectures, languages, arts, etc.34 when preparing the metadata for the items in this collection, the authors devoted extra effort to adding geographic location metadata. this effort serves two purposes: one is to respectfully and honestly include the information in the collection; and the second is to provide future reference to the location of each item as the physical items are in danger and could disappear or be ruined. the authors employed the getty thesaurus of geographic names because it supports a hierarchical location structure. the location names at each level can be reconciled and have their own uris. the authors also paid extra attention on the subject headings. figure 5 shows how the authors used library of congress subject headings, local subject headings assigned by the researchers, and the getty art and architecture thesaurus for this collection. in the data reconciliation stage, the metadata can be compared against both library of congress subject headings authority files and the getty aat vocabularies so that as many uris as possible can be fetched and added to the metadata. the focus information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 10 on geographic names and subject headings is to standardize the data and use controlled vocabularies as much as possible. once moving to the linked data world, the data will be ready to be added with uris. therefore, the data can be linked easily and widely. figure 5. amenaprkitch khachkar metadata. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 11 one of the goals of building linked data is to make sense out of data and to generate new knowledge. as the librarians explored how to bring together multiple usf digital collections to highlight african american history and culture, three collections seemed particularly appropriate: • an african american sheet music collection from the early 20th century (https://digital.lib.usf.edu/sheet-music-aa) • the “narratives of formerly enslaved floridians” collection from 1930s (https://digital.lib.usf.edu/fl-slavenarratives) • the “otis r. anthony african american oral history collection” from 19781979(https://digital.lib.usf.edu/ohp-otisanthony) these collections are all oral expressions of african american life in the us. they span the first three-quarters of the 20th century around the time of the civil rights movement. creating linked data out of these collections will help shed light on the life of african americans through the 20th century and how it related to the civil rights movement. 
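as a complement to openrefine's reconciliation services, a term can also be checked programmatically against one of the authorities named above. the sketch below queries wikidata's public wbsearchentities api; the function name, the sample term, and the decision about which candidate counts as an exact match are illustrative assumptions, not the library's actual workflow.

```python
import requests

def wikidata_candidates(term, limit=5):
    """return (label, uri) candidates for a metadata term from wikidata.

    uses the public wbsearchentities api; a cataloger (or a later script)
    still has to decide which candidate, if any, is an exact match before
    the uri is added to the record.
    """
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": term,
            "language": "en",
            "format": "json",
            "limit": limit,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return [
        (hit.get("label", ""), hit.get("concepturi", ""))
        for hit in resp.json().get("search", [])
    ]

# hypothetical term drawn from image metadata like the audubon collection
for label, uri in wikidata_candidates("american kestrel"):
    print(label, uri)
```

restricting acceptance to exact label matches, as the pilot described below does, keeps human review to a minimum at the cost of leaving some terms without uris.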
with semantic web technology support, these collections can be turned into machine-actionable datasets to assist research and education activities on racism and anti-racism and to fit into a holistic knowledge base. usf libraries started to partner with dpla in 2018. dpla leverages linked data technology to increase discoverability of the collections contributed to it. dpla employs javascript object notation for linked data (json-ld) as the serialization for its data, which is in rdf/xml format. json-ld has a method of identifying data with iris (internationalized resource identifiers). the use of this method can effectively avoid data ambiguity, considering that dpla holds a fairly large amount of data. json-ld also provides computational analysis in support of semantic services, which enriches the metadata and, as a result, makes search more effective.35 in the 18 months since usf began contributing selected digital collections to dpla, usf materials have received more than 12,000 views. it is exciting to see the increase in the usage of the collections and it is the hope that they will be accessed by more diverse user groups. usf libraries are exploring ways to scale up the project and eventually transition all the existing digital collections metadata to linked data. one possible way of achieving this goal would be through metadata standardization. a pilot project at usf libraries is to process one medium-size image collection of 998 items. the original metadata is in mods/mets xml files. we first decided to use the dpla metadata application profile as the data model. if the pilot project is successful, we will apply this model to all of our linked data transformation processes. in our pilot, we are examining the fields in our mods/mets metadata and identifying those that will be meaningful in the new metadata schema. then we export the metadata in those fields to excel files. the next step is to use openrefine to reconcile the data in these excel files to fetch uris for exact-match terms. during this step, we are employing reconciliation services from the library of congress, getty tgn, and wikidata. after all the metadata is reconciled, we are transforming the excel file to triples. the column headers of the excel file become the predicates, and the metadata as well as their uris will be the objects of the triples. next, these triples will be stored in an apache jena triple-store database so that we can start designing sparql queries to facilitate search. the final step will be designing a user-friendly interface to further optimize the user experience. in this process, to make the workflow as scalable as possible, we are focusing on testing two processes: first, creating a universal metadata application profile to apply to most, if not all, of the collections; and second, only fetching uris for exactly matching terms during the reconciliation process. both of these processes aim to reduce human interaction with the metadata so that the process is more affordable to the library. conclusion and future work linked data can help collection discoverability. in the past six months, usf has seen an increase in materials going online. usf special collections department rapidly created digital exhibits to showcase their materials. 
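a minimal sketch of the spreadsheet-to-triples step in the pilot workflow, written in python with rdflib. the file name, column names, and the use of dublin core terms properties are assumptions made for illustration; the actual pilot follows the dpla metadata application profile, and the resulting turtle file would still need to be loaded into apache jena (for example through its bulk loader or a fuseki endpoint).

```python
import csv
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCTERMS

g = Graph()

# hypothetical spreadsheet exported from openrefine after reconciliation;
# columns item_uri, title, subject_label, subject_uri are illustrative names
with open("reconciled_collection.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        item = URIRef(row["item_uri"])
        g.add((item, DCTERMS.title, Literal(row["title"])))
        # keep the human-readable label and, where an exact match was found,
        # also record the authority uri (lcsh, getty tgn, or wikidata)
        g.add((item, DCTERMS.subject, Literal(row["subject_label"])))
        if row.get("subject_uri"):
            g.add((item, DCTERMS.subject, URIRef(row["subject_uri"])))

# serialize to turtle, which can then be bulk-loaded into an apache jena store
g.serialize(destination="collection.ttl", format="turtle")
```

keeping both the literal label and the reconciled uri mirrors the earlier point that objects may be literals or uris; downstream sparql queries can use whichever is available.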
if the trend in remote work continues, there is reason to believe that digital materials may be increasingly present and, given enough time and expertise, libraries can leverage linked data to better support current and new collections. the societal impact of covid-19 worldwide sheds light on the importance of technologies such as linked data that can help increase discoverability. when items are being created and shared online, either directly related to covid-19 or a result of its impact, linked data can help connect those resources. for instance, new covid-19 research is being developed and published daily. the publications office of the european union datathon entry “covid-19 data as linked data” states that “[t]he benefit of having covid-19 data as linked data comes from the ability to link and explore independent sources. for example, covid-19 sources often do not include other regional or mobility data. then, even the simplest thing, having the countries not as a label but as their uri of wikidata and dbpedia, brings rich possibilities for analysis by exploring and correlating geographic, demographic, relief, and mobility data.”36 the more institutions that contribute to this, the greater the discoverability and impact of the data. in 2020 there has been an increase in black lives matter awareness across the country. this affects higher education. usf libraries are not the only ones engaged in addressing racial disparities. many institutions have been doing this for years. others are beginning to focus on this area. no matter whether it’s a new digital collection or one that’s been around for decades, the question remains: how do people find these resources? perhaps linked data technologies can help solve that problem. linked data is a technology that can help accentuate the human effort put forth to create those collections. linked data is a way to assist humans and computers in finding interconnected materials around the internet. usf libraries faced many obstacles implementing linked data. there is a technological barrier that takes well-trained staff to surmount, i.e., creating a linked data triple store database and having linked data interact correctly on webpages. there is a time commitment necessary to create the triples and sparql queries. sparql queries themselves vary from being relatively simple to incredibly complicated. the authors also had the stumbling block of understanding how linked data worked together on a theoretical level. taking all of these considerations into account, we can say that creating linked data for a digital collection is not for the faint of heart. a cost/benefit analysis must be taken and assessed. the authors of this paper must continue to determine the need for linked data. at usf, the authors have taken the first steps in converting digital collections into linked data. we’ve moved from understanding the theoretical basis of linked data and into the practical side where the elements that make up linked data start coming together. the work to create triples, sparql queries, and uris has begun, and full implementation has started. our linked data group has learned the fundamentals of linked data. the next, and current, step is to develop workflows for existing metadata conversion into appropriate linked data. the group meets regularly and has created a triple store database and converted data into linked data. 
while the process is slow moving due to group members’ other commitments, progress is being made by looking at the most relevant collections we would like to transform and moving forward from there. we’ve located the collections we want to work on, taking an iterative approach to creating linked data as we go. with linked data, there is a lot to consider. how do you start up a linked data program at your institution? how will you get the required expertise to create appropriate and high-quality linked data? how will your institution crosswalk existing data into triples format? is it worth the investment? it may be difficult to answer these questions, but they’re questions that must be addressed. the usf libraries will continue pursuing linked data in meaningful ways and showcasing linked data’s importance. linked data can help highlight all collections but more importantly those of marginalized groups, which is a priority of the linked data group. endnotes 1 peter perl, “what is the future of truth?” pew trust magazine, february 4, 2019, https://www.pewtrusts.org/en/trust/archive/winter-2019/what-is-the-future-of-truth. 2 “anti-racism reading lists,” university of minnesota library, accessed september 24, 2020, https://libguides.umn.edu/antiracismreadinglists. 3 “triad black lives matter protest collection,” unc greensboro digital collections, accessed december 9, 2020, http://libcdm1.uncg.edu/cdm/blm. 4 “umbra search african american history,” umbra search, accessed december 10, 2020, https://www.umbrasearch.org/. 5 karen coyle, “on the web, of the web” (keynote at lita, october 1, 2011), https://kcoyle.net/presentations/lita2011.html. 6 donna ellen frederick, “disruption or revolution? the reinvention of cataloguing (data deluge column),” library hi tech news 34, no. 7 (2017): 6–11, https://doi.org/10.1108/lhtn-07-2017-0051. 7 tim berners-lee, “linked data,” w3, last updated june 18, 2009, https://www.w3.org/designissues/linkeddata.html. 8 neil wilson, “linked data prototyping at the british library” (paper presentation, talis linked data and libraries event, 2010). 9 diane rasmussen pennington and laura cagnazzo, “connecting the silos: implementations and perceptions of linked data across european libraries,” journal of documentation 75, no. 3 (2019): 643–66, https://doi.org/10.1108/jd-07-2018-0117. 10 jane hagerlid, “the role of the national library as a catalyst for an open access agenda: the experience in sweden,” interlending and document supply 39, no. 2 (2011): 115–18, https://doi.org/10.1108/02641611111138923. 11 pennington and cagnazzo, “connecting the silos,” 643–66. 12 gillian byrne and lisa goddard, “the strongest link: libraries and linked data,” d-lib magazine 16, no. 11/12 (2010): 2, https://doi.org/10.1045/november2010-byrne. 13 bendik bygstad, gheorghita ghinea, and geir-tore klæboe, “organisational challenges of the semantic web in digital libraries: a norwegian case study,” online information review 33, no. 5 (2009): 973–85, https://doi.org/10.1108/14684520911001945. 14 pennington and cagnazzo, “connecting the silos,” 643–66. 15 heather lea moulaison and anthony j. million, “the disruptive qualities of linked data in the library environment: analysis and recommendations,” cataloging & classification quarterly 52, no. 
4 (2014): 367–87, https://doi.org/10.1080/01639374.2014.880981. 16 marshall breeding, “linked data: the next big wave or another tech fad?” computers in libraries 33, no. 3 (2013): 20–22. 17 moulaison and million, “the disruptive qualities of linked data,” 369. 18 nuno freire and sjors de valk, “automated interpretability of linked data ontologies: an evaluation within the cultural heritage domain,” (workshop, ieee conference on big data, 2019). 19 “bibframe update forum at the ala annual conference 2018,” (washington, dc: library of congress, july 2018), https://www.loc.gov/bibframe/news/bibframe-update-an2018.html. 20 jacquie samples and ian bigelow, “marc to bibframe: converting the pcc to linked data,” cataloging & classification quarterly 58, no. 3–4 (2020): 404. 21 oliver pesch, “using bibframe and library linked data to solve real problems: an interview with eric miller of zepheira,” the serials librarian 71, no. 1 (2016): 2. 22 pesch, 2. 23 gianfranco crupi, “beyond the pillars of hercules: linked data and cultural heritage,” italian journal of library, archives & information science 4, no. 1 (2013): 25–49, http://dx.doi.org/10.4403/jlis.it-8587. 24 “resource description framework (rdf),” w3c, february 25, 2014, https://www.w3.org/rdf/. 25 tim berners-lee, james hendler, and ora lassila, “the semantic web,” scientific american 284, no. 5 (2001): 34–43, https://www.jstor.org/stable/26059207. 26 dean allemang and james hendler, “semantic web application architecture,” in semantic web for the working ontologist: effective modeling in rdfs and owl, (saint louis: elsevier science, 2011): 54–55. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 15 27 philip e. schreur and amy j. carlson, “bridging the worlds of marc and linked data: transition, transformation, accountability,” serials librarian 78, no. 1–4 (2020), https://doi.org/10.1080/0361526x.2020.1716584. 28 “about us,” dpla: digital public library of america, accessed december 11, 2020. https://dp.la/about. 29 “about us,” europeana, accessed december 11, 2020, https://www.europeana.eu/en/about-us. 30 liyang yu, “search engines in both traditional and semantic web environments,” in introduction to semantic web and semantic web services (boca raton: chapman & hall/crc, 2007): 36. 31 schreur and carlson, “bridging the worlds of marc and linked data.” 32 “audubon florida tavernier research papers,” university of south florida libraries digital collections, accessed november 30, 2020, https://lib.usf.edu/?a64/. 33 berners-lee, “linked data,” https://www.w3.org/designissues/linkeddata.html. 34 “the armenian heritage and social memory program,” university of south florida libraries digital collections, accessed november 30, 2020, https://digital.lib.usf.edu/armenianheritage/. 35 erik t. mitchell, “three case studies in linked open data,” library technology reports 49, no. 5 (2013): 26-43. 36 “covid-19 data as linked data,” publications office of the european union, accessed december 11, 2020, https://op.europa.eu/en/web/eudatathon/covid-19-linked-data. usability test results for encore in an academic library megan johnson information technology and libraries | september 2013 59 abstract this case study gives the results a usability study for the discovery tool encore synergy, an innovative interfaces product, launched at appalachian state university belk library & information commons in january 2013. 
nine of the thirteen participants in the study rated the discovery tool as more user friendly, according to a sus (standard usability scale) score, than the library’s tabbed search layout, which separated the articles and catalog search. all of the study’s participants were in favor of switching the interface to the new “one box” search. several glitches in the implementation were noted and reported to the vendor. the study results have helped develop belk library training materials and curricula. the study will also serve as a benchmark for further usability testing of encore and appalachian state library’s website. this article will be of interest to libraries using encore discovery service, investigating discovery tools, or performing usability studies of other discovery services. introduction appalachian state university’s belk library & information commons is constantly striving to make access to libraries resources seamless and simple for patrons to use. the library’s technology services team has conducted usability studies since 2004 to inform decision making for iterative improvements. the most recent versions (since 2008) of the library’s website have featured a tabbed layout for the main search box. this tabbed layout has gone through several iterations and a move to a new content management system (drupal). during fall semester 2012, the library website’s tabs were: books & media, articles, google scholar, and site search (see figure 1). some issues with this layout, documented in earlier usability studies and through anecdotal experience, will be familiar to other libraries who have tested a tabbed website interface. user access issues include the belief of many patrons that the “articles” tab looked for all articles the library had access to. in reality the “articles” tab searched seven ebsco databases. belk library has access to over 400 databases. another problem noted with the tabbed layout was that patrons often started typing in the articles box, even when they knew they were looking for a book or dvd. this is understandable, since when most of us see a search box we just start typing, we do not read all the information on the page. megan johnson (johnsnm@appstate.edu) is e-learning and outreach librarian, belk library and information commons, appalachian state university, boone, nc. mailto:johnsnm@appstate.edu usability test results for encore in an academic library | johnson 60 figure 1. appalachian state university belk library website tabbed layout search, december 2012. a third documented user issue is confusion over finding an article citation. this is a rather complex problem, since it has been demonstrated through assessment of student learning that many students cannot identify the parts of a citation, so this usability issue goes beyond the patron being able navigate the library’s interface, it is partly a lack of information literacy skills. however, even sophisticated users can have difficulty in determining if the library owns a particular journal article. this is an ongoing interface problem for belk library and many other academic libraries. google scholar (gs) often works well for users with a journal citation, since on campus they can often simply copy and paste a citation to see if the library has access, and, if so, the full text it is often is available in a click or two. however, if there are no results found using gs, the patrons are still not certain if the library owns the item. 
background in 2010, the library formed a task force to research the emerging market of discovery services. the task force examined summon, ebsco discovery service, primo, and encore synergy and found the products, at that time, to still be immature and lacking value. in april 2012, the library reexamined the discovery market and conducted a small benchmarking usability study (the results are discussed in the methodology section and summarized in appendix a). the library felt enough improvements had been made to innovative interfaces’ encore synergy product to justify purchasing this discovery service. an encore synergy implementation working group was formed, and several subcommittees were created, including end-user preferences, setup & access, training, and marketing. to help inform the decisions of these subcommittees, the author conducted a usability study in december 2012, which was based on, and expanded upon, the april 2012 study. the goal of this study was to test users’ experience and satisfaction with the current tabbed layout, in contrast to the “one box” encore interface. the library had committed to implementing encore synergy, but there are options in the layout of the search box on the library’s homepage. if users expressed a strong preference for tabs, the library could choose to leave a tabbed layout for access to the articles part of encore, for the catalog part, and create tabs for other options like google scholar and a search of the library’s website. a second goal of the study was to benchmark the user experience for the implementation of encore synergy so that, over time, improvements could be made to promote seamless access to appalachian state university library’s resources. a third goal of this study was to document problems users encountered and report them to innovative. figure 2. appalachian state university belk library website encore search, january 2013. literature review there have been several recent reviews of the literature on library discovery services. thomsett-scott and reese conclude that discovery tools are a mixed blessing.1 users can easily search across broad areas of library resources and limiting by facets is helpful. downsides include loss of individual database specificity and user willingness to look beyond the first page of results. longstanding library interface problems, such as patrons’ lack of understanding of holdings statements and knowing when it is appropriate to search in a discipline-specific database, are not solved by discovery tools.2 in a recent overview of discovery services, hunter lists four vendors whose products have both a discovery layer and a central index: ebsco’s discovery service (eds); ex libris’ primo central index; serials solutions’ summon; and oclc’s worldcat local (wcl).3 encore does not currently offer a central index or pre-harvested metadata for articles, so although encore has some of the features of a discovery service, such as facets and connections to full text, it is important for libraries considering implementing encore to understand that the part of encore that searches for articles is a federated search. when appalachian purchased encore, not all the librarians and staff involved in the decision making were fully aware of how this would affect the user experience. further discussion of this is in the “glitches revealed” section. fagan et al. 
discuss james madison university’s implementation of ebsco discovery service and their customizations of the tool. they review the literature of discovery tools in several areas, including articles that discuss the selection processes, features, and academic libraries’ decisions process following selection. they conclude, the “literature illustrates a current need for more usability studies related to discovery tools.” 4 the most relevant literature to this study are case studies documenting a library’s experience with implementing a discovery services and task based usability studies of discovery services. thomas and buck5 sought to determine with a task based usability study whether users were as successful performing common catalog-related tasks in worldcat local (wcl) as they are in the library’s current catalog, innovative interfaces’ webpac. the study helped inform the library’s decision, at that time, to not implement wcl. beecher and schmidt6 discuss american university’s comparison of wcl and aquabrowser (two discovery layers), which were implemented locally. the study focused on user preferences based on students “normal searching patterns” 7 rather than completion of a list of tasks. their study revealed undergraduates generally preferred wcl, and upperclassmen and graduates tended to like aquabrower better. beecher and schmidt discuss the research comparing assigned tasks versus user-defined searches, and report that a blend of these techniques can help researchers understand user behavior better.8 information technology and libraries | september 2013 63 this article reports on a task-based study, in which the last question asks the participant to research something they had looked for within the past semester, and the results section indicates that the most meaningful feedback came from watching users research a topic they had a personal interest in. having assigned tasks also can be very useful. for example, an early problem noted with discovery services was poor search results for specific searches on known items, such as the book “the old man and the sea.” assigned tasks also give the user a chance to explore a system for a few searches, so when they search for a topic of personal interest, it is not their first experience with a new system. blending assigned tasks with user tasks proved helpful in this study’s outcomes. encore synergy has not yet been the subject of a formally published task-based usability study. allison reports on an analysis of google analytic statistics at university of nebraska-lincoln after encore was implemented.9 the article concludes that encore increases the user’s exposure to all the library’s holdings, describes some of the challenges unl faced and gives recommendations for future usability studies to evaluate where additional improvements should be made. the article also states unl plans to conduct future usability studies. although there are not yet formal published task-based studies on encore, at least one blogger from southern new hampshire university documented their implementation of the service. 
singley reported in 2011, “encore synergy does live up to its promise in presenting a familiar, user-friendly search environment.10 she points out, “to perform detailed article searches, users still need to link out to individual databases.” this study confirms that users do not understand that articles are not fully indexed and integrated; articles remain, in encore’s terminology, in “database portfolios.” see the results section, task 2, for a fuller discussion of this topic. method this study included a total of 13 participants. these included four faculty members, and six students recruited through a posting on the library’s website offering participants a bookstore voucher. three student employees were also subjects (these students work in the library’s mailroom and received no special training on the library’s website). for the purposes of this study, the input of undergraduate students, the largest target population of potential novice users, was of most interest. table 3 lists demographic details of the student or faculty’s college, and for students, their year. this was a task-based study, where users were asked to find a known book item and follow two scenarios to find journal articles. the following four questions/tasks were handed to the users on a sheet of paper: 1. find a copy of the book the old man and the sea. 2. in your psychology class, your professor has assigned you a 5-page paper on the topic of eating disorders and teens. find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem. http://www.snhu.edu/ usability test results for encore in an academic library | johnson 64 3. you are studying modern chinese history and your professor has assigned you a paper on foreign relations. find a journal article that discusses relations between china and the us. 4. what is a topic you have written about this year? search for materials on this topic. the follow up questions where verbally asked either after a task, or asked as prompts while the subject was working. 1. after the first task (find a copy of the book the old man and the sea) when the user finds the book in appsearch, ask: “would you know where to find this book in the library?” 2. how much of the library’s holdings do you think appsearch/ articles quick search is looking across? 3. does “peer reviewed” mean the same as “scholarly article”? 4. what does the “refine by tag” block the right mean to you? 5. if you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend? participants were recorded using techsmith’s screen-casting software camtasia, which allows the user’s face to be recorded along with their actions on the computer screen. this allows the observer to not rely solely on notes or recall. if the user encounters a problem with the interface, having the session recorded makes it simple to create (or recreate) a clip to show the vendor. in the course of this study, several clips were sent to innovative interfaces, and they were responsive to many of the issues revealed. further discussion is in the “glitches revealed” section. seven of the subjects first used the library site’s tabbed layout (which was then the live site) as seen in figure 1. after they completed the tasks, participants filled in a system usability scale (sus) form. the users then completed the same tasks on the development server using encore synergy. 
participants next filled out a sus form to reflect their impression of the new interface. encore is locally branded as appsearch and the terms are used interchangeably in this study. the six other subjects started with the appsearch interface on a development server, completed a sus form, and then did the same tasks using the library’s tabbed interface. the time it took to conduct the studies ranged from fifteen to forty minutes per participant, depending on how verbal the subject was, and how much they wanted to share about their impressions and ideas for improvement. jakob nielsen has been quoted as saying you only need to test with five users: “after the fifth user, you are wasting your time by observing the same findings repeatedly but not learning much new.”11 he argues for doing tests with a small number of users, making iterative improvements, and then retesting. this is certainly a valid and ideal approach if you have full control of the design. in the case of a vendor-controlled product, there are serious limitations to what the librarians can iteratively improve. the most librarians can do is suggest changes to the vendor, based on the results of studies and observations. when evaluating discovery services in the spring of 2012, appalachian state libraries conducted a four-person, task-based study (see appendix a), which used the university of nebraska-lincoln’s implementation of encore as a test site to benchmark our students’ initial reaction to the product in comparison to the library’s current tabbed layout. in this small study, the average sus score for the library’s current search box layout was 62, and for unl’s implementation of encore, it was 49. this helped inform the decision of belk library, at that time, not to purchase encore (or any other discovery service), since students did not appear to prefer them. this paper reports on a study conducted in december 2012 that showed a marked improvement in users’ gauge of satisfaction with encore. several factors could contribute to the improvement in sus scores. first is the larger sample size of 13 compared to the earlier study with four participants. another factor is that in the april study, participants were using an external site they had no familiarity with, and a first experience with a new interface is not a reliable gauge of how someone will come to use the tool over time. this study was also more robust in that it added the task of asking the user to search for something they had researched recently, and the follow up questions were more detailed. overall it appears that, in this case, having more than four participants and a more robust design gave a better representation of user experience. the system usability scale (sus) the system usability scale has been widely used in usability studies since its development in 1996. many libraries use this tool in reporting usability results.12,13 it is simple to administer, score, and understand the results.14 sus is an industry standard with references in over 600 publications.15 an “above average” score is 68. scoring a scale involves a formula where odd items have one subtracted from the user response, and with even-numbered items, the user response is subtracted from five. the total converted responses are added up, and then multiplied by 2.5. this makes the answers easily grasped on the familiar scale of 1-100. 
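the scoring rule just described can be written out as a short function; this is a generic sketch of the standard sus formula, and the sample responses shown in the usage line are hypothetical, not data from this study.

```python
def sus_score(responses):
    """score ten 1-5 sus responses on the 0-100 scale described above.

    odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response), and the sum is multiplied by 2.5, which is
    why half-point scores such as 82.5 appear in table 1.
    """
    if len(responses) != 10:
        raise ValueError("sus requires exactly ten responses")
    total = sum(
        (r - 1) if i % 2 == 1 else (5 - r)
        for i, r in enumerate(responses, start=1)
    )
    return total * 2.5

# hypothetical response sheet (not taken from the study data)
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```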
due to the scoring method, it is possible that results are expressed with decimals.16 a sample sus scale is included in appendix d. results the average sus score for the 13 users for encore was 71.5, and for the tabbed layout, the average sus score was 68. this small sample set indicates there was a user preference for the discovery service interface. in a relatively small study like this, these results do not imply a scientifically valid statistical measurement. as used in this study, the sus scores are simply a way to benchmark how “usable” the participants rated the two interfaces. when asked the subjective follow up question, “if you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend?” 100% of the participants recommended the library change to appsearch (although four users actually rated the tabbed layout with a higher sus score). these four participants said things along the lines of, “i can get used to anything you put up.”

table 1. demographic details and individual and average sus scores.
participant | sus encore | sus tabbed layout | year and major or college | appsearch first
student a | 90 | 70 | senior/social work/female | no
student b | 95 | 57.5 | freshman/undeclared/male | yes
student c | 82.5 | 57.5 | junior/english/male | yes
student d | 37.5 | 92 | sophomore/actuarial science/female | yes
student e | 65 | 82.5 | junior/psychology/female | yes
student f | 65 | 77.5 | senior/sociology/female | no
student g | 67.5 | 75 | junior/music therapy/female | no
student h | 90 | 82.5 | senior/dance/female | no
student i | 60 | 32.5 | senior/political science/female | no
faculty a | 40 | 87.5 | family & consumer science/female | yes
faculty b | 80 | 60 | english/male | no
faculty c | 60 | 55 | education/male | no
faculty d | 97.5 | 57.5 | english/male | yes
average | 71.5 | 68 | |

discussion task 1: “find a copy of the book the old man and the sea.” all thirteen users had faster success using encore. when using encore, this “known item” is in the top three results. encore definitely performed better than the classic catalog in saving the time of the user. in approaching task 1 from the tabbed layout interface, four out of thirteen users clicked on the books and media tab, changed the drop-down search option to “title,” and were (relatively) quickly successful. the remaining nine who switched to the books and media tab and used the default keyword search for “the old man and the sea” had to scan the results (using this search method, the book is the seventh result in the classic catalog), which took two users almost 50 seconds. this length of time for an “average user” to find a well-known book is not considered to be acceptable to the technology services team at appalachian state university. when using the encore interface, the follow up question for this task was, “would you know where to find this book in the library?” nine out of 13 users did not know where the book would be, or 
further research can determine if students will have a higher level of confidence in their ability to locate a book in the stacks when using encore. figure 3 shows the search as it appeared in december 2012 and figure 4 has the “map it” feature implemented and pointed out with a red arrow. related to this task of searching for a known book, student b commented that in encore, the icons were very helpful in picking out media type. figure 4. book item record in encore. the red arrow indicates the “map it” feature, an add-on to the catalog from the vendor stackmap. browse results are on the right, and only pull from the catalog results. when using the tabbed layout interface (see figure 1), three students typed the title of the book into the “articles” tab first, and it took them a few moments figure out why they had a problem with the results. they were able to figure it out and re-do the search in the “correct” books & usability test results for encore in an academic library | johnson 68 media tab, but student d commented, “i do that every time!” this is evidence that the average user does not closely examine a search box--they simply start typing. task 2: “in your psychology class, your professor has assigned you a five-page paper on the topic of eating disorders and teens. find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem.” this question revealed, among other things, that seven out of the nine students did not fully understand the term scholarly or peer reviewed article are meant to be synonyms in this context. when asked the follow up question “what does ‘peer reviewed’ mean to you?” student b said, “my peers would have rated it as good on the topic.” this is the kind of feedback that librarians and vendors need to be aware of in meeting students’ expectations. users have become accustom to online ratings by their peers of hotels and restaurants, so the terminology academia uses may need to shift. further discussion on this is in the “changes suggested” section below. figure 5. typical results for task two. figure 5 shows a typical user result for task 2. the follow up question asked users “what does the refine by tag box on the right mean to you?” student g reported they looked like internet ads. other users replied with variations of, “you can click on them to get more articles and stuff.” in fact, the “refine by tag” box in the upper right column top of screen contains only indexed terms from the subject heading of the catalog. this refines the current search results to those with the specific subject term the user clicked on. in this study, no user clicked on these tags. information technology and libraries | september 2013 69 for libraries considering purchasing and implementing encore, a choice of skins is available, and it is possible to choose a skin where these boxes do not appear. in addition to information from innovative interfaces, libraries can check a guide maintained by a librarian at saginaw valley state university17 to see examples of encore synergy sites, and links to how different skins (cobalt, pearl or citrus) affect appearance. appalachian uses the “pearl” skin. figure 6. detail of screenshot in figure 5. figure 6 is a detail of the results shown in the screenshot for average search for task 2. the red arrows indicate where a user can click to just see article results. the yellow arrow indicates where the advanced search button is. six out of thirteen users clicked advanced after the initial search results. 
clicking on the advanced search button brought users to a screen pictured in figure 7. usability test results for encore in an academic library | johnson 70 figure 7. encore's advanced search screen. figure 7 shows the encore’s advanced search screen. this search is not designed to search articles; it only searches the catalog. this aspect of advanced search was not clear to any of the participants in this study. see further discussion of this issue in the “glitches revealed” section. information technology and libraries | september 2013 71 figure 8. the "database portfolio" for arts & humanities. figure 8 shows typical results for task 2 limited just to articles. the folders on the left are basically silos of grouped databases. innovative calls this feature “database portfolios.” in this screen shot, the results of the search narrowed to articles within the “database portfolio” of arts & humanities. clicking on the individual databases return results from that database, and moves the usability test results for encore in an academic library | johnson 72 user to the database’s native interface. for example, in figure 8, clicking on art full text would put the user into that database, and retrieve 13 results. while conducting task 2, faculty member a stressed she felt it was very important students learn to use discipline specific databases, and stated she would not teach a “one box” approach. she felt the tabbed layout was much easier than appsearch and rated the tabbed layout in her sus score with a 87.5 versus the 40 she gave encore. she also wrote on the sus scoring sheet “appsearch is very slow. there is too much to review.” she also said that the small niche showing how to switch results between “books & more” to article was “far too subtle.” she recommended bold tabs, or colors. this kind of suggestion librarians can forward to the vendor, but we cannot locally tweak this layout on a development server to test if it improves the user experience. figure 9. closeup of switch for “books & more” and “articles” options. task 3: “you are studying modern chinese history and your professor has assigned you a paper on foreign relations. find a journal article that discusses relations between china and the us.” most users did not have much difficulty finding an article using encore, though three users did not immediately see a way to limit only to articles. of the nine users who did narrow the results to articles, five used facets to further narrow results. no users moved beyond the first page of results. search strategy was also interesting. all thirteen users appeared to expect the search box to work like google. if there were no results, most users went to the advanced search, and reused the same terms on different lines of the boolean search box. once again, no users intuitively understood that “advanced search” would not effectively search for articles. the concept of changing search terms was not a common strategy in this test group. if very few results came up, none of the users clicked on the “did you mean” or used suggestions for correction in spelling or change in terms supplied by encore. during this task, two faculty members commented on load time. they said students would not wait, results had to be instant. but when working with students, when the author asked how they felt when load time was slow, students almost all said it was fine, or not a problem. 
they could “see it was working.” one student said, “oh, i’d just flip over to facebook and let the search run.” so perhaps librarians should not assume we fully understand student user expectations. it is also information technology and libraries | september 2013 73 worth noting that, for the participant, this is a low-stakes usability study, not crunch time, so attitudes may be different if load time is slow for an assignment due in a few hours. task 4: “what is a topic you have written about this year? search for materials on this topic.” this question elicited the most helpful user feedback, since participants had recently conducted research using the library’s interface and could compare ease of use on a subject they were familiar with. a few specific examples follow. student a, in response to the task to research something she had written about this semester, looked for “elder abuse.” she was a senior who had taken a research methods class and written a major paper on this topic, and she used the tabbed layout first. she was familiar with using the facets in ebsco to narrow by date, and to limit to scholarly articles. when she was using appsearch on the topic of elder abuse, encore held her facets “full text” and “peer reviewed” from the previous search on china and u.s. foreign relations. an example of encore “holding a search” is demonstrated in figures 10 and 11 below. student a was not bothered by the encore holding limits she had put on a previous search. she noticed the limits, and then went on to further narrow within the database portfolio of “health” which limited the results to the database cinahl first. she was happy with being able to limit by folder to her discipline. she said the folders would help her sort through the results. student g’s topic she had researched within the last semester was “occupational therapy for students with disabilities” such as cerebral palsy. she understood through experience, that it would be easiest to narrow results by searching for ‘occupational therapy’ and then add a specific disability. student g was the user who made the most use of facets on the left. she liked encore’s use of icons for different types of materials. student b also commented on “how easy the icons made it.” faculty b, in looking for the a topic he had been researching recently in appsearch, typed in “writing across the curriculum glossary of terms” and got no results on this search. he said, “mmm, well that wasn’t helpful, so to me, that means i’d go through here” and he clicked on the google search box in the browser bar. he next tried removing “glossary of terms” from his search and the load time was slow on articles, so he gave up after ten seconds and clicked on “advanced search” and tried putting “glossary of terms” in the second line. this led to another dead end. he said, “i’m just surprised appalachian doesn’t have anything on it.” the author asked if he had any other ideas about how to approach finding materials on his topic from the library’s homepage and he said no, he would just try google (in other words, navigating to the group of databases for education was not a strategy that occurred to him). usability test results for encore in an academic library | johnson 74 the faculty member d had been doing research on a relatively obscure historical event and was able to find results using encore. 
when asked if he had seen the articles before, he said, “yes, i’ve found these, but it is great it’s all in one search!” glitches revealed it is of concern for the user experience that the advanced search of encore does not search articles; it only searches the catalog. this was not clear to any participant in this study. as noted earlier, encore’s article search is a federated search. this affects load time for article results, and also puts the article results into silos, or to use encore’s terminology, “database portfolios.” encore’s information on their website definitely markets the site as a discovery tool, saying, it “integrates federated search, as well as enriched content—like first chapters—and harvested data… encore also blends discovery with the social web. 18” it is important for libraries considering purchase of encore that while it does have many features of a discovery service, it does not currently have a central index with pre-harvested metadata for articles. if innovative interfaces is going to continue to offer an advanced search box, it needs to be made explicitly clear that the advanced search is not effective for searching for articles, or innovative interfaces needs to make an advanced search work with articles by creating a central index. to cite a specific example from this study, when student e was using appsearch, with all the tasks, after she ran a search, she clicked on the advanced search option. the author asked her, “so if there is an advanced search, you’re going to use it?” the student replied, “yeah, they are more accurate.” another aspect of encore that users do not intuitively grasp is that when looking at the results for an article search, the first page of results comes from a quick search of a limited number of databases (see figure 8). the users in this study did understand that clicking on the folders will narrow by discipline, but they did not appear to grasp that the result in the database portfolios are not included in the first results shown. when users click on an article result, they are taken to the native interface (such as psych info) to view the article. users seemed un-phased when they went into a new interface, but it is doubtful they understand they are entering a subset of appsearch. if users try to add terms or do a new search in the native database they may get relevant results, or may totally strike out, depending on chosen database’s relevance to their research interest. information technology and libraries | september 2013 75 figure 10. changing a search in encore. another problem that was documented was that after users ran a search, if they changed the text in the “search” box, the results for articles did not change. figure six demonstrates the results from task 2 of this study, which asks users to find information on anorexia and self-esteem. the third task asks the user to find information on china and foreign relations. figure 10 demonstrates the results for the anorexia search, with the term “china” in the search box, just before the user clicks enter, or the orange arrow for new search. figure 11. search results for changed search. figure 11 show that the search for the new term, “china” has worked in the catalog, but the results for articles are still about anorexia. 
in this implementation of encore, there is no "new search" button (except on the advanced search page, where there is a "reset search" button; see figure 7), and refreshing the browser had no effect on this problem. this issue was documented in a screencast19 and sent to the vendor. happily, as of april 2013, innovative interfaces appears to have resolved this underlying problem. one purpose of this study was to determine if users had a strong preference for tabs, since the library could choose to implement encore with tabs (one for access to articles, one for the catalog, and other tab options like google scholar). this study indicated users did not like tabs in general; they much preferred a "one box solution" on first encounter. a major concern raised was the users' response to the question, "how much of the library's holdings do you think appsearch/articles quick search is looking across?" twelve out of thirteen users believed that when they were searching for articles from the quick search for articles tabbed layout, they were searching all the library databases. the one exception to this was a faculty member in the english department, who understood that the articles tab searched a small subset of the available resources (seven ebsco databases out of 400 databases the library subscribes to). all thirteen users believed appsearch (encore) was searching "everything the library owned." the discovery service searches far more resources than other federated searches the library has had access to in the past, but it is still only searching 50 out of 400 databases. it is interesting that in the fagan et al. study of ebsco's discovery service, only one out of ten users in that study believed the quick search would search "all" the library's resources.20 a glance at james madison university's library homepage21 suggests wording that may reduce user confusion. figure 12. screenshot of james madison library homepage, accessed december 18, 2012. figure 13. original encore interface as implemented in january 2013. given the results that 100% of the users believed that appsearch looked at all databases the library has access to, the library made changes to the wording in the search box (see figures 13 and 14). future tests can determine if this has any positive effect on the understanding of what appsearch includes. figure 14. encore search box after this usability study was completed. the arrow highlights additions to the page as a result of this study. some other wording changes suggested came from the finding that seven out of nine students did not fully understand that "peer reviewed" limits results to scholarly articles. a suggestion was made to innovative interfaces to change the wording to "scholarly (peer reviewed)" and they did so in early january. although innovative's response on this issue was swift, and may help students, changing the wording does not address the underlying information literacy issue of what students understand about these terms. interestingly, encore does not include any "help" pages. appalachian's liaison with encore has asked about this and been told by encore tech support that innovative feels the product is so intuitive that users will not need any help.
belk library has developed a short video tutorial for users, and local help pages are available from the library’s homepage, but according to innovative, a link to these resources cannot be added to the top right area of the encore screen (where help is commonly located in web interfaces). although it is acknowledged that few users actually read “help” pages, it seems like a leap of faith to think a motivated searcher will understand things like the “database portfolios” (see figures 9) without any instruction at all. after implementation, the usability test results for encore in an academic library | johnson 78 librarians here at appalachian conducted internally developed training for instructors teaching appsearch, and all agreed that understanding what is being searched and how to best perform a task such as an advanced article search is not “totally intuitive,” even for librarians. finally, some interesting search strategy patterns were revealed. on the second and third questions in the script (both having to do with finding articles) five of the thirteen participants had the strategy of putting in one term, then after the search ran, adding terms to narrow results using the advanced search box. although this is a small sample set, it was a common enough search strategy to make the author believe this is not an unusual approach. it is important for librarians and for vendors to understand how users approach search interfaces so we can meet expectations. further research the findings of this study suggest librarians will need to continue to work with vendors to improve discovery interfaces to meet users expectations. the context of what is being searched and when is not clear to beginning users in encore one aspect of this test was it was the participants’ first encounter with a new interface, and even student d, who was unenthused about the new interface (she called the results page “messy, and her sus score was 37.5 for encore, versus 92 for the tabbed layout) said that she could learn to use the system given time. further usability tests can include users who have had time to explore the new system. specific tasks that will be of interest in follow up studies of this report are if students have better luck in being able to know where to find the item in the stacks with the addition of the “map it” feature. locally, librarian perception is that part of the problem with this results display is simply visual spacing. the call number is not set apart or spaced so that it stands out as important information (see figure 5 for a screenshot). another question to follow up on will be to repeat the question, “how much of the library’s holdings do you think appsearch is looking across?” all thirteen users in this study believed appsearch was searching “everything the library owned.” based on this finding, the library made small adjustments to the initial search box (see figures 14 and 15 as illustration). it will be of interest to measure if this tweak has any impact. summary all users in this study recommended that the library move to encore’s “one box” discovery service instead of using a tabbed layout. helping users figure out when they should move to using discipline specific databases will most likely be a long-term challenge for belk library, and for other academic libraries using discovery services, but this will probably trouble librarians more than our users. 
information technology and libraries | september 2013 79 the most important change innovative interfaces could make to their discovery service is to create a central index for articles, which would improve load time and allow for an advanced search feature for articles to work efficiently. because of this study, innovative interfaces made a wording change in search results for article to include the word “scholarly” when describing peer reviewed journal articles in belk library’s local implementation. appalachian state university libraries will continue to conduct usability studies and tailor instruction and e-learning resources to help users navigate encore and other library resources. overall, it is expected users, especially freshman and sophomores, will like the new interface but will not be able to figure out how to improve search results, particularly for articles. belk library & information commons’ instruction team is working on help pages and tutorials, and will incorporate the use of encore into the library’s curricula. references 1 . thomsett-scott, beth, and patricia e. reese. "academic libraries and discovery tools: a survey of the literature." college & undergraduate libraries 19 (2012): 123-43. 2. ibid, 138. 3. hunter, athena. “the ins and outs of evaluating web-scale discovery services” computers in libraries 32, no. 3 (2012) http://www.infotoday.com/cilmag/apr12/hoeppner-web-scalediscovery-services.shtml (accessed march 18, 2013) 4. fagan, jody condit, meris mandernach, carl s. nelson, jonathan r. paulo, and grover saunders. "usability test results for a discovery tool in an academic library." information technology & libraries 31, no. 1 (2012): 83-112. 5. thomas, bob., and buck, stephanie. oclc's worldcat local versus iii's webpac. library hi tech, 28(4) (2010), 648-671. doi: http://dx.doi.org/10.1108/07378831011096295 6. becher, melissa, and kari schmidt. "taking discovery systems for a test drive." journal of web librarianship 5, no. 3: 199-219 [2011]. library, information science & technology abstracts with full text, ebscohost (accessed march 17, 2013). 7. ibid, p. 202 8. ibid p. 203 9. allison, dee ann, “information portals: the next generation catalog,” journal of web librarianship 4, no. 1 (2010): 375–89, http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1240&context=libraryscience (accessed march 17, 2013) http://www.infotoday.com/cilmag/apr12/hoeppner-web-scale-discovery-services.shtml http://www.infotoday.com/cilmag/apr12/hoeppner-web-scale-discovery-services.shtml http://dx.doi.org/10.1108/07378831011096295 usability test results for encore in an academic library | johnson 80 10. singley, emily. 2011 “encore synergy 4.1: a review” the cloudy librarian: musings about library technologies http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-areview/ [accessed march 20, 2013]. 11 . nielson, jakob. 2000. “why you only need to test with 5 users” http://www.useit.com/alertbox/20000319.html (accessed december 18, 2012]. 12. fagan et al, 90. 13. dixon, lydia, cheri duncan, jody condit fagan, meris mandernach, and stefanie e. warlick. 2010. "finding articles and journals via google scholar, journal portals, and link resolvers: usability study results." reference & user services quarterly no. 50 (2):170-181. 14. bangor, aaron, philip t. kortum, and james t. miller. 2008. "an empirical evaluation of the system usability scale." international journal of human-computer interaction no. 24 (6):574-594. doi: 10.1080/10447310802205776. 15. sauro, jeff. 2011. 
“measuring usability with the system usability scale (sus)” http://www.measuringusability.com/sus.php. [accessed december 7, 2012]. 16. ibid. 17. mellendorf, scott. “encore synergy sites” zahnow library, saginaw valley state university. http://librarysubjectguides.svsu.edu/content.php?pid=211211 (accessed march 23, 2013). 18. encore overview, “http://encoreforlibraries.com/overview/” (accessed march 21, 2013). 19. johnson, megan. videorecording made with jing on january 30, 2013 http://www.screencast.com/users/megsjohnson/folders/jing/media/0ef8f186-47da-41cf96cb-26920f71014b 20. fagan et al. 91. 21. james madison university libraries, “http://www.lib.jmu.edu” (accessed december 18, 2012). http://emilysingley.wordpress.com/ http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-a-review/ http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-a-review/ http://www.useit.com/alertbox/20000319.html http://www.measuringusability.com/sus.php http://librarysubjectguides.svsu.edu/content.php?pid=211211 http://encoreforlibraries.com/overview/ http://www.screencast.com/users/megsjohnson/folders/jing/media/0ef8f186-47da-41cf-96cb-26920f71014b http://www.screencast.com/users/megsjohnson/folders/jing/media/0ef8f186-47da-41cf-96cb-26920f71014b http://www.lib.jmu.edu/ information technology and libraries | september 2013 81 appendix a pre-purchase usability benchmarking test in april 2012, before the library purchased encore, the library conducted a small usability study to serve as a benchmark. the study outlined in this paper follows the same basic outline, and adds a few questions. the purpose of the april study was to measure student perceived success and satisfaction with the current search system of books and articles appalachian uses compared with use of the implementation of encore discovery services at university of nebraska lincoln (unl). the methodology was four undergraduates completing a set of tasks using each system. two started with unl, and two started at appalachian’s library homepage. in the april 2012 study, the participants were three freshman and one junior, and all were female. all were student employees in the library’s mailroom, and none had received special training on how to use the library interface. after the students completed the tasks, they rated their experience using the system usability scale (sus). in the summary conclusion of that study, the average sus score for the library’s current search box layout was 62, and for unl’s encore search it was 49. even though none of the students was particularly familiar with the current library’s interface, it might be assumed that part of the higher score for appalachian’s site was simply familiarity. student comments from the small april benchmarking study included the following. the junior student said the unl site had "too much going on" and appalachian was "easier to use; more specific in my searches, not as confusing as compared to unl site." another student (a freshman), said she has "never used the library not knowing if she needed a book or an article." in other words, she knows what format she is searching for and doesn’t perceive a big benefit to having them grouped. this same student also indicated she had no real preference between appalachian or the unl. she believed students would need to take time to learn either and that unl is a "good starting place." 
appendix b instructions for conducting the test notes: use firefox for the browser, set to "private browsing" so that no searches are held in the cache (search terms do not pop into the search box from the previous participant's search). in the bookmark toolbar, the only two bookmarks available should be "dev" (which goes to the development server) and "lib" (which goes to the library's homepage). instruct users to begin each search from the correct starting place. identify students and faculty by letter (student a, faculty a, etc.). script hi, ___________. my name is ___________, and i'm going to be walking you through this session today. before we begin, i have some information for you, and i'm going to read it to make sure that i cover everything. you probably already have a good idea of why we asked you here, but let me go over it again briefly. we're asking students and faculty to try using our library's home page to conduct four searches, and then ask you a few other questions. we will then have you do the same searches on a new interface. (note: half the participants start at the development site, the other half start at the current site.) after each set of tasks is finished, you will fill out a standard usability scale to rate your experience. this session should take about twenty minutes. the first thing i want to make clear is that we're testing the interface, not you. you can't do anything wrong here. do you have any questions so far? ok. before we look at the site, i'd like to ask you just a few quick questions. what year are you in college? what are you majoring in? roughly how many hours a week altogether--just a ballpark estimate--would you say you spend using the library website? ok, great. hand the user the task sheet. do not read the instructions to the participant; allow them to read the directions for themselves. allow the user to proceed until they hit a wall or become frustrated. verbally encourage them to talk aloud about their experience.
written instructions for participants.
1. find a copy of the book the old man and the sea.
2. in your psychology class, your professor has assigned you a 5-page paper on the topic of eating disorders and teens. find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem.
3. you are studying modern chinese history and your professor has assigned you a paper on foreign relations. find a journal article that discusses relations between china and the us.
4. what is a topic you have written about this year? search for materials on this topic.
appendix c follow-up questions for participants (or ask as the subject is working). after the first task (find a copy of the book the old man and the sea), when the user finds the book in appsearch, ask "would you know where to find this book in the library?" how much of the library's holdings do you think appsearch/articles quick search is looking across? does "peer reviewed" mean the same as "scholarly article"? what does the "refine by tag" box on the right mean to you? if you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend? do you have any questions for me, now that we're done? thank the subject for participating.
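for reference, and not as part of the original test materials, the standard sus scoring rule that produces the whole-number and decimal scores reported in table 1 can be sketched as follows. the python function and the sample ratings below are illustrative assumptions; the study itself reports only the resulting 0-100 scores.

```python
def sus_score(responses):
    """compute a 0-100 sus score from ten 1-5 ratings, in appendix d order."""
    if len(responses) != 10:
        raise ValueError("sus requires exactly ten item ratings")
    total = 0
    for i, r in enumerate(responses):
        if i % 2 == 0:          # odd-numbered items (1, 3, 5, ...): rating minus 1
            total += r - 1
        else:                   # even-numbered items (2, 4, 6, ...): 5 minus rating
            total += 5 - r
    return total * 2.5          # scale the 0-40 sum to a 0-100 score

# a hypothetical set of ratings; the 2.5 multiplier is why individual and
# average scores such as 77.5, 71.5, and 68 can carry decimals.
print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 3]))  # 77.5
```

appendix d, which follows, reproduces the ten sus items that such ratings come from.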
usability test results for encore in an academic library | johnson 85 appendix d sample system usability scale (sus) strongly strongly disagree agree i think that i would like to use this system frequently 1 2 3 4 5 i found the system unnecessarily complex 1 2 3 4 5 i thought the system was easy to use 1 2 3 4 5 i think that i would need the support of a technical person to be able to use this system 1 2 3 4 5 i found the various functions in this system were well integrated 1 2 3 4 5 i thought there was too much inconsistency in this system 1 2 3 4 5 i would imagine that most people would learn to use this system very quickly 1 2 3 4 5 i found the system very cumbersome to use 1 2 3 4 5 i felt very confident using the system 1 2 3 4 5 i needed to learn a lot of things before i could get going with this system 1 2 3 4 5 comments: 16 information technology and libraries | march 2009 mathew j. miles and scott j. bergstrom classification of library resources by subject on the library website: is there an optimal number of subject labels? the number of labels used to organize resources by subject varies greatly among library websites. some librarians choose very short lists of labels while others choose much longer lists. we conducted a study with 120 students and staff to try to answer the following question: what is the effect of the number of labels in a list on response time to research questions? what we found is that response time increases gradually as the number of the items in the list grow until the list size reaches approximately fifty items. at that point, response time increases significantly. no association between response time and relevance was found. i t is clear that academic librarians face a daunting task drawing users to their library’s web presence. “nearly three-quarters (73%) of college students say they use the internet more than the library, while only 9% said they use the library more than the internet for information searching.”1 improving the usability of the library websites therefore should be a primary concern for librarians. one feature common to most library websites is a list of resources organized by subject. libraries seem to use similar subject labels in their categorization of resources. however, the number of subject labels varies greatly. some use as few as five subject labels while others use more than one hundred. in this study we address the following question: what is the effect of the number of subject labels in a list on response times to research questions? n literature review mcgillis and toms conducted a performance test in which users were asked to find a database by navigating through a library website. they found that participants “had difficulties in choosing from the categories on the home page and, subsequently, in figuring out which database to select.”2 a review of relevant research literature yielded a number of theses and dissertations in which the authors compared the usability of different library websites. jeng in particular analyzed a great deal of the usability testing published concerning the digital library. the following are some of the points she summarized that were highly relevant to our study: n user “lostness”: users did not understand the structure of the digital library. n ambiguity of terminology: problems with wording accounted for 36 percent of usability problems. 
n finding periodical articles and subject-specific databases was a challenge for users.3 a significant body of research not specific to libraries provides a useful context for the present research. miller's landmark study regarding the capacity of human short-term memory showed as a rule that the span of immediate memory is about 7 ± 2 items.4 sometimes this finding is misapplied to suggest that menus with more than nine subject labels should never be used on a webpage. subsequent research has shown that "chunking," which is the process of organizing items into "a collection of elements having strong associations with one another, but weak associations with elements within other chunks,"5 allows human short-term memory to handle a far larger set of items at a time. larson and czerwinski provide important insights into menuing structures. for example, increasing the depth (the number of levels) of a menu harms search performance on the web. they also state that "as you increase breadth and/or depth, reaction time, error rates, and perceived complexity will all increase."6 however, they concluded that a "medium condition of breadth and depth outperformed the broadest, shallow web structure overall."7 this finding is somewhat contrary to a previous study by snowberry, parkinson, and sisson, who found that when testing structures of 2⁶, 4³, 8², and 64¹ (2⁶ means two menu items per level, six levels deep), the 64¹ structure grouped into categories proved to be advantageous in both speed and accuracy.8 larson and czerwinski recommended that "as a general principle, the depth of a tree structure should be minimized by providing broad menus of up to eight or nine items each."9 zaphiris also corroborated that previous research concerning depth and breadth of the tree structure was true for the web. the deeper the tree structure, the slower the user performance.10 he also found that response times for expandable menus are on average 50 percent longer than sequential menus.11 both the research and current practices are clear concerning the efficacy of hierarchical menu structures. thus it was not a focus of our research. the focus instead was on a single-level menu and how the number and characteristics of subject labels would affect search response times. n background in preparation for this study, library subject lists were collected from a set of thirty library websites in the united states, canada, and the united kingdom. we selected twelve lists from these websites that were representative of the entire group and that varied in size from small to large. to render some of these lists more usable, we made slight modifications. there were many similarities between label names. n research design participants were randomly assigned to one of twelve experimental groups. each experimental group would be shown one of the twelve lists that were selected for use in this study. roughly 90 percent of the participants were students. the remaining 10 percent of the participants were full-time employees who worked in these same departments.
the twelve lists ranged in number of labels from five to seventy-two: n group a: 5 subject labels n group b: 9 subject labels n group c: 9 subject labels n group d: 23 subject labels n group e : 6 subject labels n group f: 7 subject labels n group g: 12 subject labels n group h: 9 subject labels n group i: 35 subject labels n group j: 28 subject labels n group k: 49 subject labels n group l: 72 subject labels each participant was asked to select a subject label from a list in response to eleven different research questions. the questions are listed below: 1. which category would most likely have information about modern graphical design? 2. which category would most likely have information about the aztec empire of ancient mexico? 3. which category would most likely have information about the effects of standardized testing on high school classroom teaching? 4. which category would most likely have information on skateboarding? 5. which category would most likely have information on repetitive stress injuries? 6. which category would most likely have information about the french revolution? 7. which category would most likely have information concerning walmart’s marketing strategy? 8. which category would most likely have information on the reintroduction of wolves into yellowstone park? 9. which category would most likely have information about the effects of increased use of nuclear power on the price of natural gas? 10. which category would most likely have information on the electoral college? 11. which category would most likely have information on the philosopher emmanuel kant? the questions were designed to represent a variety of subject areas that library patrons might pursue. each subject list was printed on a white sheet of paper in alphabetical order in a single column, or double columns when needed. we did not attempt to test the subject lists in the context of any web design. we were more interested in observing the effect of the number of labels in a list on response time independent of any web design. each participant was asked the same eleven questions in the same order. the order of questions was fixed because we were not interested in testing for the effect of order and wanted a uniform treatment, thereby not introducing extraneous variance into the results. for each question, the participant was asked to select a label from the subject list under which they would expect to find a resource that would best provide information to answer the question. participants were also instructed to select only a single label, even if they could think of more than one label as a possible answer. participants were encouraged to ask for clarification if they did not fully understand the question being asked. recording of response times did not begin until clarification of the question had been given. response times were recorded unbeknownst to the participant. if the participant was simply unable to make a selection, that was also recorded. two people administered the exercise. one recorded response times; the other asked the questions and recorded label selections. relevance rankings were calculated for each possible combination of labels within a subject list for each question. for example, if a subject list consisted of five labels, for each question there were five possible answers. two library professionals—one with humanities expertise, the other with sciences expertise—assigned a relevance ranking to every possible combination of question and labels within a subject list. 
the rankings were then averaged for each question–label combination. n results the analysis of the data was undertaken to determine whether the average response times of participants, adjusted by the different levels of relevance in the subject list labels that prevailed for a given question, were significantly different across the different lists. in other words, would the response times of participants using a particular list, for whom the labels in the list were highly relevant to the question, be different from students using the other lists for whom the labels in the list were also highly relevant to the question? a separate univariate general linear model analysis was conducted for each of the eleven questions. the analyses were conducted separately because each question represented a unique search domain. the univariate general linear model provided a technique for testing whether the average response times associated with the different lists were significantly different from each other. this technique also allowed for the inclusion of a covariate (relevance of the subject list labels to the question) to determine whether response times at an equivalent level of relevance were different across lists. in the analysis model, the dependent variable was response time, defined as the time needed to select a subject list label. the covariate was relevance, defined as the perceived match between a label and the question. for example, a label of "economics" would be assessed as highly relevant to the question, what is the current unemployment rate? the same label would be assessed as not relevant for the question, what are the names of four moons of saturn? the main factor in the model was the actual list being presented to the participant. there were twelve lists used in this study. the statistical model can be summarized as follows: response time = list + relevance + (list × relevance) + error. the general linear model required that the following conditions be met: first, data must come from a random sample from a normal population. second, all variances within each of the groupings are the same (i.e., they have homoscedasticity). an examination of whether these assumptions were met revealed problems both with normality and with homoscedasticity. a common technique, logarithmic transformation, was employed to resolve these problems. accordingly, response-time data were all converted to common logarithms. an examination of assumptions with the transformed data showed that all questions but three met the required conditions. the three questions (5, 6, and 7) were excluded from subsequent analysis. figure 1. the overall average of average search times for the eight questions for all experimental groups (i.e., lists) [average log response time plotted by list, with a trend line]. n conclusions the series of graphs in the appendix shows the average response times, adjusted for relevance, for eight of the eleven questions for all twelve lists (i.e., experimental groups). three of the eleven questions were excluded from the analysis because of heteroscedasticity. an inspection of these graphs shows no consistent pattern in response time as the number of the items in the lists increases. essentially, this means that, for any given level of relevance, the number of items in the list does not affect response time significantly.
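as a rough, hypothetical illustration of the modeling approach described above, the sketch below fits one univariate general linear model for a single question, with common-log response time as the dependent variable, list as the main factor, and relevance as the covariate. the data, column names, and the use of python's statsmodels package are assumptions made for illustration; the article does not state which statistical software was used.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical data for a single question: one row per participant, giving the
# list (group) they saw, the relevance of the available labels to the question,
# and the time taken to select a label, in seconds. all values are invented.
rng = np.random.default_rng(0)
lists = list("abcdefghijkl")                      # the twelve lists, a through l
df = pd.DataFrame({
    "list_id":   np.repeat(lists, 10),            # ten simulated participants per list
    "relevance": rng.uniform(1, 3, 120),          # 1 = not relevant, 3 = highly relevant
    "time_sec":  rng.lognormal(mean=2.0, sigma=0.4, size=120),
})

# convert response time to common logarithms, as the article describes, to
# address the observed problems with normality and homoscedasticity.
df["log_time"] = np.log10(df["time_sec"])

# response time = list + relevance + (list x relevance) + error
model = smf.ols("log_time ~ C(list_id) * relevance", data=df).fit()
print(model.summary())
```

in the study itself, a separate model of this form was fit for each of the eight retained questions.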
it seems that for a single question, characteristics of the categories themselves are more important than the quantity of categories in the list. the response times using a subject list with twenty-eight labels is similar to the response times using a list of six labels. a statistical comparison of the mean response time for each classification of library resources by subject on the library website | miles and bergstrom 19 group with that of each of the other groups for each of the questions largely confirms this. there were very few statistically significant different comparisons. the spikes and valleys of the graphs in the appendix are generally not significantly different. however, when the average response time associated with all lists is combined into an overall average from all eight questions, a somewhat clearer picture emerges (see figure 1). response times increase gradually as the number of the items in the list increase until the list size reaches approximately fifty items. at that point, response time increases significantly. no association was found between response time and relevance. a fast response time did not necessarily yield a relevant response, nor did a slow response time yield an irrelevant response. n observations we observed that there were two basic patterns exhibited when participants made selections. the first pattern was the quick selection—participants easily made a selection after performing an initial scan of the available labels. nevertheless, a quick selection did not always mean a relevant selection. the second pattern was the delayed selection. if participants were unable to make a selection after the initial scan of items, they would hesitate as they struggled to determine how the question might be reclassified to make one of the labels fit. we did not have access to a high-tech lab, so we were unable to track eye movement, but it appeared that the participants began scanning up and down the list of available items in an attempt to make a selection. the delayed selection seemed to be a combination of two problems: first, none of the available labels seemed to fit. second, the delay in scanning increased as the list grew larger. it’s possible that once the list becomes large enough, scanning begins to slow the selection process. a delayed selection did not necessarily yield an irrelevant selection. the label names themselves did not seem to be a significant factor affecting user performance. we did test three lists, each with nine items and each having different labels, and response times were similar for the three lists. a future study might compare a more extensive number of lists with the same number of items with different labels to see if label names have an effect on response time. this is a particular challenge to librarians in classifying the digital library, since they must come up with a few labels to classify all possible subjects. creating eleven questions to span a broad range of subjects is also a possible weakness of the study. we had to throw out three questions that violated the assumptions of the statistical model. we tried our best to select questions that would represent the broad subject areas of science, arts, and general interest. we also attempted to vary the difficulty of the questions. a different set of questions may yield different results. references 1. steve jones, the internet goes to college, ed. 
mary madden (washington, d.c.: pew internet and american life project, 2002): 3, www.pewinternet.org/pdfs/pip_college_report.pdf (accessed mar. 20, 2007).
2. louise mcgillis and elaine g. toms, "usability of the academic library web site: implications for design," college & research libraries 62, no. 4 (2001): 361.
3. judy h. jeng, "usability of the digital library: an evaluation model" (phd diss., rutgers university, new brunswick, new jersey): 38–42.
4. george a. miller, "the magical number seven plus or minus two: some limits on our capacity for processing information," psychological review 63, no. 2 (1956): 81–97.
5. fernand gobet et al., "chunking mechanisms in human learning," trends in cognitive sciences 5, no. 6 (2001): 236–43.
6. kevin larson and mary czerwinski, "web page design: implications of memory, structure and scent for information retrieval" (los angeles: acm/addison-wesley, 1998): 25, http://doi.acm.org/10.1145/274644.274649 (accessed nov. 1, 2007).
7. ibid.
8. kathleen snowberry, mary parkinson, and norwood sisson, "computer display menus," ergonomics 26, no. 7 (1983): 705.
9. larson and czerwinski, "web page design," 26.
10. panayiotis g. zaphiris, "depth vs. breadth in the arrangement of web links," www.soi.city.ac.uk/~zaphiri/papers/hfes.pdf (accessed nov. 1, 2007).
11. panayiotis g. zaphiris, ben shneiderman, and kent l. norman, "expandable indexes versus sequential menus for searching hierarchies on the world wide web," http://citeseer.ist.psu.edu/rd/0%2c443461%2c1%2c0.25%2cdownload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/22119/http:zszzszagrino.orgzszpzaphiriszszpaperszszexpandableindexes.pdf/zaphiris99expandable.pdf (accessed nov. 1, 2007).
appendix.
response times by question by group. [the appendix consists of eight charts, one for each analyzed question (questions 1, 2, 3, 4, 8, 9, 10, and 11), each plotting average response time (log scale) for the twelve groups, from grp a (5 items) through grp l (72 items).]
user testing with microinteractions: enhancing a next-generation repository
sara gonzales, matthew b. carson, guillaume viger, lisa o'keefe, norrina b. allen, joseph p. ferrie, and kristi holmes
information technology and libraries | march 2021. https://doi.org/10.6017/ital.v40i1.12341
sara gonzales (sara.gonzales2@northwestern.edu) is data librarian, galter health sciences library & learning center, northwestern university feinberg school of medicine. matthew b. carson (matthew.carson@northwestern.edu) is head, digital systems/senior research data scientist, galter health sciences library & learning center, northwestern university feinberg school of medicine. guillaume viger (guillaume.viger@northwestern.edu) is senior developer, galter health sciences library & learning center, northwestern university feinberg school of medicine. lisa o'keefe (lisa.okeefe@northwestern.edu) is senior program administrator, galter health sciences library & learning center, northwestern university feinberg school of medicine. norrina b.
allen (norrina-allen@northwestern.edu) is associate professor of preventive medicine (epidemiology) and pediatrics, northwestern university feinberg school of medicine. joseph p. ferrie (ferrie@northwestern.edu) is professor and department chair of economics, northwestern university. kristi holmes (kristi.holmes@northwestern.edu) is director, galter health sciences library & learning center, and professor of preventive medicine (health and biomedical informatics) and medical education at northwestern university feinberg school of medicine. © 2021. abstract enabling and supporting discoverability of research outputs and datasets are key functions of university and academic health center institutional repositories. yet adoption rates among potential repository users are hampered by a number of factors, prominent among which are difficulties with basic usability. in their efforts to implement a local instance of inveniordm, a turnkey next generation repository, team members at northwestern university’s galter health sciences library & learning center supplemented agile development principles and methods and a user experience design-centered approach with observations of users’ microinteractions (interactions with each part of the software’s interface that requires human intervention). microinteractions were observed through user testing sessions conducted in fall 2019. the result has been a more user-informed development effort incorporating the experiences and viewpoints of a multidisciplinary team of researchers spanning multiple departments of a highly ranked research university. introduction galter health sciences library & learning center facilitates and supports the discoverability of knowledge for the faculty, students, and staff of the feinberg school of medicine at northwestern university. as an integrated unit in northwestern university’s clinical and translational sciences institute (nucats) and a key partner to other institutes across northwestern university’s two campuses, enabling maximum ease of use of library resources and support for meaningful information discoveries for researchers at all stages has been a prime motivator. these motivators helped drive the selection and development of an upgraded institutional repository infrastructure at galter, a project which began in 2018. discovery of resources through repository tools depends upon many factors: metadata and controlled vocabularies used, storage and retrieval capacity, and familiarity and comfort level with mailto:sara.gonzales2@northwestern.edu mailto:matthew.carson@northwestern.edu mailto:guillaume.viger@northwestern.edu mailto:lisa.okeefe@northwestern.edu mailto:norrina-allen@northwestern.edu mailto:ferrie@northwestern.edu mailto:kristi.holmes@northwestern.edu information technology and libraries march 2021 user testing with microinteractions | gonzales, carson, viger, o’keefe, allen, ferrie, and holmes 2 the tool on the part of researchers and students. is the institutional repository link easy to find on the website? more importantly, is it easy to use? can records be created and files uploaded with ease? do searches bring meaningful results, and can they be filtered and organized for maximum impact? from early on in the repository upgrade project, galter library partnered with northwestern’s institute for innovations in developmental sciences (devsci), both to answer these questions and to find practical ways to serve researchers who aimed to discover relevant datasets through a repository. 
the work of the interdisciplinary devsci group is focused on human development across the lifespan in all areas, including physical, emotional, psychological, and socioeconomic, providing a multidisciplinary perspective for the collaboration. through this partnership, devsci’s goal was to develop a data repository or index through which they could discover the datasets of their fellow researchers and find new collaborators, thus providing an ideal perspective from which to provide critical feedback. galter health sciences library & learning center selected the inveniordm (research data management) extensible institutional repository (ir) platform as its local ir code upgrade. inveniordm is a python-based, modular and scalable ir developed by cern (the european organization for nuclear research) and collaborators.1 the first version of the invenio framework was developed in 2000. in 2018, invenio 3.02 was released with significantly improved software and code rewritten to make it a modular framework. this new framework now serves as a foundation for modern research data management and scholarly communications through a trusted digital repository. inveniordm is being collaboratively developed, with its many partners and robust developer community ensuring the framework’s maintenance, improvement, and preservation capacity into the foreseeable future. to galter library and its partners at devsci, the development timespan required to build a local instance of inveniordm presented the perfect opportunity to address one of the major stumbling blocks of ir adoption: the user experience. if a repository were designed with users’ needs in mind, and took into account their behaviors and interactions with every aspect of the tool, it had the potential to increase adoption and usability far beyond numbers generally observed for university irs. designing for user behaviors was the goal of galter library’s repository development team as we launched a round of user testing of the repository’s alph a version in fall 2019. literature review it is an exciting time for irs serving researchers in the sciences and particularly in the field of biomedical research. new robust repository frameworks capable of storing and preserving data for decades into the future are being developed to meet widely articulated researcher needs, including those user behaviors and technologies highlighted by the confederation of open access repositories (coar) in detailed guidelines for next generation repository features.3 these features include interoperable resource transfer, metadata-enhanced discovery by navigation, and exposure of permanent identifiers. researcher-focused organizations such as coar endorse and promulgate fair principles to make deposited and shared data findable, accessible, interoperable, and reusable.4 meanwhile, federal agencies such as the national institutes of health are increasingly incorporating policies to encourage best practices for data management and data sharing of grant funded research.5 these policies often recommend depositing data in a robust, secure, and accessible ir which can be maintained by the researchers’ own institution. 
in recent years, the majority of the deposited products of research have been stored in subject repositories information technology and libraries march 2021 user testing with microinteractions | gonzales, carson, viger, o’keefe, allen, ferrie, and holmes 3 or made available via social media platforms which carry no guarantee of long-term curation and preservation of shared resources. this happens even though institutional repositories are wellrepresented on the overall repository landscape, suggesting that these institutional assets are underutilized in critical data workflows.6 the reasons for slow adoption and use of institutional repositories (ir) by researchers in routine workflows are many: irs can be perceived as adding to researchers’ administrative work burden through the need to clean, deposit, and catalog data and other research outputs; many feel trepidation about open science practices and their effects on citation counts; and researchers may feel unsure about copyright restrictions on materials they might deposit.7 narayan and luca, in their study of one university’s ir adoption challenges, outline some of the deep-seated motivations behind this trepidation, such as the social and psychological barriers imposed by researchers’ own, and university-encouraged, traditional views of scholarly publishing, as well as the ways in which these views are heavily supported by university systems in their tenure and promotion policies. in addition, many researchers perceive the content contained in irs as restricted or of limited use compared to the volume of resources that can be found through a google search.8 the ir is often seen as a small island within the larger digital research landscape. the degree to which repository managers are attuned to their local users’ professional and personal needs with regard to a repository will have a large impact on adoption rates among hesitant user populations. as witt and betz & hall point out, professional motivators have arisen in multiple disciplines to deposit in irs not only preprints, but datasets, data dictionaries, readme files, and other reproducibility-supporting resources, in order to provide open access to the products of federally funded research.9 building on funder and publisher mandates for making both publications and datasets open access, ir builders and maintainers can employ various methods to increase the motivation momentum towards ir adoption. they can highlight repository champions, faculty users of ir tools who can provide use cases and success stories about the benefits the ir brings to them, as both depositors and searchers.10 they can help to allay fears and confusion around depositors’ rights with regard to deposited materials by carefully explicating license types and definitions and by consulting with researchers on the correct licenses to choose for their deposits. they can work with their repository’s developers to create value-added modifications to the repository, including user-friendly browsing, featured collections, and researcher pages, which highlight the most current research at the institution.11 importantly, if the ir maintainers are able to modify the repository’s interface to suit local needs, they can help ensure that the majority of users have a positive experience with the repository’s interface, one in which every interaction is intuitive and in which there are no wasted steps or unnecessary clutter. 
such usability can be achieved through examining repository users' microinteractions, that is, interactions with each small part of the software's interface that requires human intervention. for those engaged in library technology projects in recent years, user experience (ux) design will be a familiar concept. ux design seeks to make a user's interaction with a product—often a web-based tool—easier and more intuitive, frequently through manipulating the behavior of the user.12 however, in the recent trend toward designing with a focus on microinteractions, software developers are influenced by the data they glean from observing users' interactions with each part of the software's interface that requires human intervention, noting from the users themselves the intuitive and non-intuitive parts of an interaction in order to determine where changes should be made.13 the development team for inveniordm has taken an approach that combines traditional user-experience design based around collaborative, open source code and tools and common dataset metadata standards such as datacite (https://schema.datacite.org); observations from invenio's 20 years of serving as cern's ir; and examination of microinteractions at certain key stages of development. these microinteractions revolve around common user actions within an ir, including depositing items, searching, browsing, and creating a user account.14 the result of combining these approaches has allowed galter health sciences library & learning center to put users' needs at the forefront of its new ir.

inveniordm: a next-generation repository

galter library's selection of a new ir solution was carefully considered and motivated by the organization's need for a robust, forward-looking, and feature-rich repository that could support best practices in research data management and sharing as realized through the invenio framework. the python framework incorporates community-built python libraries, while also leveraging flask, a postgresql or mysql back-end database, a react js user interface, and the extremely fast elasticsearch json-native distributed search engine. the resulting tool is eminently scalable, securely housing petabytes' worth of easily discoverable records. galter library began our collaboration with cern to build a local instance of inveniordm, while contributing to the overall repository source code, in late 2018. since that time, a local developer has worked on the code and contributed to the repository's project roadmap, updating github issues and pushing releases.15 a project manager, data librarian, and the library leadership have also been involved throughout the project in the areas of general guidance and management, oversight, assessment, dissemination and outreach, and requirements gathering. many requirements were gleaned from the devsci community through conversations and informal interviews around data storage practices. in early 2019, to ensure that the repository was meeting the initially envisioned requirements, the galter library repository team analyzed the requirements gathered thus far for the project, which had been translated into github issues and added to by team members and collaborators.
the requirements gathered from devsci collaborators and galter librarians were found to map directly onto key ir functional categories outlined separately by ir stakeholders around the globe, including the national institutes of health, the confederation of open access repositories, the digital repository of ireland, the department of computer and information sciences at covenant university, nigeria, and others. those requirements included record creation and ingest, robust metadata for accessing a record, user account and permissions, user authentication, search functionality, resource access/download, and community pages and features (see fig. 1).16

figure 1. local repository requirements mapped to ir functional categories.

with the repository's key functional requirements defined, the development team needed real-world data to help inform the microinteractions that would bring the functions to life, both for repository managers and users. to acquire microinteraction data, galter library's data librarian designed and organized a round of user testing of the alpha release of the repository in autumn 2019. the alpha release of inveniordm was completed by september 2019, meeting a deadline established by one of the project's key funders, the national center for data to health (cd2h), through a grant funded by the national center for advancing translational sciences (ncats). this early alpha release enabled record creation and file upload, application of seven metadata elements (title, authors, description, resource type, subjects, visibility, and license), user authentication, search, faceting/filtering, and download of resources. to make the experience of searching the alpha release repository as realistic as possible, the data librarian asked colleagues from devsci to provide data for seed records for the repository, based on their own research. colleagues willingly obliged, and over a dozen seed records based on real-world clinical studies and other studies focused on human development were created in early october 2019. while conducting email and word-of-mouth recruitment with members of the devsci community, the data librarian worked on a testing script designed to require the maximum number of microinteractions possible as each user worked with the repository. the script asked users to complete the following list of tasks (see fig. 2), while thinking aloud and noting anything that they found unusual or anything that they would have expected to see in the user interface of the repository.

figure 2. ir user testing script tasks.

the user testing tasks conform to many of the functional requirements for institutional repositories identified from our requirements gathering (fig. 1), including user authentication and account, search functionality, resource access/download, record creation/ingest, and robust metadata for accessing a record (detailed record page). by october 2019, ten northwestern university faculty members, mainly from devsci, and two information professionals had agreed to test the alpha version of inveniordm. the data librarian arranged to securely host testing sessions through web conferencing software.
testers agreed to have the sessions recorded and shared their screens as they worked through the test scenarios, allowing the data librarian to observe their movements through the repository and to review the recordings later in case anything was missed. testing sessions generally lasted between twenty and thirty minutes, although some lasted from forty-five minutes to one hour. after sessions were completed, the data librarian recorded in text documents a description of all microinteractions and verbal observations testers made about the repository. she later transferred that data to a spreadsheet, listing each criterion individually and manually adding a count of how many testers either reported or were observed to experience the same phenomenon (appendix 1). reported phenomena and observed difficulties that users experienced in testing the repository were aggregated and included in the final reported data if at least two testers reported or had the same experience. through these counts the data librarian was able to identify which microinteractions proved most challenging to the testers.

discussion

manual, qualitative analysis of the user testing data revealed challenges that users faced with inveniordm that were best captured and expressed when observed as microinteractions. though user experience design had been employed extensively in the design of the database, it was the nuances of the interactions that showed where improvements in the design could still be made. almost every functional area of the repository demonstrated a need for increased user input in its design. the results of user exercises testing the various functional areas are described below.

user profile screens ease-of-use exercise

while most testers did not experience difficulties in locating the login button on the repository's home page that allowed them to access the user profile portions of the site (9/12 located the button in less than three seconds), most testers (7/12) requested clearer instructions for which username and password to use (e.g., ldap or shibboleth-based credentials), and three-quarters of testers (9/12) inquired about where and how to add information about themselves to their profiles (e.g., professional title, contact information, department and other affiliations, etc.). while the required task consisted simply of logging successfully into one's profile, the testers rightly discovered and acknowledged that the robust, cris (current research information system)-like features of many repositories' user profile pages had not yet been fully implemented in inveniordm.

finding datasets exercise

the next task required testers to perform a search any way they liked in the repository, locate a dataset record, and download the associated data file. whether searching using filters or by entering keywords to find a known item, users were always able to easily identify a data file within a record and download it. a special feature of inveniordm that occasionally made finding a data file to download challenging was that the repository was designed to serve as both a repository for digital files and a data index. the main feature of a data index is that it will store records representing datasets without necessarily storing the datasets themselves.
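in data-model terms, the distinction is a small one: a record's file attachments are simply optional. the sketch below illustrates the idea only; the field names are assumptions loosely mirroring the seven metadata elements of the alpha release, not the actual inveniordm schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecordFile:
    """a file deposited alongside a record's metadata."""
    filename: str
    size_bytes: int
    checksum: str


@dataclass
class RepositoryRecord:
    """a repository/data-index record: metadata is required, files are not."""
    title: str
    authors: List[str]
    description: str
    resource_type: str
    subjects: List[str]
    visibility: str                         # e.g., "open access"
    license: Optional[str] = None
    files: List[RecordFile] = field(default_factory=list)
    access_instructions: Optional[str] = None  # e.g., "data available on request"

    @property
    def is_metadata_only(self) -> bool:
        """true for data-index entries describing a dataset held elsewhere."""
        return not self.files


# a metadata-only entry describing a clinical study whose data cannot be deposited
study = RepositoryRecord(
    title="example developmental cohort study",
    authors=["j. researcher"],
    description="longitudinal study; de-identified data available on request",
    resource_type="dataset",
    subjects=["child development"],
    visibility="open access",
    access_instructions="contact the study team to request access",
)
assert study.is_metadata_only
```

a flag like is_metadata_only is also the kind of signal testers asked for later in the exercise: a visible cue showing whether a record actually has a file to download.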
the data index option is crucial for health sciences researchers, who are often motivated to share data files as openly as possible for reasons of reproducibility, open access to scientific data, and compliance with funder mandates, but who cannot always safely deposit data files due to the presence of personally identifiable information (pii) or protected health information as defined by hipaa17 of the human subjects who are involved in their studies. this phenomenon spurred us to create seed records in the repository that represent real clinical studies, for which the data could be made available upon request from the researchers, but for which a data file was not uploaded to the repository. by following the protocol of making this clinical data public as safely as possible, these records were created and tagged with a visibility level of open access. two of the testers stated that they believed that a record tagged as open access implied that a data file was available for download, and many others (10/12) expected either a visual cue or another filter to allow users to hone in on only the dataset record results that contained a deposited data file.

filter searching exercise

searching the repository, particularly via the filters, produced some of the most interesting results of the entire testing process. in the testing exercises, users were asked to search for anything they wanted, either narrowing their results from a direct term search or beginning a search from the full record set using the filter options on the left side of the screen. opinions differed among the repository team members and some testers as to whether applying two filters at once to the results of a search would combine the two subsets of results with and (an intersection) or or (a union). another way to phrase this scenario: upon the application of one filter, would the search results and other filters update in real time? for instance, if i filtered my search results to include only the deposits of one particular author, and if the file type filter choices still contained, after the filtering action, types including pdfs, xlsx files, doc files, mov files, etc., is it safe to assume that my chosen author deposited all those types of files? (see fig. 3.) or do the filters behave independently of each other? one-third of the testers (4/12) said they expected the filtering choices to update in real time and that the application of two filters should result in an and (intersection) of results.

figure 3. filters available in inveniordm.

seven of the twelve testers said that the most helpful of the five filters available in the repository at the time were resource type, file type, and subjects, while slightly fewer found the author and license filters helpful. three of the twelve asserted that it would not occur to them to filter on a resource's license.
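the and-versus-or question is ultimately a query-construction choice. in elasticsearch, which backs the repository's search, the two behaviors testers debated correspond to two different query bodies; the sketch below is a generic illustration of that distinction, with made-up index and field names, not the inveniordm query code.

```python
# two ways to combine an author filter and a file-type filter in an
# elasticsearch _search request body; field names are illustrative only.

# 1. "and" behaviour with live facets: both filters sit in a bool/filter
#    clause, so hits are the intersection of the two subsets and the
#    aggregations (facet counts) are computed on the filtered result set.
and_with_live_facets = {
    "query": {
        "bool": {
            "must": {"match": {"description": "development"}},
            "filter": [
                {"term": {"author.keyword": "j. researcher"}},
                {"term": {"file_type.keyword": "xlsx"}},
            ],
        }
    },
    "aggs": {
        "file_types": {"terms": {"field": "file_type.keyword"}},
        "authors": {"terms": {"field": "author.keyword"}},
    },
}

# 2. independent facets: the same conditions applied as a post_filter narrow
#    the hits but leave the aggregations computed over the unfiltered query,
#    so the facet panel keeps listing file types the chosen author never used.
independent_facets = {
    "query": {"match": {"description": "development"}},
    "post_filter": {
        "bool": {
            "filter": [
                {"term": {"author.keyword": "j. researcher"}},
                {"term": {"file_type.keyword": "xlsx"}},
            ]
        }
    },
    "aggs": {
        "file_types": {"terms": {"field": "file_type.keyword"}},
        "authors": {"terms": {"field": "author.keyword"}},
    },
}
```

the one-third of testers who expected both and semantics and live facet updates were, in effect, describing the first form.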
specific record search exercise

in the specific record search, users were asked to find a specific paper by a particular author. none of the testers experienced any difficulties or significant delays in bringing up the requested record, which served as a testament both to their searching abilities and to the robustness of the elasticsearch framework utilized in the repository's infrastructure. ten of the twelve testers mentioned that they found the preformatted citations available with each record helpful, and two of these testers requested an easy way to export the citation in their desired format to endnote. three of the twelve testers experienced a brief initial delay in locating the known record because they still had filters applied in the repository from a previous search. they requested an easy way to clear all previous filters when starting a new search.

creating a record exercise

as one of the larger user testing tasks, users were provided with a dummy file representing a dataset and asked to deposit and describe it with appropriate metadata using the repository's cataloging form. the first part of this task involved finding the button that brings the user to the cataloging form, a button which is available from several places in the inveniordm layout (home page, search results page, and profile page). eight of the twelve testers took longer than three seconds to locate this button, termed the "catalog your research" button. two of these eight reported that "catalog" was not the verb they would associate with depositing and describing a data file; to some, "catalog" seemed a library-centric term. on the repository's record creation page, a space exists to either upload or drag and drop a file, and when this task is done, a large, blue "start upload" button appears that the user must click to begin the file upload (see fig. 4). yet despite its size and color, almost half the testers (5/12) did not notice that they had to click it in order to complete the upload of their file and, worse, they often completed the record creation process and published their record without noticing that the file had not uploaded. visual cues were needed to confirm for the user whether a file was successfully uploaded or not. in addition, automatic upload upon browsing and attaching a file or dragging and dropping a file was reported as an expected behavior by many users.

figure 4. users often missed the blue "start upload" button just beneath the file name.

most users applied descriptive metadata successfully and easily, but some experienced trouble while appending subject metadata to describe the subject matter of their deposits. as the repository is being customized for a health sciences library, subject fields are offered in inveniordm to allow appending both medical subject headings (mesh) terms and terms from the faceted application of subject terminology (fast), a vocabulary derived from the library of congress subject headings (lcsh). since the two vocabularies' terms are offered in separate fields (fast serving as a more universal set of terms to complement the biomedically oriented mesh), users became confused, not knowing which, if either, subject field they should complete.
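a single type-ahead field that queries both vocabularies behind the scenes and labels each suggestion with its source would remove that choice from the depositor. the sketch below shows the shape such a lookup might take; the endpoint urls, parameters, and response handling are placeholders standing in for the mesh and fast lookup services, not documented api calls.

```python
import requests

# placeholder endpoints; the real mesh and fast services differ, and the
# response shapes assumed below (plain lists of term strings) are assumptions.
MESH_LOOKUP = "https://example.org/mesh/lookup"
FAST_LOOKUP = "https://example.org/fast/suggest"


def suggest_subjects(user_text: str, limit: int = 10):
    """return type-ahead suggestions from both vocabularies, tagged by source."""
    suggestions = []

    mesh = requests.get(MESH_LOOKUP, params={"label": user_text, "limit": limit},
                        timeout=5)
    for term in mesh.json():                      # assumed: list of term strings
        suggestions.append({"scheme": "mesh", "term": term})

    fast = requests.get(FAST_LOOKUP, params={"query": user_text, "rows": limit},
                        timeout=5)
    for term in fast.json():                      # assumed: list of term strings
        suggestions.append({"scheme": "fast", "term": term})

    # one ranked list; the scheme tag lets the deposit form store each chosen
    # term in the correct metadata field without asking the user to decide.
    return suggestions[:limit]
```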
a combined field of this kind, a single subject field that queries both the mesh and fast apis, is warranted in order to simplify the subject-tagging experience for the user.

editing a record exercise

editing the record they had just created, the final testing task, proved to be unproblematic for testers. eleven of the twelve found the edit button on their record pages in less than three seconds, and the editing process was reported to be straightforward. one tester observed that they were unable to change their file (i.e., make a version-level change to their record), but this was only because version change functionality had not yet been implemented in the repository.

reflection/unguided feedback

once all the testing exercises outlined above were completed, testers were asked to talk about the uses for which they might employ a tool like inveniordm. without prompting or suggestions of uses, the testers overwhelmingly stated that they would use the completed repository for the very functions for which most irs are built: storing data files (two users), sharing data for open science or to fulfill funders' mandates (two users), searching for others' datasets (four users), creating gray literature collections to showcase their conference presentations and posters (three users), embedding repository-issued dois from their datasets into manuscripts and posters (three users), and storing data in private collections to share with trusted collaborators (four users). these data show that university faculty and researchers have various specific needs for repository solutions, which can be met if repositories are designed with these needs in mind.

after testing: next steps

the user testing experience for inveniordm proved to be a highly enjoyable process for all involved. the tester participants expressed enthusiasm for the process and appreciated the opportunity to share their ideas about the functionality of an ir while it was still in the design stages. the testers' enthusiasm reinforced the notion that many university faculty members are eager for an intuitive, user-friendly tool that will allow them to store, retrieve, and share their research outputs, as long as the tool is designed with their needs in mind. observation of testers' microinteractions with galter library's new institutional repository has helped the local development team to better understand what those needs are. the results of the user testing were presented at the inveniordm product meeting at cern in january 2020. the results were well received and led to immediate adjustments to the repository's development. as development continues into 2021, the repository team at galter health sciences library & learning center will design and manage at least one additional round of user testing to ensure that the repository continues to meet its goals of serving key functional requirements of irs while also providing users the best possible experience through each interaction they have with the tool. as the user testing sessions demonstrate, there is much room for growth in the achievement of a truly intuitive interface design in even some of the seemingly simplest functions of the repository, such as intuitively placing a deposit button or honing in on the right combination and placement of filters.
the galter library development team is committed to continuing to work toward an intuitive and seamless user experience. on this journey the repository team acknowledges and thanks the testers and future users of its repository and the researchers and support staff for whom the tool is being built and without whom it could not be built half as well.

acknowledgements

the project team would like to acknowledge northwestern university's institute for innovations in developmental sciences and the northwestern university clinical and translational sciences institute (nucats). inveniordm project team members sara gonzales, guillaume viger, matthew b. carson, lisa o'keefe, and kristi holmes were partially funded by the ctsa program national center for data to health, grant u24tr002306, and nucats, grant ul1tr001422.

appendix 1. inveniordm user testing aggregate data, divided by task

reporting criteria

a phenomenon or observation was noted if it was reported by, or observed in the behavior of, two or more testers. for four tasks, seconds were counted as a mark of how easy it was to find a repository element that enabled the task:
1. finding the login button to access the user account
2. finding the citation button after the search for a specific record
3. finding the "catalog your research" button
4. finding the "edit record" button
counting of seconds was done with an iphone stopwatch while reviewing the recorded sessions. if finding the required button took the user longer than a generous count of three seconds, it was deemed that the user had a hard time locating the item.

user profile screens results
9/12 testers wanted to add information about themselves and their appointment (department, title, contact information, etc.)
7/12 wanted clearer instructions for the username and password they use to log in
3/12 testers took three seconds or more to locate the user login button on the home page

finding datasets exercise results
10/12 testers expected a (sortable, filterable) cue on the search results screen to show whether a record has a file to download
3/12 testers wanted grayed-out instructions or search tips/suggestions in the search box
2/12 testers believed that the open access pill in search results implied there would be a file to download
2/12 testers believed that the subject pills in full record view should be clickable to enable direct search on the subjects

filter searching exercise results
7/12 testers said the most helpful filters were resource type, file type, and subjects, followed by author, then license
4/12 testers expected filter choices to update in real time based on the initial filter chosen
3/12 testers expected an option to expand beyond the top 10 authors in the authors filter
3/12 testers were not familiar with the choices of mesh and fast terms
3/12 testers would not think of filtering on license
2/12 testers expected guidance on the licenses' meanings if browsing/filtering by license is offered
2/12 testers expected greater filter collapsing/expanding options than what was offered
2/12 testers expected to apply two filters at once
2/12 testers wanted to filter on sample or demographic information of study subjects

specific record search exercise results
10/12 testers found the preformatted citations helpful
7/12 testers found the citation button in less than three seconds
3/12 testers had trouble with their known-record searching because filters were on when they started; they needed an easy way to clear all filters
2/12 testers expected an option to download the found record's citation to endnote

creating a record exercise results
8/12 testers took longer than three seconds to find the "catalog your research" button; of those, two would have used a different phrase ["'catalog' is too library-centric"]
5/12 testers did not see the "start upload" button after dropping their files, and an additional two said they expected auto-upload immediately upon dropping their files, with no "start upload" button necessary
3 of the 5 testers who missed the "start upload" button did not notice that their file did not get saved to their record
5/12 testers did not notice at first that the "save draft" step was needed before clicking publish, and one additional tester said they expected record auto-save, which would help in filling out a longer record
5/12 testers wanted guidance on which license to choose
4/12 testers expected some kind of instructions for filling out the cataloging page, even if only for specific fields like description or title
3/12 testers found the resource type interface intuitive
3/12 testers thought the arrow in the mesh (subject) field implied the availability of a drop-down list of options
2/12 testers did not see the drop-down choices under resource type umbrella categories at first
2/12 testers expected terms entered in the mesh (subject) fields to stay there, or a warning that they will disappear if there is no match
2/12 testers wanted more guidance on choosing a visibility level (private, public, etc.)
2/12 testers wanted more definitions/assistance about the difference between medical and topical subject terms
2/12 testers wanted more definitions/assistance with fast terms
2/12 testers said they would prefer a default license option

editing a record exercise results
11/12 testers found the edit button in less than three seconds

reflection/unguided feedback results
4/12 testers would use the repository to search for data
4/12 testers would store data files in private collections to be shared only with trusted collaborators
3/12 would embed the repository-issued dois from their datasets into their manuscripts and papers
3/12 testers would create their own grey literature collections of conference abstracts and posters
2/12 testers would use the repository for storing data files
2/12 testers would use the repository for open access/open science/data sharing compliance motivations

references

1 "inveniordm: the turn-key research data management repository," cern (european organization for nuclear research), accessed march 11, 2020, https://invenio-software.org/products/rdm/.
2 lars holm nielsen, "invenio v3.0.0 released," invenio blog (blog), invenio, june 7, 2018, https://invenio-software.org/blog/invenio-v300-released/.
3 "next generation repositories: behaviours and technical recommendations of the coar next generation repositories working group," confederation of open access repositories (coar), november 28, 2017, https://www.coar-repositories.org/files/ngr-final-formatted-report-cc.pdf.
4 "the fair data principles," force11, accessed march 11, 2020, https://www.force11.org/group/fairgroup/fairprinciples.
5 national institutes of health, "final nih policy for data management and sharing," nih office of extramural research, accessed january 15, 2021, https://grants.nih.gov/grants/guide/notice-files/not-od-21-013.html.
6 gary e. gorman, jennifer rowley, and stephen pinfield, "making open access work: the 'state-of-the-art' in providing open access to scholarly literature," online information review 39, no. 5 (september 2015): 604–36.
7 bhuva narayan and edward luca, "issues and challenges in researchers' adoption of open access and institutional repositories: a contextual study of a university repository," in proceedings of rails – research applications, information and library studies, 2016, school of information management, victoria university of wellington, new zealand, 6–8 december (2016); information research: an international electronic journal 22, no. 4 (december 2017), http://hdl.handle.net/10453/121438.
8 beth st. jean, soo young rieh, elizabeth yakel, and karen markey, "unheard voices: institutional repository end-users," college & research libraries 72, no. 1 (january 2011): 21–42.
9 michael witt et al., "connecting researchers to data repositories in the earth, space, and environmental sciences," in digital libraries: supporting open science, ircdl 2019, ed. leonardo candela and gianmaria silvello (2019); communications in computer and information science 988, 86–96; sonya betz and robyn hall, "self-archiving with ease in an institutional repository: microinteractions and the user experience," information technology and libraries 34, no. 3 (september 2015): 43–58.
10 betz and hall, "self-archiving with ease," 43–58.
11 st. jean, rieh, yakel, and markey, "unheard voices," 21–42.
12 "user experience design," wikipedia, last modified january 12, 2021, https://en.wikipedia.org/wiki/user_experience_design.
13 betz and hall, "self-archiving with ease," 43–58.
14 a. o. adewumi, n. a. omoregbe, and sanjay misra, "usability evaluation of mobile access to institutional repository," international journal of pharmacy and technology 8, no. 4 (december 2016): 22892–905.
15 "inveniordm project roadmap," cern (european organization for nuclear research), accessed march 17, 2020, https://invenio-software.org/products/rdm/roadmap/.
16 national institutes of health, office of the director, "supplemental information to the nih policy for data management and sharing: selecting a repository for data resulting from nih-supported research," last modified october 29, 2020, https://grants.nih.gov/grants/guide/notice-files/not-od-21-016.html; "coar community framework for good practices in repositories, public version 1," confederation of open access repositories (coar), last modified october 8, 2020, https://www.coar-repositories.org/coar-community-framework-for-good-practices-in-repositories/; sharon webb and charlene mcgoohan, the digital repository of ireland: requirements specification (national university of ireland maynooth, 2015), https://doi.org/10.3318/dri.2015.6; adewumi, omoregbe, and misra, "usability evaluation of mobile access to institutional repository," 22892–905; suntae kim, "functional requirements for research data repositories," international journal of knowledge content development & technology 8, no. 1 (march 2018): 25–36.
17 u.s. department of health and human services, "summary of the hipaa security rule," last modified july 26, 2013, https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html.

rural public libraries and digital inclusion: issues and challenges
brian real, john carlo bertot, and paul t. jaeger
information technology and libraries | march 2014

brian real (breal@umd.edu) is a phd candidate in the college of information studies, john carlo bertot (jbertot@umd.edu) is co-director of the information policy and access center and professor in the college of information studies, and paul t. jaeger (pjaeger@umd.edu) is co-director of the information policy and access center and associate professor and diversity officer of the college of information studies, university of maryland, college park, maryland.

abstract

rural public libraries have been relatively understudied when compared to public libraries as a whole. data are available to show that rural libraries lag behind their urban and suburban counterparts in technology service offerings, but the full meaning and effect of such disparities is unclear. the authors combine data from the public library funding and technology access study with data from smaller studies to provide greater insight into these issues. by filtering these data through the digital inclusion framework, it becomes clear that disparities between rural and nonrural libraries are not merely a problem of weaker technological infrastructure. instead, rural libraries cannot reach their full customer service potential because of lower staffing (but not lower staff dedication) and funding mechanisms that rely primarily on local monies. the authors suggest possible solutions to these disparities while also discussing the barriers that must be overcome before such solutions can be implemented.
introduction

the large number of rural public libraries in the united states is surprisingly understudied, particularly in terms of technology access. the american library association (ala) and other professional organizations consider a public library to be small or rural if its population of legal service area is 25,000 or less. when viewed through this lens, rural public libraries1
• have on average less than one (0.75) librarian with a master's degree from an ala-accredited institution;
• have an average of 1.9 librarians, defined as employees holding the title of librarian;
• have an average total of 4.0 staff, including both full- and part-time employees;
• have a median annual income (from all sources) of $118,704.50;
• have an average of 41,425 visits annually; and
• typically have one building or branch that is open an average of 40 hours/week.
while these data suggest rural libraries operate on a smaller and less financially robust scale than their suburban and urban counterparts, the full effect of these discrepancies on service levels is unclear. this article uses various information sources to analyze the effect of these discrepancies on the ability of rural libraries to offer technology-based services. since the advent of the internet in the mid-1990s, public libraries have been key internet-access and technology-training providers for their communities. the ability to offer internet access, alongside support and training for patrons using such technology, is a primary indicator of libraries' value to their communities. by analyzing data from the 2012 public library funding and technology access survey (plftas), the authors found that rural libraries, on average, have weaker technological infrastructure (such as fewer computers and slower broadband connections) and are able to offer fewer support services, such as training classes, than urban and suburban public libraries. with public libraries being the only source of broadband access for many patrons in rural communities, limitations for rural libraries may affect patrons' ability to fully participate in employment, education, government, and other central aspects of society. through analysis of the plftas data2 about technology access in rural public libraries in conjunction with other studies of rural libraries and librarians, this article explores the causes and effects of the relatively more limited technological and support infrastructures for rural patrons and communities.

method

as documented since 1994,3 public libraries were early adopters of internet-based technologies.
the purpose of the plftas survey, and its previous iterations, is to identify public library internet connectivity; propose and promote public library internet policies at the federal level; maintain selected longitudinal data as to the connectivity, services, and deployment of the internet in public libraries; and provide national estimates regarding public library internet connectivity. through changes in funding sources and frequency of administration over the past two decades, the survey has maintained core longitudinal questions (e.g., numbers of public access workstations, bandwidth) but consistently explored a range of emerging topics (e.g., jobs assistance, e-government, emergency roles). the survey's method has evolved over time to meet changing survey data goals. the 2012 survey provides both national and state estimates of public library internet connectivity, public access technologies, and internet-enabled services and resources. the survey used a stratified "proportionate to size" sample to ensure a proportionate national sample, using the fy2009 imls public library dataset (formerly maintained by the us national center for education statistics) to draw its sample. strata included the states in which libraries resided and metropolitan status (urban, suburban, rural) designations. bookmobile and books-by-mail service outlets were removed from the file, leaving 16,776 library outlets. the study team drew a sample with replacement of 8,790 outlets, stratified and proportionate by state and metropolitan status.4 the survey received 7,252 responses, for a response rate of 82.5%. using weighted analysis to generate national and state data estimates, the analysis uses the responses to produce estimates for all public library outlets (minus bookmobiles and books by mail) in the aggregate as well as by metropolitan status designations. unless otherwise noted, all data discussed in the article are from the 2012 study. that study, along with all previous public libraries and the internet and public library funding and technology access studies, additional analysis, and data products, are available at http://www.plinternetsurvey.org.
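as a rough illustration of that sampling design, a proportionate-to-size stratified draw with replacement, and the stratum weights used to expand responses to population estimates, might be computed as follows. the file name, column names, and allocation details here are assumptions for illustration, not the study's actual procedure or the imls dataset layout.

```python
import pandas as pd

# hypothetical frame of library outlets; the real study drew from the fy2009
# imls public library dataset after removing bookmobiles and books-by-mail outlets.
frame = pd.read_csv("library_outlets.csv")   # assumed columns: outlet_id, state, metro_status, outlet_type
frame = frame[~frame["outlet_type"].isin(["bookmobile", "books_by_mail"])]

TARGET = 8790
total = len(frame)

samples = []
for (state, metro), stratum in frame.groupby(["state", "metro_status"]):
    # allocate the stratum's share of the target sample, proportionate to its size
    n_h = max(1, round(TARGET * len(stratum) / total))
    drawn = stratum.sample(n=n_h, replace=True, random_state=42)
    # weight each sampled outlet by stratum size / stratum sample size, so
    # responses can be expanded to national and state estimates
    drawn = drawn.assign(weight=len(stratum) / n_h)
    samples.append(drawn)

sample = pd.concat(samples, ignore_index=True)
```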
digital inclusion and the value of public libraries

digital inclusion is a useful framework through which one can understand the importance of ensuring individuals have access to digital technologies as well as the means to learn how to use them.5 digital inclusion comprises policies and actions that mitigate the significant, interrelated problems of the digital divide and digital literacy:
• digital divide implies the gap—whether based in socioeconomic status, education, geography, age, ability, language, or other factors—between individuals for whom internet access is readily available and those for whom it is not. indeed, even those with basic, dial-up internet access are losing ground as internet and computer technologies continue to advance, using increasing bandwidth and demanding high-speed ("broadband") internet access.
• digital literacy encompasses the skills and abilities necessary for access once the technology is available, including understanding the language and component hardware and software required to successfully navigate the technology.
• digital inclusion refers to policies developed to close the digital divide and promote digital literacy. it marries high-speed internet access (as dial-up access is no longer sufficient) and digital literacy in ways that reach various audiences, many of whom parallel those mentioned within the digital divide debate. to match the current policy language, digital inclusion will signify outreach to unserved and underserved populations.
since virtually every public library in the united states offers public internet access, these institutions are invaluable in promoting digital inclusion. however, the plftas data shows that not all libraries are equal, with rural public libraries lagging behind libraries in more populated areas in providing technology services. therefore this article focuses on the following issues and questions:
• digital divide: why do rural individuals have less access to broadband technologies than their suburban and urban counterparts? how are rural libraries currently compensating for this deficit?
• digital literacy: why do rural libraries offer less digital literacy training and patron support? how do rural libraries compare to libraries in more populated areas on key issues in digital literacy, such as employment and government information?
• digital inclusion: what policies have been developed to help rural libraries close the digital divide and promote digital literacy, and what policies—including funding structures and decisions—hinder these libraries from adequately addressing these concerns? what governmental and extra-governmental policies can be enacted to help rural libraries to better promote digital inclusion?
the following section describes the differences between rural libraries and their urban and suburban counterparts, combining plftas data with information from other studies to demonstrate how rural libraries are more essential in bridging the digital divide yet are seemingly doing less to promote digital literacy. following this, the authors discuss why rural libraries trail suburban and urban libraries in these areas, with studies suggesting the issue is a result of inadequate resources, not a lack of staff dedication. finally, the authors present a review of some of the initiatives that are attempting to bridge these divides, including suggestions that may help rural librarians to act as better advocates for their patrons' needs.

rural challenges to digital inclusion

numerous studies, including plftas, show that rural libraries offer less technology access with slower connection speeds than libraries in more populated areas. these libraries also offer comparatively less formalized digital literacy training, although rural libraries still provide invaluable informal training in this area. this section highlights discrepancies between rural libraries and those in more populated areas.

technology and service disparities between rural and nonrural libraries

while almost every public library offers patrons internet access, 70.3% of rural libraries are the only free internet and computer terminal access providers in their service communities, compared to 40.6% of urban and 60.0% of suburban libraries.6 the disparity between these categories becomes more striking when one considers the difference between home broadband adoption in rural and nonrural areas.
according to the pew research center's home broadband 2010 survey, only 50% of rural homes have broadband internet access, compared to 70% of nonrural homes.7 this disparity is due in large part to the greater difficulty and cost of creating the infrastructure to support broadband internet access in more sparsely populated areas.8 with broadband access provided primarily by for-profit companies, little profit motive exists to expand services to areas where the infrastructure cost would not allow for a quick and efficient recouping of costs. the us government has attempted to address this problem in numerous ways, including dedicating $7.2 billion to improving broadband access throughout the country through grants (broadband technology opportunities program; btop) and loans (broadband infrastructure projects; bip) as part of the american recovery and reinvestment act (arra) of 2009.9 expanding this infrastructure will take time, and at this time it is unknown to what extent broadband access in rural communities, both in general and for public libraries, will increase. as the arra projects near completion, it will be important to conduct follow-up analysis of the effect in terms of access to broadband in the home and in anchor institutions such as public libraries, as well as the extent to which broadband subscriptions increased. at present, however, public libraries—and rural public libraries in particular—are still the primary source of broadband access for many americans, and this will likely remain true for large portions of the population for the foreseeable future. individuals in need of internet access have few options in many communities. though there are increasing numbers of free wireless (wi-fi) internet access sources in communities (e.g., coffee shops, food outlets), one needs to have a device (e.g., tablet, laptop) to use these options. in two-thirds of american communities, the public library is the only source of freely available public internet access inclusive of public access computers.10 specific government efforts to increase internet access, broadband networks, and digital literacy of the population, however, fail to involve public libraries in a meaningful way, if at all.11 to be fair, public libraries were eligible to compete for the grants or submit loan applications for the arra broadband funding initiatives, and public libraries in states such as alaska, arizona, colorado, idaho, maine, montana, nebraska, and others have benefited from this, primarily through inclusion in applications with multiple beneficiaries.12 since btop works as a grantmaking process, relatively few us public libraries (approximately 20%) have benefited from btop funding, but the results have been encouraging. for example, 85 libraries in mostly rural nebraska have upgraded their broadband capacity using btop funds, with broadband capacity for these locations increasing from an average of 2.9 mbps to an average of 18.2 mbps. other states have tried innovative ideas, such as the idaho department of labor's youth corps program to train high school and college students to work as digital literacy coaches, and then deploy them to libraries around the state.
indeed, the btop program has certainly created some encouraging results, but it is not a permanently funded program and it targets a limited number of libraries, so it cannot be considered a primary, widespread solution to the digital inclusion gap between rural and more populated areas. the authors of a recent btop report note, "unless strategic investments in u.s. public libraries are broadened and secured, libraries will not be able to provide the innovative and critical services their communities need and demand."13 thus btop may provide a good model for addressing gaps in digital inclusion, but it was never designed to be a permanent solution. this role of ensuring digital inclusion in communities has accelerated at a time of unprecedented austerity nationally and at the state and local levels of government in particular. based on bureau of labor statistics (http://www.bls.gov) data, the united states lost 584,000 public-sector jobs between june 2009 and april 2012, or 2.5% of the local, state, and federal government jobs that existed before the prolonged economic downturn began. according to the center on budget and policy priorities, state budget shortfalls ranged from $107 billion to $191 billion between 2009 and 2012, and current projections place state budget shortfalls at $55 billion for 2013.14 the prolonged economic downturn, in part, has driven up library usage in some communities.15 even before the downturn began, public libraries in rural areas typically had the oldest computing equipment, the slowest internet access speeds, and the lowest support levels from the federal government.16 as a part of becoming the main source of digital literacy training and digital inclusion, public libraries have also become a primary training provider for in-demand, technology-based job skills.17 the resulting situation forces public libraries to balance reduced support, increased demand, and a growing centrality in helping their communities recover from the economic downturn. at the center of both increased demand and increased support of digital literacy and inclusion lies sufficient internet access. in a survey of rural librarians in tennessee, respondents reported that their patrons' most critical information need was broadband internet access.18 the respondents also ranked access to recent hardware technology and software, technology training, and help with specific tasks like applying for jobs or government benefits as highly critical. by comparison, the respondents ranked traditional services such as book loaning as the least critical duty, significantly trailing the abovementioned and other technology services. despite rural librarians viewing technology-based services as their most important function, however, rural libraries lack the resources to meet the same service quality as nonrural libraries. the ensuing section discusses the nature of those disparities.

technology infrastructure and technology training

virtually all public libraries offer their patrons access to the internet.
there is no statistical difference between rural, suburban, and urban libraries in this regard.19 likewise, rural libraries lag only slightly in wireless internet availability, which is becoming increasingly important with the ubiquity of mobile technology devices; 86.3% of rural libraries have wireless access available for patrons, compared to an average of 90.5% across all three categories.20 and, in one of the few technological areas where rural libraries lead their nonrural counterparts, 42.3% of rural libraries reported they had sufficient public access computer terminals at all times, compared to 33.5% of suburban and 12.9% of urban libraries. while the number of rural library computer terminals may be adequate in many locations, hardware quality suffers; 69.5% of rural libraries replace their public access computer terminals as needed, while 66.4% of urban libraries have a technology replacement schedule.21 for many small libraries with only a single full-time librarian, that employee also serves as the it specialist for the location.22 therefore many rural libraries have less up-to-date technologies and less technical support than their nonrural counterparts. even if the librarians who also provide it support for their locations are qualified to fulfill this role, the greater issue is the limited time librarians have to work on these issues in addition to other duties. in addition to less recent hardware, rural libraries also have limited bandwidth; 31.1% of rural libraries operate on bandwidths of 1.5 mbps (t1) or less, compared to only 18.3% of suburban libraries and a mere 9.7% of urban libraries.23 the greatest issues facing rural libraries are not well represented by the broader categories of internet access but instead lie in the implementation of services to make these technologies highly useful and effective for patrons. only 31.8% of rural libraries offer formal technology training classes, as compared to 63.2% of urban and 54.0% of suburban libraries.24 this comparison alone does not present a problem, since more populated areas have larger customer bases that justify training patrons in groups rather than in one-on-one sessions. however, rural libraries also trail significantly in offering one-on-one technology training, with only 30.1% of rural libraries providing such programs, compared to 43.4% of urban and 37.9% of suburban libraries. only 21.9% of rural libraries have online training materials, compared to 36.3% and 33.7% of urban and suburban libraries, respectively. in fact, 12.5% of rural libraries do not offer planned technology training at all, compared to a mere 5.1% of urban libraries and 8.0% of suburban libraries. therefore, while most patrons in nonrural areas who have limited technology skills can go to their local library and acquire such skills for free, such access to the resources for personal advancement is drastically limited by comparison in rural areas. since many rural residents do not have internet access in their homes, many of these individuals do not own computers and have limited technology skills resulting from limited technology exposure. this makes the technology training disparity between rural and nonrural libraries quite problematic, since most americans need these skills to maintain a high standard of living and employment.
employment assistance

while public libraries in all areas saw adequate staffing as a statistically similar problem for helping patrons find jobs—51.9% of rural librarians agreed this was a challenge, only slightly exceeding the overall average of 49.8%—the greater issue is the disparity in confidence levels in assisting patrons in employment matters.25 nearly half (48.3%) of rural survey respondents agreed a lack of staff expertise was a challenge to helping patrons find and apply for jobs online, compared to 27.9% of urban and 37.7% of suburban libraries. the internet has become essential for many people who wish to gain employment, thus rural public librarians' inability to support rural residents with limited technology skills is problematic. many government agencies, hospitals, and private employers—including walmart, the largest employer in the united states—will no longer accept paper applications, but instead insist potential employees submit applications via the internet.26 this can be especially challenging for individuals who have recently lost jobs they have held for decades, as they simultaneously need to refresh basic application and interviewing skills while learning how to use unfamiliar information technologies to find and apply for jobs. librarians can offer critical assistance in these cases, especially for individuals who do not own a computer or have internet access in their homes.
important services such as voter registration, motor vehicle services, payment of taxes, and school enrollment for children can now be done either only or much more efficiently online.28 these online services are more convenient for many americans, but “while many members of the public may struggle with accessing or using egovernment information and services, government agencies have come to focus on it as a means of cost savings rather than increasing access for members of the public.”29 government agencies have for the most part not taken many americans’ lack of digital literacy into account when shifting their primary means of service to the digital realm, nor have they considered the effect this shift has on public libraries as the primary internet provider for many americans. this has led to extra responsibilities for rural public libraries but not a direct increase in resources. one might consider that rural libraries offer fewer of these services, or have less expertise in providing digital government services, in part because such services are not in demand by patrons. however, government services have steadily moved online and the pace is accelerating towards an e-only means of interacting with government. the open government movement,30 combined with the federal government’s release of the technology and services blueprint, signals the further use of technologies to offer innovative and operational digital government services—both through more traditional web-based services and mobile applications.31 and state and local governments are increasingly engaging in e-government services such as unemployment and social service benefits, taxation, licensing, and more. in short, federal, state, and local governments are moving rapidly to a range of e-services that will necessitate facility by librarians with technologies, government services, and government information to better help their communities navigate the challenges of e-government. government intervention in digital literacy although most government agencies have not considered the effect their shift to primarily digital services has on individuals who lack basic digital literacy, the federal communications commission launched two programs that could help with the digital literacy problem. the first of these, digitalliteracy.gov, is designed to provide individuals with tools to facilitate digital inclusion, helping users to acquire skills that will make them more capable in the modern information environment. the challenge with this approach is that many resources on the website are designed for individuals who need such skills and who therefore probably do not have access to the internet or possess the skills to fully engage the resources. moreover, most of these resource links point to external sites, which are organized by arbitrary user ratings rather than skill level and relevance.32 likewise, educator resources – which should be most valuable in helping librarians to educate patrons – are presented as links to external sites with limited information about each resource. these resources may be able to help patrons, but a collaborative effort that includes public librarians in creating resources could better target particular patron needs in a public access setting. a newer project, connect2compete, demonstrated more promising progress in this area.
connect2compete is a partnership between the fcc and private businesses to provide low-cost internet and computers to low-income families, digital literacy training, and other services.33 they also publicize the digital literacy divide, working with the ad council and other organizations to promote this issue.34 the website allows users to search for places where they can receive digital literacy training, with the search results primarily displaying local public library branches. however, despite pointing users to public libraries for such training, connect2compete currently only helps to fund such training in limited cases. while this program provides a strong model for raising awareness about digital inclusion, it is unlikely to provide infrastructure resources to fully bridge the gap between rural and nonrural communities in the near future. while the fcc has been innovative by soliciting private funds to prevent connect2compete from using any taxpayer funds, these private funds will not replace the need for government funds for public libraries throughout the nation, nor is private funding likely to continue indefinitely. indeed, “while governments at all levels are relying on public libraries to ensure digital inclusion, the same governments are reducing the funding of the very libraries that are being relied on.”35 the following section will detail how decreasing funding and limited resources have contributed to the digital divide between rural and nonrural libraries. rural libraries and barriers to promoting digital inclusion when the internet was emerging in the 1990s, “public libraries essentially experimented with public-access internet and computer services, largely absorbing this service into existing service and resource provision without substantial consideration of the management, facilities, staffing, and other implications of public-access technology services and resources.”36 while some libraries have increased their funding levels to match these challenges, most funding agencies have not recognized the costs or value of additional services that public libraries now offer in a wired nation. this section discusses the reasons why rural libraries have not been able to offer the same level of service nonrural library patrons routinely expect. funding inadequacies for rural libraries rural libraries face challenges from their problematic funding structure. sin noted that for public libraries, “on average, the local government provided 76.6% of the funding; the state, 8.7%; the federal government, 0.7%; and other sources, 13.9%.”37 this is a particular problem for rural libraries since, as holt explained, “if cities and suburbs had to survive on the extraordinarily low taxes on agricultural property, the urban/suburban public sector would have service levels so low that most officials would turn away in disgust.”38 this lack of local revenue for all public services—including libraries—in rural areas is exacerbated by the continuing population decrease in small towns and the desirability of such locales for retiring seniors, who prefer to live in areas with low taxes because of limited incomes.39 in other words, public library funding structures that place local governments at the forefront of budgeting plans put rural libraries at a serious disadvantage and promote a digital divide between rural and nonrural areas. holt notes, “it is the legitimate function of state government to make things right.
state governments, after all, are of a size and scale that historically allows them to perform as equity agencies for locales.”40 indeed, the averages for funding sources cited above can vary, and state and federal governments have attempted to dampen the funding inequities between rural and nonrural libraries. one example is the federal e-rate program, established under the telecommunications act of 1996 to provide schools, libraries, and healthcare providers with a discounted “education rate” for communication technologies, including internet technologies.41 while this has subsidized part of the internet service costs for libraries throughout the nation, many libraries do not apply because they do not know they are eligible or because the application process is too complicated. some rural libraries have had the advantage of their state library systems applying on their behalf, but even when funding is provided this only covers parts of the libraries’ connection and equipment costs. and, according to the plftas survey, only 61.5% of rural libraries received e-rate funds, compared to 75.0% of urban libraries, showing the program does not favor the class of libraries with the greatest connectivity issues.42 likewise, as noted above, the federal government designated $7.2 billion from the american recovery and reinvestment act of 2009 for improving broadband access throughout the nation, with funding designated for rural areas and public libraries in general. these improvements will take time, though, and will not fully compensate for the lack of local funds for rural libraries or for the fact that rural libraries do not receive nearly as much in nongovernmental funds as nonrural libraries.43 additionally, while local governments in some areas have created their own broadband infrastructure to compensate for corporate providers’ unwillingness to expand to some areas due to inadequate predicted profits, nineteen state governments banned such practices due to lobbying efforts from the broadband industry.44 the corporations that lobbied for these laws feared that if this became common practice, local governments could offer low enough pricing to compete against for-profit services. while this may be a legitimate concern, the end result of this legislation is local governments—including rural governments—in some states being legally blocked from allocating funds to solve the market failure that has prevented corporate providers from adequately expanding into rural areas. therefore, public libraries’ funding and resource structures are inherently stacked against rural institutions. while e-rate and other federal and state programs may mitigate the problem, the ultimate solution needs to be a restructuring of library funding models that takes the primary burden off struggling local governments or at least increases state and federal contributions. in a seminal article on rural libraries and technology written in 1995, vavrek noted that “public libraries cannot survive by only appealing to those who are least likely to be able to pay to support the library.
while visions of the homeless person using the internet to locate information is both compassionate and within the social role of the public library, can the library afford to provide this access?”45 beyond patrons not being open to assisting less fortunate individuals, vavrek suggested attempts to diversify library services—including introducing internet technology services, which was novel at the time—could distract resources from libraries’ established services that have traditionally appealed to all income classes and, with this, erode public support for these institutions. the pew home broadband 2010 survey shows vavrek’s thoughts on this matter were prescient, as 53% of survey respondents believed the government either should not support broadband expansion or that this should not be a very important priority.46 the benefits of greater broadband access and relevant service support may seem obvious to those who are intimate with this matter, but much of the public does not see the importance of expanding such services. if rural librarians cannot fight these perceptions and convince traditional library users and the general public of the importance of these services, then they will probably not be able to reverse these negative trends. unfortunately, rural libraries lack the time, resources, and data to lobby the public on these matters. staffing and training problems for rural librarians a lack of funding and resources affects not only rural public libraries, but also rural public librarians. in a study that illustrated such issues, flatley and wyman surveyed a random sample of libraries in extremely rural areas, with their service population baseline being 2,500 as opposed to the 25,000 threshold noted above.47 while the data they collected are somewhat dated (the survey was conducted in 2007), this study still deserves special attention because similar data have not been collected more recently or by other authors. the authors found that 80% of rural libraries have only a single full-time employee, and 50% have two or fewer paid employees when full- and part-time employees are considered.48 these employees are underpaid compared to the national average, with 72% reporting they earned $12.99 or less per hour.49 when asked why they believed their pay was relatively low, more than half (53%) of rural librarians responded it was because their communities lacked funds, demonstrating that the structure of local funding matters more to librarians’ salaries than state and federal funding.50 flatley and wyman also found that only 14% of these employees held mls degrees, with 32% having achieved bachelor’s degrees and 37% having completed only a high school diploma.51 as one would expect given that most rural librarians lack professional training before entering the field, many of these individuals applied for their first library position because they saw a position advertised for their local library and it offered better pay than most other local jobs. while many rural librarians entered the profession because of reasons other than a desire to become librarians, the data suggest these individuals are capable and enthusiastic about their jobs.
almost half (47%) of rural librarians had worked in the field for more than a decade, with an additional 22% having been librarians for six to ten years.52 two-thirds (66%) of survey respondents stated they intended to remain librarians until retirement age, and 97% responded they were very satisfied or somewhat satisfied with their careers.53 additionally, despite the relatively low pay for library positions, this was not the most common complaint rural librarians had about their jobs. instead, while 27% found low pay to be the greatest issue they faced, 29% felt a lack of funds for new materials was a greater problem.54 therefore, while certain technological issues in rural public libraries—such as the lack of technological training courses for patrons—can be framed accurately as a problem involving rural librarians, these problems should not be framed as the librarians’ fault. with current staffing levels, rural librarians do not have as much available staff time to provide training courses and one-on-one training as their suburban and urban counterparts. these librarians may also lack the knowledge and experience to train others in technological skills, and their libraries may lack the funds to help them acquire these abilities. these factors are outside of these librarians’ control, however, and “no matter how hard lis professionals try, one cannot expect public library systems (especially those in less-advantaged neighborhoods) to bridge the information gap when the libraries are themselves underfunded and understaffed.”55 considering typical rural librarians’ high dedication levels, one can assume they would be willing to remedy information gaps if they first had the resources to fix their libraries’ skill, funding, and staffing gaps. possible solutions rural libraries face the dual issue of a lack of resources to allow librarians enough time to advocate for their branches and a lack of data that advocates can use to show funders these libraries’ value to their communities. as a solution for the latter problem, sin suggested that library and information science (lis) scholars and other prominent figures in the field begin a dialogue with underfunded libraries—including rural institutions—to work with librarians to gather, process, and interpret data on libraries’ needs and libraries’ effects on their communities.56 this would have the dual benefit of giving librarians better information with which they could focus their services for maximum value and providing graduate-student and professional-level researchers with a stronger understanding of their field. the authors of this article would like to expand on this slightly to suggest that any researchers who draft scholarly papers and presentations from data collected from work with underfunded libraries should feel obligated to assist libraries in using this data for their own benefit. scholars are likely to be in a better position to advocate for libraries with which they collaborate than time- and resource-strapped librarians, and they should feel an ethical responsibility to do so after reaping the benefits of research. more rural librarians also need the skills to empower them to lead technological training courses for patrons, gather data to better understand how to best optimize their services, and lobby for greater funding at the local level. mehra et al.
of the school of information sciences at the university of tennessee attempted to remedy this problem to a limited degree with a program they launched in june 2010 with funding from the imls laura bush 21st century librarian program.57 the researchers used this funding to provide full scholarships—including laptop computers and funds for books—to sixteen rural librarians already working in the south and central appalachia regions, allowing them to earn an mls degree in two years of part-time study. the researchers had previously conducted a qualitative survey of rural librarians in tennessee to determine the training and resource needs of rural librarians,58 and they used these data to form a customized mls program for the scholarship students. this included courses focusing on strong technical competencies, service evaluation, grant writing, and other courses of particular relevance to the rural environment. likewise, georgia uses state funds to pay the salaries of many experienced librarians with mls degrees throughout the state, thereby lifting the burden of affording such individuals off cash-strapped counties and municipalities.59 however, as this system develops in georgia, state funding is still limited and there have been state funding cuts to other areas, such as materials and infrastructure, to allow for an increase in state-funded professional librarians.60 therefore, while this appears to be a promising model that can be of particular benefit to rural residents of the state, further study is needed to determine its overall effects. with an estimated more than 8,000 rural public libraries operating in the united states,61 it would be impossible to find the resources to provide the large majority of librarians without an mls at these locations with the full training needed to earn the degree. even if such funding were available, a large portion—if not the majority—of these resources could be put to better use by improving rural libraries’ technological infrastructure, increasing salaries, and growing collections. therefore, while the mls may remain the gold standard for library professionalism, it is not a realistic goal for many experienced and dedicated librarians throughout the country. instead, a more realistic program on a larger scale may be to provide rural librarians with targeted online and in-person training to enhance the skills they feel they need to be more successful. faculty and graduate students in lis academic programs are perhaps the most capable people to lead such training, and they are likely more capable of writing grant proposals to cover the costs of such programs than the rural librarians they could assist. mehra et al. have shown promising progress in this direction,62 and by removing the mls goal (or only expecting it in limited cases), their work could easily be emulated to help lis educators empower librarians throughout the nation. connect2compete, as detailed above, also has the potential to provide a training model for public librarians. the organization plans to create a “digital literacy corps,” comprising individuals who will help train portions of the public in basic digital literacy skills.63 while this program is still in its early phases, the organization plans to include librarians among this corps, training them to be better able to train others. once again, this will be achieved through private funds donated by corporate partners.
this is certainly a noble effort and will likely benefit many libraries and their patrons, but “having access to training and being able to take advantage of training are two separate things.”64 connect2compete, digitalliteracy.gov, and other organizations already provide some resources to help rural librarians understand digital literacy issues and provide better training, but librarians have limited time to familiarize themselves with these sources when dealing with their daily duties. for librarians to use current, future, or more refined training resources, the problem of low staffing—and its cause, low funding—must be addressed. since many rural librarians lack the skill or, more importantly, time to lobby for their own libraries, this is a significant area where partner organizations can help. whether these partners are university departments as envisioned by mehra et al. and sin or individuals funded by private donations in the connect2compete model is inconsequential. the important issue is that if these partner groups want to truly help rural libraries bridge the digital divide, these groups will have to contribute a significant portion of their efforts to lobbying to increase library funding enough to improve infrastructure and increase staffing—and, through this, staff time—for training and assisting patrons. as discussed above, the btop program has had success both in increasing technological infrastructure and human infrastructure, with grant funding being used in some cases to bring in temporary staff that is capable of training patrons in digital literacy and to increase training opportunities for patrons using existing staff. given the information above, btop’s holistic approach is certainly encouraging, and the program’s use of federal funds has shown how resources from above the local level can serve as an equalizing force. the temporary nature and limited funding of this program, however, make it important to remember this cannot be considered the primary solution to the digital inclusion problem. conclusion many rural public libraries are the only providers of free broadband internet service and computer terminals for their communities, with these communities having the lowest average proportion of homes with broadband connections. with the internet being essential to receive important government services and to apply for jobs with some of the largest and most ubiquitous employers throughout the nation, the value of the services offered by these libraries cannot be overstated. the basic public library funding structure needs to be modified to close the digital inclusion gap between rural and more populated areas. even if local governments remain the primary funding source for public libraries, this contribution cannot remain grossly disproportionate when compared to state and federal support. state and federal governments are already seeing savings by moving access to government services and information online, and these governments will benefit from the better employment rates and better employee competency that come with a digitally inclusive society. since these governments share in the benefits of digital inclusion, they must also share in the costs. some programs have shown promising results in bolstering rural public libraries and, through this, improving this nation’s digital inclusion.
these results range from large-scale programs such as btop to smaller programs such as the mls education program initiated by mehra et al. a common element of many of these programs, though, is their temporary nature, showing that funders are not recognizing that as technological innovation continues, new problems in digital inclusion will emerge. for government decision makers to understand the ongoing nature of the digital inclusion problem, rural public librarians and their allies—including academics and other stakeholders—will need to gather better data and provide better advocacy. references 1. “fy2011 public library (public use) data files,” institute of museum and library services, http://www.imls.gov/research/pls_data_files.aspx. 2. john carlo bertot et al., 2011–2012 public library funding and technology access survey: survey findings and results (college park, md: information policy and access center, 2012), http://ipac.umd.edu/sites/default/files/publications/2012_plftas.pdf. 3. the studies originally began as the public libraries and the internet survey series until 2006 through various funding sources, at which time they became part of the public library funding and technology access study (http://www.ala.org/plinternetfunding), funded by the american library association and the bill & melinda gates foundation. 4. john carlo bertot et al., “public libraries and the internet: an evolutionary perspective,” library technology reports 47, no. 6 (2011): 7–8. 5. paul t. jaeger et al., “the intersection of public policy and public access: digital divides, digital literacy, digital inclusion, and public libraries,” public library quarterly 31, no. 1 (2012): 1–20. 6. bertot et al., 2011–2012 public library funding and technology access survey. 7. aaron smith, home broadband 2010 (washington, dc: pew research center, 2010): 8, http://www.pewinternet.org/~/media/files/reports/2010/home%20broadband%202010.pdf. 8. federal communications commission, connecting america: the national broadband plan (washington, dc: federal communications commission, 2009): xi–xiii, http://download.broadband.gov/plan/national-broadband-plan.pdf. 9. aaron smith, home broadband 2010, 5. 10. john carlo bertot, charles r. mcclure, and paul t. jaeger, “the impacts of free public internet access on public library patrons and communities,” library quarterly 78, no. 3 (2008): 286; bertot et al., “public libraries and the internet,” 12–13. 11. jaeger et al., “the intersection of public policy and public access,” 1–20. 12. us public libraries and the broadband technology opportunities program (btop) (washington, dc: american library association, 2013): 1–2, http://www.districtdispatch.org/wp-content/uploads/2013/02/ala_btop_report.pdf. 13. ibid., 18. 14. “states continue to feel recession’s impact,” center on budget and policy priorities, last modified june 27, 2012, http://www.cbpp.org/cms/index.cfm?fa=view&id=711. 15. deanne w. swan et al., public libraries survey: fiscal year 2010 (imls-2013–pls-01) (washington, dc: institute of museum and library services, 2010). 16. paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology & libraries 26, no.
2 (2007): 4–14, http://dx.doi.org/10.6017/ital.v26i2.3277. 17. natalie greene taylor et al., “public libraries in the new economy: 21st century skills, the internet, and community needs,” public library quarterly 31, no. 3 (2012): 191–219. 18. bharat mehra et al., “what is the value of lis education? a qualitative study of the perspectives of tennessee’s rural librarians,” journal of education for library & information science 52, no. 4 (2011): 272. 19. bertot et al., 2011–2012 public library funding and technology access survey, 15. 20. ibid., 22. 21. ibid., 46. 22. bertot, “public access technologies in public libraries,” 88. 23. bertot et al., 2011–2012 public library funding and technology access survey, 21. 24. ibid., 29. 25. ibid., 42–45. 26. mehra et al., “what is the value of lis education?” 271–72. 27. bertot et al., 2011–2012 public library funding and technology access survey, 36. 28. paul t. jaeger and john carlo bertot, “responsibility rolls down: public libraries and the social and policy obligations of ensuring access to e-government and government information,” public library quarterly 30, no. 2 (2011): 91–116. 29. ibid., 100. 30. the obama administration’s commitment to open government: a status report (washington: government printing office, 2013): 4–7, http://www.whitehouse.gov/sites/default/files/opengov_report.pdf. 31. barack obama, digital government: building a 21st century platform to better serve the american people (washington, dc: office of management and budget, 2012), http://www.wh.gov/digitalgov/pdf. 32. “find educator tools,” digitalliteracy.gov, http://www.digitalliteracy.gov/content/educator. 33. “about us,” everyoneon, http://www.everyoneon.org/c2c. 34. ad council, “ad council & connect2compete launch nationwide psa campaign to increase digital literacy for 62 million americans,” press release, march 21, 2013, http://www.adcouncil.org/news-events/press-releases/ad-council-connect2compete-launch-nationwide-psa-campaign-to-increase-digital-literacy-for-62-million-americans. 35. jaeger et al., “public libraries and internet access,” 14. 36. bertot, “public access technologies in public libraries,” 81. 37. sei-ching joanna sin, “neighborhood disparities in access to information resources: measuring and mapping u.s. public libraries’ funding and service landscapes,” library & information science research 33, no. 1 (2011): 45. 38. glenn e. holt, “a viable future for small and rural libraries,” public library quarterly 28, no. 4 (2009): 288. 39. ibid., 288–89. 40. ibid., 289. 41. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e-rate program and libraries and library consortia, 2000–2004: trends and issues,” information technology & libraries 24, no. 2 (2005): 57–67. 42. bertot et al., 2011–2012 public library funding and technology access survey, 61. 43. sin, “neighborhood disparities in access,” 51.
44. olivier sylvain, “broadband localism,” ohio state law journal 73, no. 4 (2012): 20–24. 45. bernard vavrek, “rural information needs and the role of the public library,” library trends 44, no. 1 (1995): 26. 46. aaron smith, home broadband 2010, 2. 47. robert flatley and andrea wyman, “changes in rural libraries and librarianship: a comparative survey,” public library quarterly 28, no. 1 (2009): 25–26. 48. ibid., 34. 49. ibid., 35. 50. ibid., 28. 51. ibid., 33. 52. ibid., 26. 53. ibid., 29. 54. ibid., 30. 55. sin, “neighborhood disparities in access,” 50. 56. ibid., 51. 57. bharat mehra et al., “collaborations between lis education and rural libraries in the southern and central appalachia: improving librarian technology literacy and management training,” journal of education for library & information science 52, no. 3 (2011): 238–47. 58. mehra et al., “what is the value of lis education?” 59. “state paid position guidelines,” last updated august 2013, http://www.georgialibraries.org/lib/stategrants_accounting/official_state_paid_position_guidelines-updated-august-2013.pdf. 60. bob warburton, “georgia tweaks state funding formula to prioritize librarians,” library journal, february 2, 2014, http://lj.libraryjournal.com/2014/02/budgets-funding/georgia-tweaks-state-funding-formula-to-prioritize-librarians. 61. bertot et al., 2011–2012 public library funding and technology access survey, 14. 62. mehra et al., “collaborations between lis education and rural libraries”; mehra et al., “what is the value of lis education?” 63. institute of museum and library services, “imls announces grant to support libraries’ roles in national broadband adoption efforts,” press release, june 14, 2012, http://www.imls.gov/imls_announces_grant_to_support_libraries_roles_in_national_broadband_adoption_efforts.aspx. 64. bertot, “public access technologies in public libraries,” 88. text analysis and visualization research on the hetu dangse during the qing dynasty of china zhiyu wang, jingyu wu, guang yu, and zhiping song information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13279 zhiyu wang (mikemike248@gmail.com) is phd candidate, school of management, harbin institute of technology and associate professor, school of history, liaoning university.
jingyu wu (734665532@qq.com) is graduate student, school of history, liaoning university. guang yu (yug@hit.edu.cn) is professor, school of management, harbin institute of technology. zhiping song (1367123893@qq.com) is graduate student, school of history, liaoning university. © 2021. abstract in traditional historical research, interpreting historical documents subjectively and manually causes problems such as one-sided understanding, selective analysis, and one-way knowledge connection. in this study, we aim to use machine learning to automatically analyze and explore historical documents from a text analysis and visualization perspective. this technology solves the problem of analyzing large-scale historical data that is difficult for humans to read and intuitively understand. in this study, we use the historical documents of the qing dynasty hetu dangse, preserved in the archives of liaoning province, as data analysis samples. china’s hetu dangse is the largest qing dynasty thematic archive with manchu and chinese characters in the world. through word frequency analysis, correlation analysis, co-word clustering, the word2vec model, and svm (support vector machines) algorithms, we visualize historical documents, reveal the relationships between functions of the government departments in the shengjing area of the qing dynasty, achieve the automatic classification of historical archives, improve the efficient use of historical materials, and build connections between historical knowledge. through this, archivists can receive practical guidance in the management and compilation of historical materials. introduction china has a long history documented in numerous archives. at present, various local archive departments preserve large numbers of historical documents from different periods. owing to the development of china’s archive digitization, archive management departments at all levels have established digital archive abstracts, catalogs, and subject indexes of historical documents in their collections, realizing online retrieval of historical archives. with in-depth research on chinese history, simple catalog retrieval cannot satisfy researchers’ demand for related knowledge in historical archives. owing to the limitations of the catalog retrieval system, complex catalog data still need to be read manually. however, it is difficult to view the overall picture of the recorded content and impossible to easily distinguish important information in historical materials; this creates various difficulties for chinese historical researchers, such as in the compilation of historical materials. thus, in this study, we aim to use text analysis and visualization methods in machine learning to conduct data mining analysis of historical document data. these methods will help us discover the logical relationships of historical records and their purposes, accomplish visual presentations of historical entities and knowledge discovered in historiography, improve knowledge representation and automatic classification of historical data, and provide valuable information for historical archive researchers.
during the process of analyzing traditional manual methods for interpreting historical documents, we find the following phenomena: macro description, single angle, selective analysis, and one-way knowledge connection, among others. for example, the hetu dangse preserved in the liaoning archives contains a total of 1,149 volumes and 127,000 pages, making it difficult to fully grasp and understand the overall content of such documents. relying on manual reading and analysis of entire archives is an unrealistic task. therefore, this paper proposes using machine learning, natural language processing (nlp), and other technologies to address various problems from traditional manual reading. first, information from historical documents can be revealed from different angles, and this allows the content of the documents to be displayed more comprehensively and scientifically through visual charts. second, use of objective quantitative analysis methods, such as text analysis and nlp, prevents subjective interpretations of the same content. third, nlp and other technologies can solve the problem of calculating massive text training data sets while forming systematic knowledge that avoids the omission and one-sided understanding of knowledge in the historical archive. the application of machine learning in historical data analysis has attracted the attention of researchers in management, history, and computer science. tao used the latent dirichlet allocation (lda) topic modeling algorithm to analyze the themes of documents from 1700 to 1800 included in the german archives, providing a more three-dimensional interpretation and explanation of the spiritual world of germany during the eighteenth century.1 chinese scholars kaixu et al. proposed a method of automatic sentence punctuation based on conditional random fields in ancient chinese.2 this method was shown to better solve the problem of automatic punctuation processing compared with the single-layer conditional random field strategy in ancient chinese, as tested on the two corpora of the analects and records of the grand historian. swiss and south african scholars stauffer, fischer, and riesen, and chinese scholars wu, wang, and ma used kws technology and deep reinforcement learning to automatically recognize handwritten pictures in historical documents.3 solar and radovan used the national and university library of slovenia’s historical pictures and maps as research data. using gis technology, they created a novel display method and an interdisciplinary data resource web application to access and research the data.4 chinese scholars dong et al. and polish scholars kuna and kowalski used webgis technology to conduct efficient management and visualization research on historical data of natural disasters in ancient china and russia.5 meanwhile, latvian scholars ivanovs and varfolomeyev and dutch scholars schreiber et al. used web technology to develop a web service platform and explored the intelligent environment of cultural heritage service utilization.6 korean scholars kim et al.
used machine learning technology to determine the complex relationships between tasks of various classes in a specific historical period through the network of historical figures.7 judging from results in related fields, the semantic analysis and visualization of historical archives in an intelligent way are gradually moving from statistical description to knowledge mining. these results provide theoretical feasibility and practical technical experience for this study. at present, research on historical documents mainly focuses on the retrieval and utilization of historical material databases. since the words, semantics, grammar, and sentence patterns recorded in historical materials differ from modern texts, using data mining technologies such as machine learning and nlp to intelligently identify historical documents and organize historical data will help us more than traditional methods. this requires the cooperation of artificial intelligence and historical researchers to establish an effective method of historical big data analysis to achieve the transformation from traditional manual historical document analysis to automatic artificial intelligence analysis methods. in this paper, we use machine learning and data visualization as tools to examine the content of historical documents differently from traditional literature reading, reveal valuable information in the content of historical documents, and promote more systematic, efficient, and detailed understanding of the literature. related technology definition to perform text analysis and visualization of the hetu dangse, we use machine learning technology such as word vector processing, the svm (support vector machines) model, and network analysis. a word vector is a numerical vector representation of a word’s literal and implicit meaning.8 we segmented the hetu dangse’s catalog data and used the word2vec model to transform the segmented data into a set of 50-dimensional numerical vectors representing a catalog’s vector data set. to accurately visualize historical document records’ relationship features, we reduced the vector data set’s dimensionality. dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional into a low-dimensional space so that the representation retains some of the original data’s meaningful properties, ideally close to its intrinsic dimension.9 after dimensionality reduction, each catalog record in the vector data set is reduced from 50 to 2 dimensions to facilitate flat display. we used the svm model and network analysis technology to analyze the vector data set. the svm model is a set of supervised learning methods used for classification, regression, and outlier detection.10 it is trained on a vector data set that represents historical document records as points in space and learns a separating boundary through a kernel algorithm; new records are mapped into the same space, and their category is predicted based on which side of the margin they fall. network analysis techniques derive from network theory, a computer science system demonstrating social networks’ powerful influences.
network analysis technology’s characteristics determine that it is suitable for the visualization of books and historical archives in the library and information science field, because the visualization technique involves mapping entities’ relationships based on the symmetry or asymmetry of their relative proximity.11 thus, it helps to discover historical documents’ knowledge relevance. for example, citation network analysis can identify emerging relationships in healthcare domain journals.12 sample data preprocessing and classification this study uses the catalog of the qing dynasty historical archives from the hetu dangse collected by the liaoning archives as the research sample to conduct text analysis and visualization research. china’s hetu dangse is the largest qing dynasty thematic archive with manchu and chinese characters both domestically and internationally. the hetu dangse comprises the official documents of communication between the shengjing general yamen, the wubu of shengjing, and the fengtian office, as well as the documents communicated between the beijing internal affairs office in charge and the liubu of beijing during the qing dynasty. the hetu dangse was published from 2015 to 2018, including the hetu dangse·kangxi period (56 volumes), hetu dangse·yongzheng period (30 volumes), hetu dangse·qianlong period (24 volumes), hetu dangse·qianlong period (17 volumes), hetu dangse·daoguang period (52 volumes), hetu dangse·jiaqing period (58 volumes), hetu dangse·qianlong period official documents (46 volumes), hetu dangse·qianlong period official documents (46 volumes), and hetu dangse·general list (16 volumes).13 the hetu dangse is an important document for studying the history of the qing dynasty. owing to the special status of shengjing in the qing dynasty, it has a unique historical significance as the companion capital of beijing and the hometown of the qing royal family. this provides original evidence from this time for studying politics, economy, culture, history, and natural ecology in northeast china. in this study, we preprocess the catalog data of the hetu dangse by performing text segmentation, creating a corpus, and labeling data before using text analysis and visualization technology to analyze it. first, we use word frequency analysis and statistics to study the functions of institutions. second, we use the co-word clustering algorithm to quantify and visualize the institutional relationships. finally, we use the svm model to automatically classify and explore the catalog data of the hetu dangse. figure 1 illustrates this process. figure 1. text analysis flowchart. data preparation and preprocessing we collected 95,680 catalog data items in the hetu dangse of the liaoning archives, including 25,148 items from the kangxi period; 1,096 items from the yongzheng period; 23,819 items from the qianlong period; 20,730 items from the jiaqing period; and 15,887 items from the daoguang period. the content of each catalog data item includes three parts: title information, time of publication (chinese lunar calendar), and responsible agency. the proportion for each period was not evenly distributed in the catalog data of the hetu dangse, with the kangxi period catalog data having the highest proportion (26.2%).
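as a rough illustration of this data preparation step, the proportions above can be reproduced with a few lines of pandas. this is a minimal sketch, assuming the catalog has been exported to tabular form; the rows and column names below ("title", "period") are synthetic stand-ins rather than the paper's actual export, and on the full 95,680-item catalog the same computation yields the proportions reported in the text.

```python
# minimal sketch of the period-proportion calculation (synthetic rows,
# hypothetical column names; in practice the exported catalog would be loaded
# here instead)
import pandas as pd

catalog = pd.DataFrame({
    "title": ["盛京掌关防佐领为缉拿逃人事咨盛京刑部",
              "正白旗佐领呈为交纳壮丁银两事",
              "盛京户部为进送银两事咨盛京内务府"],
    "period": ["kangxi", "qianlong", "jiaqing"],
})

counts = catalog["period"].value_counts()                 # items per reign period
proportions = (counts / len(catalog) * 100).round(1)      # kangxi ≈ 26.2% on the full data
print(proportions)
```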
through the catalog data information, we can perform an in-depth analysis of the content of the hetu dangse from three perspectives: institutional functions, institutional relationships, and topic classification. data cleaning as the texts recorded in the archives of the hetu dangse are in manchu and ancient chinese, using chinese word segmentation tools (jieba, snownlp, thulac, etc.) based on modern chinese will cause errors. therefore, it is necessary to construct a special text corpus for word segmentation. first, we construct a stop vocabulary list to remove words with little impact on semantics in the hetu dangse, such as for (为), please (请), and of (之). second, we use the word segmentation tools mentioned above for preliminary word segmentation and then perform part-of-speech tagging and word segmentation corrections based on the word segmentation results. the title part of the catalog data of the hetu dangse mainly contains three dimensions of information: the record title of the catalog, issuing institution, and receiving institution. accordingly, we set a total of four types of tags in the text corpus: issuing institution, receiving institution, record type, and keywords. the receiving institution and the issuing institution correspond to the institutions at the beginning and the end of the catalog, respectively, such as the words shengjing zhangguan fang zuoling and shengjing ministry of justice. the record type is the word immediately preceding the receiving institution, such as counseling (咨) and please (请). the keywords are words that can represent the overall semantics in the record title of the catalog, such as arrest (缉拿) and advance (进送). table 1 presents the corpus we developed.
table 1. hetu dangse corpus
num | word | property 1 | property 2
1 | 盛京掌关防佐领 | organization | noun
2 | 为 | stop_words | preposition
3 | 缉拿 | keywords | verb
4 | 逃人 | keywords | noun
5 | 舒廷 | name | noun
6 | 官事 | stop_words | noun
7 | 咨 | keywords | verb
8 | 盛京刑部 | organization | noun
9 | 正白旗佐领 | organization | noun
10 | 兆麟 | name | noun
11 | 呈 | stop_words | preposition
12 | 为 | stop_words | preposition
13 | 交纳 | keywords | verb
14 | 壮丁 | keywords | noun
15 | 银两事 | keywords | noun
… | … | … | …
61047 | 收讫事 | keywords | noun
61048 | 盛京佐领 | organization | noun
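a minimal sketch of the segmentation step just described follows, assuming the jieba library; the registered terms and the example title come from table 1, while the exact user dictionary and correction rules used by the authors are not reproduced here.

```python
# minimal segmentation sketch: register archive-specific terms so the
# modern-chinese model does not split them, then remove stop words
import jieba

for term in ["盛京掌关防佐领", "盛京刑部", "缉拿", "逃人"]:
    jieba.add_word(term)          # custom vocabulary drawn from the corpus

stop_words = {"为", "请", "之"}    # stop vocabulary list described above

title = "盛京掌关防佐领为缉拿逃人舒廷官事咨盛京刑部"
tokens = [w for w in jieba.lcut(title) if w not in stop_words]
print(tokens)                      # segmented catalog title without stop words
```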
label data to improve the utilization efficiency of the hetu dangse and show the document content information from multiple angles, we use a supervised machine learning method to automatically classify the catalog data of the hetu dangse. therefore, the original catalog data set must be labeled. we determine the classification and label of the hetu dangse catalog according to the chinese archives classification law, chapter 12. table 2 presents the 11 categories of the catalog. with this, we complete the hetu dangse catalog sampling classification and labeling, laying the foundation for automatic catalog classification. the hetu dangse has a total of 95,680 catalog records involving five periods: kangxi, yongzheng, qianlong, jiaqing, and daoguang. we randomly select 500 records from each period and manually label these 2,500 records as the sample data set. the data classification after manual labeling is shown in figure 2. the overall distribution is relatively even, making it suitable for machine learning processing.
table 2. data labels
num | category
1 | type of official documents (政务种类)
2 | palace, royal family, and eight banners affairs (宫廷、皇族及八旗事务)
3 | bureaucracy, officials (职官、吏役)
4 | military (军事)
5 | politics and law (政法)
6 | sino-foreign relations (中外关系)
7 | culture, education, health, and scientific cultural study (文化、教育、卫生及科学文化研究)
8 | finance (财政)
9 | agriculture, water conservancy, animal husbandry (农业、水利、畜牧业)
10 | building (建筑)
11 | transportation, post and telecommunication (交通、邮电)
figure 2. percentage of the hetu dangse catalog data label chart. results in this study, we used the catalog data of the hetu dangse as a sample to analyze and reveal the hetu dangse catalog data from three perspectives: institutional function, institutional relationship, and automatic classification. this will improve usage efficiency of the hetu dangse, thus improving researchers’ mastery of relevant information about the document. to achieve the functional requirements of text analysis, we adopted four methods: word vector conversion, word frequency analysis, co-word clustering, and the svm model. word vector conversion of text catalog data the automatic classification of machine-learning technology is based on vector data sets. thus, the hetu dangse text catalog data set must be vectorized before automatic classification. currently, word vector conversion technology mainly includes methods such as one-hot, word2vec, and glove. the hetu dangse records the history of the qing dynasty for more than 200 years. there are inevitable relationships among the contents recorded in the documents, indicating that they are not isolated from each other. the word2vec model provides an efficient implementation of the cbow and skip-gram architectures for computing vector representations of words, both of which are simple neural network models with one hidden layer. the word2vec model produces word vectors as outputs from inputting the text corpus. this method generates a vocabulary from the input words and then learns the word vectors via backpropagation and stochastic gradient descent.14 this makes the word2vec model more suitable for catalog data from the hetu dangse. word2vec includes the cbow model and the skip-gram model, which can enrich the semantic relevance depending on the context, and it is more suitable for the semantic relevance of historical documents such as the hetu dangse. therefore, we adopt the skip-gram model to analyze the catalog data of the hetu dangse. we extracted the features of word vectors in catalog data from the corpus, input them into the word2vec model, imported the gensim library in python, trained the vector embeddings, and obtained the htd.model.bin vector file and htd.text.model model file. the correlation between each word in the hetu dangse catalog can be found by implementing the model. for example, if the word bannerman (旗人) is input into the model, the most relevant words are minren (民人, with 0.84726 relevance), accused (被控, with 0.812017), and robbery (抢劫, with 0.795359). to visualize the ethnic relationships recorded in the hetu dangse catalog, we input the first 300 words of the word vector into the trained word2vec model and performed dimensionality reduction to realize a planar graph. to understand the structure of the data intuitively, we used the t-sne algorithm to reduce the dimensions of the word vector.
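before turning to the t-sne settings described next, the word2vec training step just outlined can be sketched as follows. this is a minimal sketch, assuming gensim 4.x and a list of segmented catalog titles; the toy corpus and query word are illustrative, and the similarity figures quoted in the text (for example, for 旗人) come from the full corpus, not from this example.

```python
# minimal skip-gram training sketch with gensim (toy corpus, illustrative files)
from gensim.models import Word2Vec

sentences = [["盛京掌关防佐领", "缉拿", "逃人", "咨", "盛京刑部"],
             ["正白旗佐领", "交纳", "壮丁", "银两事"]]

model = Word2Vec(sentences, vector_size=50, sg=1,   # sg=1 selects skip-gram
                 window=5, min_count=1, epochs=50)

model.wv.save_word2vec_format("htd.model.bin", binary=True)  # vector file
model.save("htd.text.model")                                  # model file

# nearest neighbours of a query word, analogous to the 旗人 example in the text
print(model.wv.most_similar("缉拿", topn=3))
```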
t-sne is a type of nonlinear dimensionality reduction used to ensure that similar data points in high-dimensional space remain as close as possible in low-dimensional space. we set the embedded space dimension parameter of t-sne to 2 and the initialization parameter as pca. this makes it more globally stable than random initialization. the maximum number of optimization iterations is 5,000. figure 3 presents the results. in figure 3, the terms sanling, yongling, zhaoling, prime minister, and fuling form clusters. in shengjing, the qing set up the sanling prime minister’s office, and the prime minister’s mausoleum affairs minister was appointed concurrently by general shengjing. near fujinmen, the sanling prime minister’s office was established. in the 30th year of guangxu, the government office was changed to the prime minister’s office of shengjing mausoleum affairs, and the governor of the three provinces concurrently served. under the sanling prime minister’s office, the sanling office was set up to undertake the sacrifice and repair affairs of the three tombs (xinbin yongling, shenyang fuling, and zhaoling).15 therefore, the clustering in figure 3 verifies the close relationship between the sanling prime minister’s office and the tombs. figure 3. 2d t-sne visualization of word2vec vectors. analysis of the relationship between the documents received and sent by the institution with the statistics of the text data obtained after word segmentation, we can find the quantitative relationship between the documents received and sent by the institution, using the pearson correlation coefficient to judge whether there is a correlation between the number of documents received and the number of documents sent by the same institution. \( \rho(r,s) = \frac{\mathrm{cov}(r,s)}{\sigma_r \sigma_s} \) (3.1) we suppose that the pearson correlation coefficient between the number of documents received and the number of documents sent is ρ(r,s). here, r = {r1, r2, r3, …, r11} is the variable set of documents received by the institutional sample, and s = {s1, s2, s3, …, s11} is the variable set of documents sent by the institutional sample. by dividing the covariance of r and s by the product of their respective standard deviations, we can obtain the value of the correlation coefficient of the documents sent and received by the same institution. mining the relationship between institutions’ sending and receiving documents based on co-word clustering to mine the relationship between the institutions’ sending and receiving documents, we adopt a co-word clustering algorithm to generate a visualized network map of institutional relationships. the global co-occurrence rate represents the probability of two words appearing together in all the data sets. in large-scale data sets, if two words often appear together in the text, these two words are considered to be strongly related to the semantics.16 clustering is a method that places objects into a group by similarity or dissimilarity. thus, keywords with high correlation to each other tend to be placed in the same cluster. social network analysis, which evaluates the unique structure of interrelationships among individuals, has been extensively used in social science, psychological science, management science, and scientometrics.17
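a minimal sketch of the two computations just described (the t-sne layout behind figure 3 and the pearson coefficient of equation 3.1), assuming scikit-learn and scipy; the word vectors and the received/sent counts below are synthetic placeholders rather than the actual hetu dangse figures.

```python
# minimal sketch: 2-d t-sne layout of word vectors plus the pearson
# correlation between received and sent document counts (synthetic data)
import numpy as np
from sklearn.manifold import TSNE
from scipy.stats import pearsonr

vectors = np.random.default_rng(0).normal(size=(300, 50))   # top 300 word vectors
# the paper sets the maximum optimization iterations to 5,000; the parameter
# name (n_iter / max_iter) depends on the scikit-learn version, so it is
# omitted here for portability
points = TSNE(n_components=2, init="pca",
              random_state=0).fit_transform(vectors)         # layout for figure 3

received = np.array([120, 95, 80, 60, 55, 40, 35, 30, 25, 20, 15])  # r1..r11
sent = np.array([110, 100, 70, 65, 50, 45, 30, 28, 22, 18, 12])     # s1..s11
rho, p_value = pearsonr(received, sent)    # cov(r,s) / (σ_r · σ_s), equation 3.1
print(rho, p_value)
```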
the main purpose of the sociogram is to provide information information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 9 about the relationship between institutions’ sending and receiving documents. in the sociogram, each member of a network is described by a “vertex” or “node.” vertices represent high-frequency words, and the sizes of the nodes indicate the occurrence frequency. the smaller the size of a node, the lower the occurrence frequency. lines depict the relationships between two institutions. they exist between two keywords, indicating that they received or sent documents to each other. the thickness is proportional to the correlation between the keywords. the thicker the line between the two keywords, the stronger the connection. using this rationale, the map visualization and network characteristics (centrality, density, core-periphery structure, strategic diagram, and network chart) were obtained by analyzing pearson’s correlation matrix or other similarity matrices.18 in this study, we conducted network analysis on a binary matrix to display the relationships between the documents sent and received by the institutions in the shengjing area during the qing dynasty recorded in the hetu dangse. further, we extracted the receiving institution and issuing institution from each record of catalog data in the hetu dangse, and then we composed a new data set with the following data from the receiving institution: issuing institution and title content. we used python to convert the new data set to endnote format and import it into vosviewer1.6.15 to calculate and draw a visual map of the new data set. van eck and waltman of the netherlands’ leiden university developed vosviewer, a metrological analysis software used for constructing and visualizing network graphs.19 although the software’s development principle is based on documents’ co-citation principles, it can be applied to the construction of data network knowledge graphs in various fields. combined with the co -word clustering algorithm, we can create an entity connection network map for historical documents through vosviewer software to reflect the recorded content. automatic classification method of historical archives catalog based on the svm model we used the svm model in machine learning for automatic classification. the svm model has the advantages of strong generalization, low error rate, strong learning ability, and support for small sample data sets, making it suitable for historical archive catalog data samples with small sample characteristics. therefore, we attempted to classify the catalog data set of hetu dangse using the svm model. first, we divided the vectorized labeled data set into a training set and a testing set. the training set accounts for 70% of the data, and the testing set accounts for 30%. to ensure the accuracy of the model prediction, we adopted a random division method to avoid overfitting. second, we used a linear kernel in the svm model and grid search to find the best parameter. various combinations of the penalty coefficient (c) and gamma parameter in the svm model were tested based on their accuracy ranked from high to low. we then determined the best parameter combination. after the model was established, we validated the predictive performance of the model from multiple perspectives such as precision, recall, and f1 score to ensure the generalization ability and availability of the model. 
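the tuning and validation steps just described can be sketched with scikit-learn as below. two assumptions are worth flagging: in scikit-learn the gamma parameter has no effect on a purely linear kernel, so this sketch uses the default rbf kernel to make the gamma grid meaningful, which may differ from the authors' exact configuration; and macro averaging of the metrics is likewise an assumption, since the paper does not state how the per-class scores were aggregated.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score

def tune_and_validate(X, y):
    """X: vectorized catalog records; y: category labels (1-11)."""
    # 70% training set, 30% testing set, randomly divided
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42, stratify=y)

    # grid of penalty coefficients (c) and gamma values reported in the paper
    param_grid = {"C": [10, 100, 200, 300], "gamma": [0.1, 0.25, 0.5, 0.75]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                          scoring="precision_macro", cv=5)
    search.fit(X_train, y_train)

    # validate the tuned model on the held-out 30%
    y_pred = search.predict(X_test)
    return {
        "best_params": search.best_params_,
        "precision": precision_score(y_test, y_pred, average="macro"),
        "recall": recall_score(y_test, y_pred, average="macro"),
        "f1": f1_score(y_test, y_pred, average="macro"),
    }
```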
we set the penalty coefficients to 10, 100, 200, and 300, while the gamma parameters are set to 0.1, 0.25, 0.5, and 0.75. we used the precision evaluation criteria to find the optimal parameter combination of the model and then imported them. the penalty coefficient is set to the x-axis, the gamma parameter set to the y-axis, and the precision set to the z-axis. we implemented the model to obtain the visualization that is shown in figure 4. clearly, the optimal parameter combination is a penalty coefficient of 10 and a gamma parameter of 0.075. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 10 figure 4. svm grid search parameter tuning diagram. discussion the history of a nation is the foundation on which it is built. historical documents are the witnesses and recorders of history. through the study of historical documents, we can go back to the past, cherish the present, and look forward to the future. an increasing number of scholars have studied these documents in recent years due to their importance. the hetu dangse records the document communications between institutions in shengjing (now shenyang) and beijing during the qing dynasty. it is an important historical document that cannot be ignored when information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 11 studying the history of northeast china during the qing dynasty. here, we use the catalog data of the hetu dangse as the sample data to test the machine learning methods previously mentioned. we explore the results from the perspectives of institutional function, institutional relationship, and automatic classification to determine the feasibility of our methods. functions of institutions the number of institutions involved in the hetu dangse is over 150. these functional departments formed the governance system of the shengjing area during the qing dynasty. to gain a deeper understanding of the qing dynasty’s ruling system in the shengjing area, the functions of these institutions should be examined. this study analyzes and studies the functions of the institutions in the shengjing area through the number of documents and the frequency of content of the sending and receiving institutions. analysis of the number of documents received and sent by institutions by sorting and statistically analyzing the catalog data of hetu dangse, we obtained data on the number of documents received and sent by institutions in the shengjing area recorded in the hetu dangse. we set the vertical axis as the total number of communicated documents, number of issued documents, and number of received documents. we set the horizontal axis as the names of the institutions and then drew a histogram. this study analyzes the number of institutional archives of the hetu dangse catalog from three perspectives: total number of sent and received documents, number of received documents, and number of issued documents to find the institutions with the highest research value in the shengjing area. in the histogram shown in figure 5(a), the top three institutions in total number of communicated documents are shengjing internal affairs office, shengjing zuoling, and shengjing ministry of revenue. we can also observe that the top 10 institutions have different volumes of their respective documents received and sent by institutions. 
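the per-institution counts behind figure 5 can be sketched with pandas as follows; the file and column names ("htd_catalog.csv", "receiving", "issuing") are hypothetical placeholders for the actual fields of the catalog data set.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("htd_catalog.csv")             # one catalog record per row

received = df["receiving"].value_counts()       # documents received per institution
sent = df["issuing"].value_counts()             # documents sent per institution
summary = pd.DataFrame({"received": received, "sent": sent}).fillna(0)
summary["total"] = summary["received"] + summary["sent"]

# top institutions by total number of communicated documents (cf. figure 5a)
top = summary.sort_values("total", ascending=False).head(10)
print(top)
top.plot.bar()
plt.show()
```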
therefore, the ranking of the total number of communicated documents is not directly related to the respective rankings of the number of documents received and the number of documents sent. in figure 5(b), we can observe that the top three institutions in number of documents received in the hetu dangse are shengjing internal affairs office, shengjing ministry of revenue, and shengjing general yamen. figure 5(c) shows the top three institutions in number of documents sent in the hetu dangse are shengjing internal affairs office, shengjing zuoling, and shengjing general yamen. the total number of communicated documents, number of documents sent, and number of documents received by the shengjing internal affairs office all rank first; this indicates that the shengjing internal affairs office is the most important department of the ruling system in the qing dynasty during the shengjing area. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 12 figure 5. number of documents received and sent by institutions. a b c information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 13 by using the number of documents received and sent by the institutions, we calculated the pearson correlation coefficient to determine if the number of documents received and sent by the same institution is relevant. as institutional samples, we selected the shengjing internal affairs office, shengjing ministry of revenue, (beijing) internal affairs office in charge, shengjing zuoling, shengjing ministry of works, shengjing ministry of justice, shengjing general yamen, shengjing close defense zuoling, shengjing ministry of war, fengtian general yamen, and shengjing ministry of rites. through calculation, the result of pearson correlation coefficient is 0.69 (save two decimal places), so there is a correlation between the number of sent and received documents, as shown in figure 6. figure 6. scatter plot of pearson correlation coefficient. the hetu dangse is a copy of official documents dealing with the royal affairs of the shengjing internal affairs office during the qing dynasty. it contains the official documents between the shengjing internal affairs office and the beijing internal affairs office in charge, the liubu, etc. and the local shengjing general yamen, fengtian office, the wubu of shengjing, and other yamens.16 thus, there exist a large stock of documents with the shengjing internal affairs office as the sending and receiving agency. the wubu of shengjing, shengjing general yamen, shengjing zuoling, and other institutions are important hubs for the operation of institutions in shengjing. they played an important role in maintaining and stabilizing the society of shengjing. the number of documents is second in importance only to the shengjing internal affairs office. analysis of the frequency of documents received and sent by institutions to further explore the functions of institutions with research value, we extracted the contents of the catalogs from the top three institutions in total number of documents sent and received: shengjing internal affairs office, shengjing ministry of revenue, and shengjing zuoling. we then classified the catalogs of the aforementioned institutions according to receipts and postings. 
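the pearson check of equation (3.1) reported above can be reproduced with scipy, reusing the per-institution summary table from the previous sketch; the english institution names below stand in for whatever form the names take in the actual data set.

```python
from scipy.stats import pearsonr

institutions = [
    "shengjing internal affairs office", "shengjing ministry of revenue",
    "(beijing) internal affairs office in charge", "shengjing zuoling",
    "shengjing ministry of works", "shengjing ministry of justice",
    "shengjing general yamen", "shengjing close defense zuoling",
    "shengjing ministry of war", "fengtian general yamen",
    "shengjing ministry of rites",
]

r_values = summary.loc[institutions, "received"]   # R = {r1, ..., r11}
s_values = summary.loc[institutions, "sent"]       # S = {s1, ..., s11}
rho, p_value = pearsonr(r_values, s_values)
print(f"pearson correlation coefficient: {rho:.2f}")   # the paper reports 0.69
```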
subsequently, we used word segmentation and word frequency statistics to process the two types information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 14 of catalog information and draw comparison diagrams to explore their specific functions in the hetu dangse. as shown in figure 7, we can roughly divide the obtained segmentation words into two categories. one is the name of the communicated official document institutions, such as the ministry of revenue, the ministry of justice, and the ministry of rites on the side of the word frequency (see fig. 7[a]). the other is the name of the official document content and the words zhuangtou (庄头), dimu (地亩), and zhuangding (壮丁) on the side of the frequency of the words in the documents sent. through a comparative analysis of the top 10 words received and sent by the same institution, we conclude that the institutions with a close relationship between receiving and sending documents are not the same. for example, the ministry of revenue of shengjing internal affairs office ranks first in the frequency of documents sent by institutions, while the shengjing zuoling ranks first for receiving institutions (see fig. 7[b]). the contents of documents sent and received by the same institution are different. figure 7(c) shows how the affairs sent by shengjing zuoling to ula (乌拉), forage (粮草), and license (执照) differ from those represented by the zhuangtou (庄头), accounting (会计), and close defense (关防) in the frequency of documents sent and frequency of receipts, respectively. based on previous research on the functions of shengjing’s institutions, the shengjing internal affairs office was set up in the companion capital of shengjing during the qing dynasty to be in charge of shengjing cemetery, sacrifice, organization of staff transfer, and other matters. 20 this relates to the meaning of words such as sacrifice (祭祀) in figure 7(a). the functions of the shengjing ministry of revenue were represented in guangxu’s great qing huidian. the cashiers in charge of taxation in shengjing, number of annual losses in official villages, and banner land were carefully recorded. the expenditures were distinguished and the accounting obeyed the regulations according to the beijing ministry of revenue at the end of the year.21 this is related to the meaning of words, such as dimu (地亩), land sale (卖地), and money and grain (钱粮) in figure 7(b). in fu yonggong and guan jialu’s research of shengjing zuoling’s functions, shengjing zuoling handled the transfer communicated documents; supervised and urged the various departments of guangchu, duyu, zhangyi, accounting, construction, and qingfeng to undertake matters; managed officials and various people; maintained the shengjing palace and the warehouse; selected women to send to beijing inspect; heard all types of cases; undertook the emperor’s general letter; managed the ula people and tributes; and accepted the emperor or the internal affairs office in charge, among other tasks.22 this is connected to the meaning of words such as ula (乌拉), close defense (关防) and license (执照) in figure 7(c). information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 15 figure 7. word frequency comparison of documents received (in blue) and sent (in orange) by institutions. 
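the segmentation and word-frequency step behind figure 7 can be sketched as follows; jieba is used here only as an illustrative segmenter (the paper does not name its segmentation tool in this section), and the column names and the example institution value are hypothetical placeholders reused from the earlier sketch.

```python
import jieba
from collections import Counter

def top_words(titles, stopwords=frozenset(), n=10):
    """segment catalog titles and return the n most frequent words."""
    counter = Counter()
    for title in titles:
        counter.update(w for w in jieba.lcut(str(title))
                       if w.strip() and w not in stopwords)
    return counter.most_common(n)

office = "盛京内务府"   # illustrative value for the shengjing internal affairs office
received_titles = df.loc[df["receiving"] == office, "title"]   # receipts
sent_titles = df.loc[df["issuing"] == office, "title"]         # postings
print("received:", top_words(received_titles))
print("sent:    ", top_words(sent_titles))
```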
institutional relationship analysis to further study the governance structure of the shengjing area, we not only need to understand the functions of each institution but also to explore the overlap between the functions of institutions. the catalog data of the hetu dangse consist of three parts: receiving institutions, issuing institutions, and the record title of the catalog. a document usually involves two institutions, the receiving institution and the issuing institution, and the content of a document certainly relates closely to the functions of the two institutions. by visualizing the closeness between institutions, we conducted a quantitative analysis of the catalog data paired by receiving and issuing institution in the hetu dangse to provide reliable data for further research on the intersection of institutional functions in the shengjing area. results of institutional connection analysis using the co-word clustering algorithm, we counted the number of archive catalog records shared by each receiving and issuing institution. we set the vertical axis as the issuing institution and the horizontal axis as the receiving institution to obtain figure 8. the numbers inside the boxes represent the quantity of catalog records for each issuing and receiving institution pair. to facilitate measurements in the statistical process, pairs with 50 or fewer communicated documents between the receiving institution and the issuing institution have been zeroed out. as shown in figure 8, the institutions having close relations in the documents recorded in the hetu dangse are concentrated in the issuing institutions shengjing zuoling and shengjing internal affairs office and the receiving institutions shengjing internal affairs office and shengjing zuoling. among the receiving institutions, the number of documents received by the shengjing internal affairs office from shengjing general yamen reached as high as 11,936. the top three sources of documents received by shengjing zuoling were fengtian general yamen (2,265 pieces), shengjing ministry of revenue (1,527 pieces), and shengjing ministry of justice (1,520 pieces). it is worth noting that the shengjing internal affairs office received fewer than 50 documents from shengjing zuoling. the overlapping functions of the institutions in the shengjing area enabled individual offices to play bureaucratic games, passing responsibility to other offices and leading to low efficiency in handling affairs. for example, the military and political power in the shengjing area was jointly controlled by the shengjing general office and the shengjing ministry of war, and the shengjing area's tax power was controlled by the shengjing ministry of revenue, the fengtian office, and their subordinate offices. this phenomenon ran through the entire qing dynasty. research on the cross-functionality of institutions has always been a hot topic in qing historiography. by analyzing the official documents exchanged between institutions, we can further explore the overlap as well as the advantages and disadvantages of the qing dynasty shengjing ruling system, study the history of shengjing institutions in the qing dynasty more thoroughly, and provide a reference for the design of current institutions.
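the receiving-by-issuing matrix behind figure 8 can be sketched in the same hypothetical pandas setup used above; pairs with 50 or fewer documents are zeroed out, as described in the text.

```python
import pandas as pd

# rows: issuing institution, columns: receiving institution (cf. figure 8)
matrix = pd.crosstab(df["issuing"], df["receiving"])
matrix = matrix.where(matrix > 50, 0)   # zero out pairs with 50 or fewer documents

# strongest issuing-receiving links
print(matrix.stack().sort_values(ascending=False).head(10))
```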
information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 17 figure 8. relationship of communicated documents by the hetu dangse institutions diagram. visualization of institutional network map we used the hetu dangse catalog as sample data and the co-word clustering algorithm to obtain the close relationship between institutions and the appearance frequency of institutions. we drew a visual network diagram by virtue of vosviewer1.6.15 to obtain figure 9. in figure 9, institutions are represented by default as a circle with their names. the size of the label and the circle of an institution are determined by the weight of the item. the higher the weight of an item, the larger the label and the circle of the item. for some items, labels may not be displayed to avoid overlapping labels. the color of an institution is determined by the cluster the institutions belong to, and lines between items represent links. as shown in figure 9, the relationships between the institutions and departments in the hetu dangse form three core groups: the shengjing internal affairs office (in charge), shengjing zuoling, and beijing internal affairs office in charge. however, the relationships between the three groups are not similar; the distance between the group (beijing) internal affairs office in charge and the two other groups is relatively large. the group at the core of shengjing internal affairs office and the group at the core of shengjing zuoling are closely connected to each other through the wubu of information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 18 shengjing (shengjing ministry of revenue, shengjing ministry of rites, shengjing ministry of war, shengjing ministry of justice, and shengjing ministry of works). further, there are two larger individuals: fengtian general yamen and shengjing general yamen. fengtian general yamen and shengjing zuoling are closely related to each other, and the relationship between shengjing general yamen and shengjing internal affairs office is relatively close. figure 9. co-occurrence of institutions network map. the city of shengjing was the companion capital of the qing dynasty. the qing government implemented special governance measures in these areas that differed greatly from those of direct inland provinces.23 to ensure the stable rule of the shengjing area, the qing dynasty performed the following tasks. first, the qing dynasty set up a general garrison as the highest military and political chief in the shengjing area to be responsible for all military and political affairs within its jurisdiction. second, they established the fengtian office, a capital of the same level as the shuntian office, to rule the common people of the shengjing area. the states and counties, as well as the garrison banner officer, which was under the rule of general garrison, were local administrative institutions under the fengtian office. these institutions implemented the dual management rule of the bannerman and common people. third, as the companion capital, the shengjing area followed the ming dynasty companion capital system to set up the wubu of shengjing to maintain power. in addition, the shengjing internal affairs office, which was in charge of palace affairs, communicated with the beijing internal affairs office in charge. 
results of automatic classification analysis catalogs are important information resources in the field of historical archives. the classification of archival catalogs can not only link relevant information in archives or archive fonds, improve researchers' utilization efficiency, and save the time spent searching for required archives, but it can also present materials to readers in clusters. as the hetu dangse catalog is a series of historical documents stored over a long period of time, its original classification system does not fit existing archival management methods well. the hetu dangse has a total of 1,149 volumes and 127,000 pages. each volume contains a different number of documents, and the ink characters on chinese art paper are in manchu and chinese. reading and categorizing the full text of the hetu dangse not only requires a great deal of manpower, material, and financial resources but also places extremely high demands on the classification staff, who need a good knowledge of manchu, archival science, document taxonomy, and other related disciplines. therefore, sorting and organizing the content of the hetu dangse through manual reading and comprehension alone is impractical. to address this problem, we used the svm model of machine learning to automatically classify and explore the catalog data of the hetu dangse. this model further demonstrates the relevance of the knowledge between documents in the hetu dangse and facilitates in-depth analysis. we imported the vectorized labeled data set into the svm model and selected the optimal parameter combination to run the model. to visualize the results, the 50-dimensional word vectors were reduced to 2-dimensional vectors using the t-distributed stochastic neighbor embedding algorithm, and the svm model was used to establish a hyperplane visualized in 2-dimensional form. owing to the large number of categories, the legend in figure 10 shows only the six categories with the highest proportion of the data. to test the classification effect of the svm model, we used precision and recall as metrics and calculated the f1 score to validate the model. the results are presented in table 3. based on the created svm model, 95,680 catalog records of the hetu dangse were predicted and classified. the results are shown in figure 11. although certain deficiencies remain in accuracy and other aspects, the model has a positive impact on the content research, management, utilization, and retrieval discovery of the hetu dangse. table 3. svm model validation. precision: 0.736; recall: 0.717; f1 score: 0.716. figure 10. svm decision region boundary. figure 11. hetu dangse catalog data prediction classification. conclusion in this study, we used machine learning to analyze and visualize the catalog data of the hetu dangse, revealing the functional relationships of the qing dynasty shengjing regional institutions recorded in this historical document and showing their communication relationships. using the svm model, we achieved automatic classification of the hetu dangse catalog from the category perspective.
owing to the massive archives of historical materials in ancient china, the information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 21 fonts of many historical materials cannot be recognized by computers or humans. the digitization of catalogs has become a digital bridge between researchers and historical documents. this not only achieves the concise summary and refinement of them but also greatly improves the utilization efficiency by researchers. the svm model can “learn” through the labeled sample data and realize automatic classification of large amounts of unlabeled catalog data. by automatic classification of catalog data, historical data researchers and archive managers can use and manage a large number of historical documents and catalog data more effectively, greatly increasing their utilization. the co-occurrence algorithm can reveal the rules written by the catalog data itself, discover the distance between the catalog data, and form clusters providing a clearer direction for researchers to use historical documents. the algorithm also saves time for researchers to identify documents without purpose, making content presentation of historical documents to readers clearer. this paper improves archivists’ awareness of archive data compilation and management. first, data is observed, topics are identified, and potential relationships between these are found and established to improve historical archives’ compilation. second, the visual presentation method and carrier is chosen, and via the web browser established relationships are visualized for the users to access and utilize. it can be said that scientometric research method can promote the transformation of historical research and archives management and compilation research from traditional explanatory scholarship to truth-seeking scholarship. currently, the application of machine learning technology has gradually extended from applied disciplines to traditional fields of literature, art, and sociology. however, there are still many opportunities in the field of historical research. this study used methods in the field of artificial intelligence to conduct text mining and visualize the presentation of historical archive document catalog data and proposes a new digital and intelligent solution for researching chinese historical documents. with the development of science and technology, research methods for historical documents are undergoing constant changes from the traditional manual subjective analysis of historical data to relying on quantitative analysis represented by deep learning and data mining technology. it is an irreversible trend to research historical documents more comprehensively, accurately, and scientifically by means of artificial intelligence and other technologies on the scientific frontier. for future work, we plan to conduct research on the qing dynasty historical documents from a deeper semantic analysis level, construct a knowledge graph through the method of named entity recognition, and construct an ontological model transforming historical documents into a structured knowledge base to discover new knowledge from historical documents in an automated manner. 
acknowledgments funding statement this work was supported by the general program of the national natural science foundation of china [grant number 72074060], the research foundation of the ministry of education of china [grant number 20jhq012], and the national social science fund of china [grant number 16btq089]. data accessibility the data sets supporting this article have been uploaded as part of the supplementary material. https://drive.google.com/drive/folders/1bzs17otruyva_qkbshmf836ygdti40y0?usp=sharing https://drive.google.com/drive/folders/1bzs17otruyva_qkbshmf836ygdti40y0?usp=sharing information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 22 competing interests we have no competing interests. endnotes 1 wang tao, “data mining of german historical documents in the 18th century, taking topic models as examples,” xuehai 1, no. 20 (2017): 206–16, https://doi.org/10.16091/j.cnki.cn321308/c.2017.01.021. 2 kaixu zhang and yunqing xia, “crf-based approach to sentence segmentation and punctuation for ancient chinese prose,” journal of tsinghua university (science and technology) 10, no. 27 (2009): 39–49, https://doi.org/10.16511/j.cnki.qhdxxb.2009.10.027. 3 michael stauffer, andreas fischer, and kaspar riesen, “keyword spotting in historical handwritten documents based on graph matching,” pattern recognition 81 (2018): 240–53, https://doi.org/10.1016/j.patcog.2018.04.001; wu sihang et al., “precise detection of chinese characters in historical documents with deep reinforcement learning,” pattern recognition 107 (2020): 107503, https://doi.org/10.1016/j.patcog.2020.107503. 4 renata solar and dalibor radovan, “use of gis for presentation of the map and pictorial collection of the national and university library of slovenia,” information technology and libraries 24, no. 4 (2005): 196–200, https://doi.org/10.6017/ital.v24i4.3385. 5 shaochun dong et al., “semantic enhanced webgis approach to visualize chinese historical natural hazards,” journal of cultural heritage 14, no. 3 (2013): 181–89, https://doi.org/10.1016/j.culher.2012.06.009; jakub kuna and łukasz kowalski, “exploring a non-existent city via historical gis system by the example of the jewish district ‘podzamcze’ in lublin (poland),” journal of cultural heritage 46 (2020): 328–34, https://doi.org/10.1016/j.culher.2020.07.010. 6 aleksandrs ivanovs and aleksey varfolomeyev, “service-oriented architecture of intelligent environment for historical records studies,” procedia computer science 104 (2017): 57–64, http://doi.org/10.1016/j.procs.2017.01.062; guus schreiber et al., “semantic annotation and search of cultural-heritage collections: the multimedian e-culture demonstrator,” journal of web semantics 6, no. 4 (2008): 243–49, https://doi.org/10.1016/j.websem.2008.08.001. 7 m kim et al., “inference on historical factions based on multi-layered network of historical figures,” expert systems with applications 161 (2020): 113703, http://doi.org/10.1016/j.eswa.2020.113703. 8 hobson lane, cole howard, hannes hapke, natural language processing in action: understanding, analyzing, and generating text with python (new york: manning publications, 2019), 165. 9 laurens van der maaten, eric postma, and jaap van den herik, “dimensionality reduction: a comparative review,” tilburg university technical report, ticc-tr 2009-005 (2009), https://lvdmaaten.github.io/publications/papers/tr_dimensionality_reduction_review_200 9.pdf. 
https://doi.org/10.16091/j.cnki.cn32-1308/c.2017.01.021 https://doi.org/10.16091/j.cnki.cn32-1308/c.2017.01.021 https://doi.org/10.16511/j.cnki.qhdxxb.2009.10.027 https://doi.org/ https://doi.org/10.1016/j.patcog.2018.04.001 https://doi.org/10.1016/j.patcog.2020.107503 https://doi.org/10.6017/ital.v24i4.3385 https://doi.org/10.1016/j.culher.2012.06.009 https://doi.org/10.1016/j.culher.2020.07.010 http://doi.org/10.1016/j.procs.2017.01.062 https://doi.org/10.1016/j.websem.2008.08.001 http://doi.org/ https://doi.org/10.1016/j.eswa.2020.113703 https://lvdmaaten.github.io/publications/papers/tr_dimensionality_reduction_review_2009.pdf https://lvdmaaten.github.io/publications/papers/tr_dimensionality_reduction_review_2009.pdf information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 23 10 gavin hackeling, mastering machine learning with scikit-learn (birmingham: packt publishing, 2017). 11 richard smiraglia, domain analysis for knowledge organization: tools for ontology extraction (oxford: chandos publishing, 2015). 12 kuo-chung chu, hsin-ke lu, and wen-i liu, “identifying emerging relationship in healthcare domain journals via citation network analysis,” information technology and libraries 37, no. 1 (2018): 39–51, https://doi.org/10.6017/ital.v37i1.9595. 13 archives of liaoning province in china, “the hetu dangse series archives publication,” qing history research 6, no. 2 (2009): 1. 14 amit kumar sharma, sandeep chaurasia, and devesh kumar srivastava, “sentimental short sentences classification by using cnn deep learning model with fine tuned word2vec,” procedia computer science 167 (2020): 1139–47, https://doi.org/10.1016/j.procs.2020.03.416. 15 b hongxi, “research on the sanling management institutions of the qing dynasty outside the pass,” manchu minority research 4, no. 12 (1997): 38–56. 16 guangli zhu et al., “building multi-subtopic bi-level network for micro-blog hot topic based on feature co-occurrence and semantic community division,” journal of network and computer applications 170 (2020): 102815, https://doi.org/10.1016/j.jnca.2020.102815. 17 s. ravikumar, ashutosh agrahari, and s. n. singh, “mapping the intellectual structure of scientometrics: a co-word analysis of the journal scientometrics (2005–2010),” scientometrics 102 (2015): 929–55, https://doi.org/10.1007/s11192-014-1402-8. 18 jiming hu and yin zhang, “research patterns and trends of recommendation system in china using co-word analysis,” information processing and management 51, no. 4 (2015): 329–39, https://doi.org/10.1016/j.ipm.2015.02.002. 19 nees jan van eck and ludo waltman, “software survey: vosviewer, a computer program for bibliometric mapping, scientometrics, 84, no. 2 (2010): 523–38, https://doi.org/10.1007/s11192-009-0146-3. 20 z yanchang and l xinzhu, “the study of the function of shengjing office from the use of the official communication — an academic investigation based on hetu dangse,” shanxi archives 8, no. 12 (2020): 179–88. 21 shengjing ministry of revenue, guangxu's great qing huidian volume 25 (zhonghua book company, 1991), 211–12. 22 f yonggong and g jialu, “brief introduction of shengjing upper three banners baoyi zuoling,” historical archives 9, no. 30 (1992): 93–7. 23 wangyue, “research on the yamens and their affair relationships in shengjing area,” shenyang palace museum journal 1, no. 31 (2011): 67–77. 
https://doi.org/10.6017/ital.v37i1.9595 https://doi.org/10.1016/j.procs.2020.03.416 https://doi.org/10.1016/j.jnca.2020.102815 https://doi.org/ https://doi.org/10.1007/s11192-014-1402-8 https://doi.org/10.1016/j.ipm.2015.02.002 https://doi.org/10.1007/s11192-009-0146-3 abstract introduction related technology definition sample data preprocessing and classification data preparation and preprocessing data cleaning label data results word vector conversion of text catalog data analysis of the relationship between the documents received and sent of the institution mining the relationship between institutions’ sending and receiving documents based on co-word clustering automatic classification method of historical archives catalog based on the svm model discussion functions of institutions analysis of the number of documents received and sent by institutions analysis of the frequency of documents received and sent by institutions institutional relationship analysis results of institutional connection analysis visualization of institutional network map results of automatic classification analysis conclusion acknowledgments funding statement data accessibility competing interests endnotes the next generation library catalog | yang and hofmann 141 sharon q. yang and melissa a. hofmann the next generation library catalog: a comparative study of the opacs of koha, evergreen, and voyager open source has been the center of attention in the library world for the past several years. koha and evergreen are the two major open-source integrated library systems (ilss), and they continue to grow in maturity and popularity. the question remains as to how much we have achieved in open-source development toward the next-generation catalog compared to commercial systems. little has been written in the library literature to answer this question. this paper intends to answer this question by comparing the next-generation features of the opacs of two open-source ilss (koha and evergreen) and one proprietary ils (voyager’s webvoyage). m uch discussion has occurred lately on the nextgeneration library catalog, sometimes referred to as the library 2.0 catalog or “the third generation catalog.”1 different and even conflicting expectations exist as to what the next-generation library catalog comprises: in two sentences, this catalog is not really a catalog at all but more like a tool designed to make it easier for students to learn, teachers to instruct, and scholars to do research. it provides its intended audience with a more effective means for finding and using data and information.2 such expectations, despite their vagueness, eventually took concrete form in 2007.3 among the most prominent features of the next-generation catalog are a simple keyword search box, enhanced browsing possibilities, spelling corrections, relevance ranking, faceted navigation, federated search, user contribution, and enriched content, just to mention a few. over the past three years, libraries, vendors, and open-source communities have intensified their efforts to develop opacs with advanced features. the next-generation catalog is becoming the current catalog. the library community welcomes open-source integrated library systems (ilss) with open arms, as evidenced by the increasing number of libraries and library consortia that have adopted or are considering opensource options, such as koha, evergreen, and the open library environment project (ole project). 
librarians see a golden opportunity to add features to a system that will take years for a proprietary vendor to develop. open-source opacs, especially that of koha, seem to be more innovative than their long-established proprietary counterparts, as our investigation shows in this paper. threatened by this phenomenon, ils vendors have rushed to improve their opacs, modeling them after the next-generation catalog. for example, ex libris pushed out its new opac, webvoyage 7.0, in august of 2008 to give its opac a modern touch. one interesting question remains. in a competition for a modernized opac, which opac is closest to our visions for the next-generation library catalog: opensource or proprietary? the comparative study described in this article was conducted in the hope of yielding some information on this topic. for libraries facing options between open-source and proprietary systems, “a thorough process of evaluating an integrated library system (ils) today would not be complete without also weighing the open source ils products against their proprietary counterparts.”3 ■■ scope and purpose of the study the purpose of the study is to determine which opac of the three ilss—koha, evergreen, or webvoyage—offers more in terms of services and is more comparable to the next-generation library catalog. the three systems include two open-source and one proprietary ilss. koha and evergreen are chosen because they are the two most popular and fully developed open-source ilss in north america. at the time of the study, koha had 936 implementations worldwide; evergreen had 543 library users.4 we chose webvoyage for comparison because it is the opac of the voyager ils by ex libris, the biggest ils vendor in terms of personnel and marketplace.5 it also is one of the more popular ilss in north america, with a customer base of 1,424 libraries, most of which are academic.6 as the sample only includes three ilss, the study is very limited in scope, and the findings cannot be extrapolated to all open-source and proprietary catalogs. but, hopefully, readers will gain some insight into how much progress libraries, vendors, and open-source communities have achieved toward the next-generation catalog. ■■ literature review a review of the library literature found two relevant studies on the comparison of opacs in recent years. the first study was conducted by two librarians in slovenia investigating how much progress libraries had made toward the next-generation catalog.7 six online catalogs sharon q. yang (yangs@rider.edu) is systems librarian and melissa a. hofmann (mhofmann@rider.edu) is bibliographic control librarian, rider university. 142 information technology and libraries | september 2010 were examined and evaluated, including worldcat, the slovene union catalog cobiss, and those of four public libraries in the united states. the study also compared services provided by the library catalogs in the sample with those offered by amazon. the comparison took place primarily in six areas: search, presentation of results, enriched content, user participation, personalization, and web 2.0 technologies applied in opacs. the authors gave a detailed description of the research results supplemented by tables and snapshots of the catalogs in comparison. 
the findings indicated that “the progress of library catalogues has really been substantial in the last few years.” specifically, the library catalogues have made “the best progress on the content field and the least in user participation and personalization.” when compared to services offered by amazon, the authors concluded that “none of the six chosen catalogues offers the complete package of examined options that amazon does.”8 in other words, library catalogs in the sample still lacked features compared to amazon. the other comparative study was conducted by linda riewe, a library school student, in fulfillment for her master’s degree from san jose university. the research described in her thesis is a questionnaire survey targeted at 361 libraries that compares open-source (specifically, koha and evergreen) and propriety ilss in north america. more than twenty proprietary systems were covered, including horizon, voyager, millennium, polaris, innopac, and unicorn.9 only a small part of her study was related to opacs. it involved three questions about opacs and asked librarians to evaluate the ease of use of their ils opac’s search engines, their opac search engine’s completeness of features, and their perception of how easy it is for patrons to make self-service requests online for renewals and holds. a scale of 1 to 5 was used (1 = least satisfied; 5= very satisfied) regarding the three aspects of opacs. the mean and medium satisfaction ratings for open-source opacs were higher than those of proprietary ones. koha’s opac was ranked 4.3, 3.9, and 3.9, respectively in mean, the highest on the scale in all three categories, while the proprietary opacs were ranked 3.9, 3.6, and 3.6.10 evergreen fell in the middle, still ahead of proprietary opacs. the findings reinforced the perception that open-source catalogs, especially koha, offer more advanced features than proprietary ones. as riewe’s study focused more on the cost and user satisfaction with ilss, it yielded limited information about the connected opacs. no comparative research has measured the progress of open-source versus proprietary catalogs toward the next-generation library catalog. therefore the comparison described in this paper is the first of its kind. as only koha, everygreen, and voyager’s opacs are examined in this paper, the results cannot be extrapolated. studies on a larger scale are needed to shed light on the progress librarians have made toward the next-generation catalog. ■■ method the first step of the study was identifing and defining of a set of measurements by which to compare the three opacs. a review of library literature on the next-generation library catalog revealed different and somewhat conflicting points of views as to what the nextgeneration catalog should be. as marshall breeding put it, “there isn’t one single answer. we will see a number of approaches, each attacking the problem somewhat differently.”11 this study decided to use the most commonly held visions, which are summarized well by breeding and by morgan’s lita executive summary.12 the ten parameters identified and used in the comparison were taken primarily from breeding’s introduction to the july/ august 2007 issue of library technology reports, “nextgeneration library catalogs.”13 the ten features reflect some librarians’ visions for a modern catalog. they serve as additions to, rather than replacements of, the feature sets commonly found in legacy catalogs. 
the following are the definitions of each measurement: ■■ a single point of entry to all library information: “information” refers to all library resources. the next-generation catalog contains not only bibliographical information about printed books, video tapes, and journal titles but also leads to the full text of all electronic databases, digital archives, and any other library resources. it is a federated search engine for one-stop searching. it not only allows for one search leading to a federation of results, it also links to full-text electronic books and journal articles and directs users to printed materials. ■■ state-of-the-art web interface: library catalogs should be “intuitive interfaces” and “visually appealing sites” that compare well with other internet search engines.14 a library’s opac can be intimidating and complex. to attract users, the next-generation catalog looks and feels similar to google, amazon, and other popular websites. this criterion is highly subjective, however, because some users may find google and amazon anything but intuitive or appealing. the underlying assumption is that some internet search engines are popular, and a library catalog should be similar to be popular themselves. ■■ enriched content: breeding writes, “legacy catalogs tend to offer text-only displays, drawing only on the marc record. a next-generation catalog might bring in content from different sources to strengthen the visual appeal and increase the amount of information presented to the user.”15 the enriched content the next generation library catalog | yang and hofmann 143 includes images of book covers, cd and movie cases, tables of contents, summaries, reviews, and photos of items that traditionally are not present in legacy catalogs. ■■ faceted navigation: faceted navigation allows users to narrow their search results by facets. the types of facets may include subjects, authors, dates, types of materials, locations, series, and more. many discovery tools and federated search engines, such as villanova university’s vufind and innovative interface’s encore, have used this technology in searches.16 auto-graphics also applied this feature in their opac, agent iluminar.17 ■■ simple keyword search box: the next-generation catalog looks and feels like popular internet search engines. the best example is google’s simple user interface. that means that a simple keyword search box, instead of a controlled vocabulary or specific-field search box, should be presented to the user on the opening page with a link to an advanced search for user in need of more complex searching options. ■■ relevancy: traditional ranking of search results is based on the frequency and positions of terms in bibliographical records during keyword searches. relevancy has not worked well in opacs. in addition, popularity is another factor that has not been taken into consideration in relevancy ranking. for instance, “when ranking results from the library’s book collection, the number of times that an item has been checked out could be considered an indicator of popularity.”18 by the same token, the size and font of tags in a tag cloud or the number of comments users attach to an item may also be considered relevant in ranking search results. so far, almost no opacs are capable of incorporating circulation statistics into relevancy ranking. ■■ “did you mean . . . 
?”: when a search term is not spelled correctly or nothing is found in the opac in a keyword search, the spell checker will kick in and suggest the correct spelling or recommend a term that may match the user’s intended search term. for example, a modern catalog may generate a statement such as “did you mean . . . ?” or “maybe you meant . . . .” this may be a very popular and useful service in modern opacs. ■■ recommendations and related materials: the nextgeneration catalog is envisioned as promoting reading and learning by making recommendations of additional related materials to patrons. this feature is an imitation of amazon and websites that promote selling by stating “customers who bought this item also bought . . . .” likewise, after a search in the opac, a statement such as “patrons who borrowed this book also borrowed the following books . . .” may appear. ■■ user contribution—ratings, reviews, comments, and tagging: legacy catalogs only allow catalogers to add content. in the next-generation catalog, users can be active contributors to the content of the opac. they can rate, write reviews, tag, and comment on items. user contribution is an important indicator for use and can be used in relevancy ranking. ■■ rss feeds: the next-generation catalog is dynamic because it delivers lists of new acquisitions and search updates to users through rss feeds. modern catalogs are service-oriented; they do more than provide a simple display search results. the second step is to apply these ten visions to the opacs of koha, evergreen, and webvoyage to determine if they are present or absent. the opacs used in this study included three examples from each system. they may have been product demos and live catalogs randomly chosen from the user list on the product websites. the latest releases at the time of the study was koha 3.0, evergreen 2.0, webvoyage 7.1. in case of discrepancies between product descriptions and reality, we gave precedence to reality over claims. in other words, even if the product documentation lists and describes a feature, this study does not include it if the feature is not in action either in the demo or live catalogs. despite the fact that a planned future release of one of those investigated opacs may add a feature, this study only recorded what existed at the time of the comparison. the following are the opacs examined in this paper. koha ■■ koho demo for academic libraries: http://academic .demo.kohalibrary.com/ ■■ wagner college: http://wagner.waldo.kohalibrary .com/ ■■ clearwater christian college: http://ccc.kohalibrary .com/ evergreen ■■ evergreen demo: http://demo.gapines.org/opac/ en-us/skin/default/xml/index.xml ■■ georgia pines: http://gapines.org/opac/en-us/ skin/default/xml/index.xml ■■ columbia bible college at http://columbiabc .evergreencatalog.com/opac/en-ca/skin/default/ xml/index.xml webvoyage ■■ rider university libraries: http://voyager.rider.edu ■■ renton college library: http://renton.library.ctc .edu/vwebv/searchbasic 144 information technology and libraries | september 2010 ■■ shoreline college library: http://shoreline.library .ctc.edu/vwebv/searchbasic the final step includes data collection and compilation. a discussion of findings follows. the study draws conclusions about which opac is more advanced and has more features of the next-generation library catalog. ■■ findings each of the opacs of koha, evergreen, and webvoyage are examined for the presence of the ten features of the next-generation catalog. 
single point of entry for all library information none of the opacs of the three ilss provides true federated searching. to varying degrees, each is limited in access, showing an absence of contents from electronic databases, digital archives, and other sources that generally are not located in the legacy catalog. of the three, koha is more advanced. while webvoyage and evergreen only display journal-holdings information in their opacs, koha links journal titles from its catalog to proquest’s serials solutions, thus leading users to fulltext journals in the electronic databases. the example in figure 1 (koha demo) shows the journal title unix update with an active link to the full-text journal in the availability field. the link takes patrons to serials solutions, where full text at the journal-title level is listed for each database (see figure 2). each link will take you into the full text in each database. state-of-the-art web interface as beauty is in the eye of the beholder, the interface of a catalog can be appealing to one user but prohibitive to another. with this limitation in mind, the out-of-thebox user interface at the demo sites was considered for each opac. all the three catalogs have the google-like simplicity in presentation. all of the user interfaces are highly customizable. it largely depends on the library to make the user interface appealing and welcoming to users. figures 3–5 show snapshots from each ilss demo sites and have not been customized. however, there are a few differences in the “state of the art.” for one, koha’s navigation between screens relies solely on the browser’s forward and back buttons, while webvoyage and evergreen have internal navigation buttons that more efficiently take the user between title lists, headings lists, and record displays, and between records in a result set. while all three opacs offer an advanced search page with multiple boxes for entering search terms, only webvoyage makes the relationship between the terms in different boxes clear. by the use of a drop-down box, it makes explicit that the search terms are by default anded and also allows for the selection of or and not. in koha’s and evergreen’s advanced search, however, the terms are anded only, a fact that is not at all obvious to the user. in the demo opacs examined, there is no option to choose or or not between rows, nor is there any indication that the search is anded. the point of providing multiple search boxes is to guide users in constructing a boolean search without their having to worry about operators and syntax. in koha, however, users have to type an or or not statement themselves within the text box, thus defeating the purpose of having multiple boxes. while evergreen allows for a not construction within a row (“does not contain”), it does not provide an option for or (“contains” and “matches exactly” are the other two options available). see figures figure 1. link to full-text journals in serials solutions in koha figure 2. links to serials solutions from koha the next generation library catalog | yang and hofmann 145 6–8. thus koha’s and evergreen’s advanced search is less than intuitive for users and certainly less functional than webvoyage’s. enriched content to varying degrees, enriched content is present in all three catalogs, with koha providing the most. while all three catalogs have book covers and movie-container art, koha has much more in its catalog. for instance, it displays tags, descriptions, comments, and amazon reviews. 
webvoyage displays links to google books for book reviews and content summaries but does not have tags, descriptions, and comments in the catalog. see figures 9–11. faceted navigation the koha opac is the only catalog of the three to offer faceted navigation. the “refine your search” feature allows users to narrow search results by availability, places, libraries, authors, topics, and series. clicking on a term within a facet adds that term to the search query and generates a narrower list of results. the user may then choose another facet to further refine the search. while evergreen appears to have faceted navigation upon first glance, it actually does not possess this feature. the following facets appear after a search generates hits: “relevant subjects,” “relevant authors,” and “relevant series.” but choosing a term within a facet does not narrow down the previous search. instead, it generates an entirely new search with the selected term; it does not add the new term to the previous query. users must manually combine the terms in the simple search box or through the advanced search page. webvoyage also does not offer faceted navigation—it only provides an option to “filter your search” by format, language, and date when a set of results is returned. see figures 12–14. keyword searching koha, evergreen, and webvoyage all present a simple keyword search box with a link to the advanced search (see figures 3–5). relevancy neither koha, evergreen, nor webvoyage provide any evidence for meeting the criteria of the next-generation catalog’s more inclusive vision of relevancy ranking, such as accounting for an item’s popularity or allowing user tags. koha uses index data’s zebra program for its relevance ranking, which “reads structured records in a variety of input formats . . . and allows access to them through exact boolean search figure 3. koha: state-of-the-art user interface figure 5. voyager: state-of-the-art user interface figure 4. evergreen: state-of-the-art user interface 146 information technology and libraries | september 2010 user contributions koha is the only system of the three that allows users to add tags, comments, descriptions, and reviews. in koha’s opac, user-added tags form tag clouds, and the font and size of each keyword or tag indicate that keyword or figure 6. voyager advanced search figure 7. koha advanced search figure 8. evergreen advanced search expressions and relevance-ranked free-text queries.19 evergreen’s dokuwiki states that the base relevancy score is determined by the cover density of the searched terms. after this base score is determined, items may receive score bumps based on word order, matching on the first word, and exact matches depending on the type of search performed.20 these statements do not indicate that either koha or evergreen go beyond the traditional relevancy-ranking methods of legacy systems, such as webvoyage. did you mean . . . ? only evergreen has a true “did you mean . . . ?” feature. when no hits are returned, evergreen provides a suggested alternate spelling (“maybe you meant . . . ?”) as well as a suggested additional search (“you may also like to try these related searches . . .”). koha has a spell-check feature, but it automatically normalizes the search term and does not give the option of choosing different one. this is not the same as a “did you mean . . . ?” feature as defined above. 
did you mean . . . ?
only evergreen has a true "did you mean . . . ?" feature. when no hits are returned, evergreen provides a suggested alternate spelling ("maybe you meant . . . ?") as well as a suggested additional search ("you may also like to try these related searches . . ."). koha has a spell-check feature, but it automatically normalizes the search term and does not give the user the option of choosing a different one. this is not the same as a "did you mean . . . ?" feature as defined above. while the normalizing process may be seamless, it takes the power of choice away from the user and may be problematic if a particular alternative spelling or misspelling is searched purposefully, such as "womyn." (when "womyn" is searched as a keyword in the koha demo opac, 16,230 hits are returned. this catalog does not appear to contain the term as spelled, which is why it is normalized to "women." the fact that the term does not appear as is may not be transparent to the searcher.) with normalization, the user may also be unaware that any mistake in spelling has occurred, and the number of hits may differ between the correct spelling and the normalized spelling, potentially affecting discovery. the normalization feature also only works with particular combinations of misspellings, where letter order affects whether a match is found. otherwise the system returns a "no result found!" message with no suggestions offered. (try "homoexuality" vs. "homoexsuality." in koha's demo opac, the former, with a missing "s," yields 553 hits, while the latter, with a misplaced "s," yields none.) however, koha is a step ahead of webvoyage, which has no built-in spell checker at all. if a search fails, the system returns the message "search resulted in no hits." see figures 15–17. [figure 15. evergreen: did you mean . . . ? figure 16. koha: did you mean . . . ? figure 17. voyager: did you mean . . . ?]

recommendations/related materials
none of the three online catalogs can recommend materials for users.

user contributions
koha is the only system of the three that allows users to add tags, comments, descriptions, and reviews. in koha's opac, user-added tags form tag clouds, and the font and size of each keyword or tag indicate that keyword or tag's frequency of use. all the tags in a tag cloud serve as hyperlinks to library materials. users can write their own reviews to complement the amazon reviews. all user-added reviews, descriptions, and comments have to be approved by a librarian before they are finalized for display in the opac. nevertheless, the user contribution features in the koha opac are not easy to use. it may take many clicks before a user can figure out how to add or edit text. it requires user login, and the system cannot keep track of the search hits after a login takes place. therefore the user contribution features of koha need improvement. see figure 18. [figure 18. koha user contributions.]

rss feeds
koha provides rss feeds, while evergreen and webvoyage do not.

conclusion
table 1 is a summary of the comparisons in this paper. these comparisons show that the koha opac has six out of the ten compared features for the next-generation catalog, plus two halves. its full-fledged features include a state-of-the-art web interface, enriched content, faceted navigation, a simple keyword search box, user contribution, and rss feeds. the two halves indicate the existence of a feature that is not fully developed. for instance, "did you mean . . . ?" in koha does not work the way the next-generation catalog is envisioned. in addition, koha has the capability of linking journal titles to full text via serials solutions, while the other two opacs only display holdings information. evergreen falls into second place, providing four out of the ten compared features: a state-of-the-art interface, enriched content, a keyword search box, and "did you mean . . . ?" webvoyage, the voyager opac from ex libris, comes in third, providing only three out of the ten features for the next-generation catalog.
based on the evidence, koha's opac is more advanced and innovative than evergreen's or voyager's. among the three catalogs, the open-source opacs compare more favorably to the ideal next-generation catalog than the proprietary opac. however, none of them is capable of federated searching. only koha offers faceted navigation. webvoyage does not even provide a spell checker. the ils opac still has a long way to go toward the next-generation catalog. though this study samples only three catalogs, hopefully the findings will provide a glimpse of the current state of open-source versus proprietary catalogs. ils opacs are not comparable in features and functions to stand-alone opacs, also referred to as "discovery tools" or "layers." some discovery tools, such as ex libris' primo, also are federated search engines and are modeled after the next-generation catalog. recently they have become increasingly popular because they are bolder and more innovative than ils opacs. two of the best stand-alone open-source opacs are villanova university's vufind and oregon state university's libraryfind.21 both boast eight out of ten features of the next-generation catalog.22 technically it is easier to develop a new stand-alone opac with all the next-generation catalog features than to mend old ils opacs. as more and more libraries grow disappointed with their ils opacs, more discovery tools will be implemented. vendors will stop improving ils opacs and concentrate on developing better discovery tools. the fact that ils opacs are falling behind current trends may eventually bear no significance for libraries—at least for the ones that can afford the purchase or implementation of a more sophisticated discovery tool or stand-alone opac. certainly small and public libraries that cannot afford a discovery tool or a programmer for an open-source opac overlay will suffer, unless market conditions change.

references
1. tanja mercun and maja žumer, "new generation of catalogues for the new generation of users: a comparison of six library catalogues," program: electronic library & information systems 42, no. 3 (july 2008): 243–61.
2. eric lease morgan, "a 'next-generation' library catalog—executive summary (part #1 of 5)," online posting, july 7, 2006, lita blog: library information technology association, http://litablog.org/2006/07/07/a-next-generation-library-catalog-executive-summary-part-1-of-5/ (accessed nov. 10, 2008).
3. marshall breeding, introduction to "next generation library catalogs," library technology reports 43, no. 4 (july/aug. 2007): 5–14.
4. ibid.
5. marshall breeding, "library technology guides: key resources in the field of library automation," http://www.librarytechnology.org/lwc-search-advanced.pl (accessed jan. 23, 2010).
6. marshall breeding, "investing in the future: automation marketplace 2009," library journal (apr. 1, 2009), http://www.libraryjournal.com/article/ca6645868.html (accessed jan. 23, 2010).
7. marshall breeding, "library technology guides: company directory," http://www.librarytechnology.org/exlibris.pl?sid=20100123734344482&code=vend (accessed jan. 23, 2010).
8. mercun and žumer, "new generation of catalogues."
9. ibid.
10. linda riewe, "integrated library system (ils) survey: open source vs. proprietary-tables" (master's thesis, san jose state university, 2008): 2–5, http://users.sfo.com/~lmr/ils-survey/tables-all.pdf (accessed nov. 4, 2008).
11. ibid., 26–27.
12. breeding, introduction.
13. ibid.; morgan, "a 'next-generation' library catalog."
14. breeding, introduction.
15. ibid.
16. ibid.
17. villanova university, "vufind," http://vufind.org/ (accessed june 10, 2010); innovative interfaces, "encore," http://encoreforlibraries.com/ (accessed june 10, 2010).
18. auto-graphics, "agent illuminar," http://www4.auto-graphics.com/solutions/agentiluminar/agentiluminar.htm (accessed june 10, 2010).
19. breeding, introduction; morgan, "a 'next-generation' library catalog."
20. index data, "zebra," http://www.indexdata.dk/zebra/ (accessed jan. 3, 2009).
21. evergreen dokuwiki, "search relevancy ranking," http://open-ils.org/dokuwiki/doku.php?id=scratchpad:opac_demo&s=core (accessed dec. 19, 2008).
22. villanova university, "vufind"; oregon state university, "libraryfind," http://libraryfind.org/ (accessed june 10, 2010).
23. sharon q. yang and kurt wagner, "open source standalone opacs" (microsoft powerpoint presentation, 2010 virtual academic library environment annual conference, piscataway, new jersey, jan. 8, 2010).

table 1. summary of features of the next-generation catalog (yes = feature present; no = feature absent; partial = feature present but not fully developed)
feature | koha | evergreen | voyager
single point of entry for all library information | partial | no | no
state-of-the-art web interface | yes | yes | yes
enriched content | yes | yes | yes
faceted navigation | yes | no | no
keyword search | yes | yes | yes
relevancy | no | no | no
did you mean . . . ? | partial | yes | no
recommended/related materials | no | no | no
user contribution | yes | no | no
rss feed | yes | no | no

highlights of isad board meeting
1974 annual meeting, new york, new york
monday, july 8, 1974
the meeting was called to order by president frederick kilgour at 4:45 p.m. the following were present: board: frederick g. kilgour, lawrence w. s. auld, paul j. fasana, susan k. martin, ralph m. shoffner, donald p. hammer (isad executive secretary), and berniece coulter, secretary, isad. guests: henriette d. avram, roberto esteves, stephen salmon, merry sue smoller, and ruth l. tighe. additions to the agenda. mrs. martin requested that the matter of commercial brochures being included in isad mailings be added to the agenda. midwinter minutes approved. motion. it was moved by paul fasana that the minutes of the isad 1974 chicago midwinter meeting be approved. seconded by ralph shoffner. carried. introduction of new officers. mr. kilgour introduced to the board henriette avram, vice-president/president-elect, and ruth tighe, member-at-large of the isad board of directors, who would assume office at the close of the new york conference. policy concerning materials used in isad dissemination or displays. motion. it was moved by susan martin that the isad board establish a policy that only material produced by ala units or related professional organizations be included in its disseminations or displays. seconded by paul fasana. carried. video/cable section. mr. roberto esteves, chairman of the ala video/cable ad hoc study committee, solicited the interest of and activity by the isad board in getting video/cable incorporated into the isad structure. he reported that his committee had considered three alternatives as to where video/cable concerns could be situated within ala: (1) it could remain as a task force in srrt; (2) a separate round table on video/ 216 journal of library automation vol.
7 i 3 september 197 4 that had been before the evaluation, when it was his belief that isa was a cunent awareness service. mr. fasana recommended that the executive secretary write isa a letter informing them that the board cannot consider becoming a sponsor at this time. asidic. mr. hammer informed the board that peter watson had talked with him about asidic liaison, and they had concluded that asidic is primarily interested in having an observer at isad board meetings. to accomplish this requires no action from the board. wednesday, july 10, 1974 the meeting was called to order at 4:40 p.m. by president frederick kilgour. those present were: board-frederick g. kilgour, lawrence w. s. auld, paul j. fasana, susan k. martin, ralph m. shoffner, donald p. hammer (isad executive secretary), and berniece coulter, secretary, isad. committee chairmen-brian aveney, brett butler, helen schmierer, velma veneziano. guests-henriette d. avram, gerald lazorick, ruth l. tighe. sdi service for ala members. mr. lazorick (ohio state university mechanized information center) discussed the advantages to ala members if the osu selective dissemination of information service were available to them by subscription. the center would charge $50 per year for a profile, as opposed to the standard $300. the contract for sdi and retrospective searches (two services) would require ala to guarantee $17,000 per year ( $10,000 for sdi and $7,000 for retrospective searches). also, mr. fasana estimated that advertising and publicity costs might be as much as $5,000. mr. lazorick further explained that the printing of the necessary materials and the mailing would be handled by the center. ala would be responsible for advertising, marketing, and billing. it was relayed to the board that mr. wedgeworth did not feel that ala would profit enough for the amount it would have to pay for the service. he felt that osu could provide the service directly to individuals without the intervention of ala. mr. kilgour said that he felt a need to know that the money paid would indeed return to ala. the board had in the past expressed an interest in this type of service for ala members, and mr. kilgour asked if this feeling still existed. there was agreement among the board that it would be a desirable service for ala members. mr. kilgour stated that it will be necessary to: ( 1) determine the actual costs; ( 2) find the least expensive way of informing the members of this opportunity; and ( 3) obtain a commitment from the membership. he further said he would talk with mr. wedgeworth to see if agreement could highlights of meetings 215 cable could be formed; or ( 3) a section on video i cable could be formed in isad. the committee had favored an isad section. a round table might have more appeal to members, but would be outside the ala divisional and political structure. he further made known the desire of the committee for coordination of the forty-nine existing groups involved with audiovisual in ala. he noted that it would be possible to create a committee on a v within the isad section on video/ cable if there were interest in that approach to solving the problem. motion. mr. fasana moved that the isad board endorse in principle the ala video/cable ad hoc study committee's suggestion to create within isad a section devoted to video/ cable. seconded by susan martin. carried. misleading claims. mr. 
salmon indicated that some advertising over the last few years had appeared to be misleading, and that in some cases librarians' and libraries' names had been incorporated into advertising literature without their knowledge. two rtsd committees touch upon these problems as they relate to technical processing products and services: the bookdealer-library relations committee and the micropublishing projects committee. with adequate care, mr. salmon suggested, such a committee could be used by isad to ensure that its members are adequately informed. isad board members indicated an interest in and a need for a service of this nature, but reflected a hesitancy regarding the sensitivity of the issue. mr. kilgour asked that the matter be deferred until the wednesday board meeting, after the function statements of the two rtsd committees had been distributed to the board. isad historian. mr. hammer reviewed the action of the board previously in deciding to eliminate the history committee and appoint a historian if ala were going to publish a history of the association for the 1976 centennial celebration. since it has since been determined that ala will not produce such a publication, isad has no need to appoint a historian. isa (infonnation science abstmcts). mr. kilgour felt that there were now two obvious avenues open to the board at this time: ( 1) to pursue the evaluation of isa, or ( 2) to drop it altogether. mr. fasana said he believed that if ala were a sponsor, isa would abstract more library literature. because chemical information journals are among their sponsors, they cover chemical literature heavily. he felt that isad should look seriously into isa sponsorship, as there is nothing comparable to it in the united states. the isa board is interested, he said, because it would increase' their subscriptions and also increase their scope if they obtained subscriptions from ala members. mr. hammer explained that isad at one time had attempted to organize a subscription campaign, but the response was poor. mr. kilgour said that his reaction had been in favor of sponsorship, but highlights of meetings 217 be reached after additional points were discussed. we could then determine the answers to the three questions mentioned above. bylaws and organization committee. the newly appointed chairman of the isad bylaws and organization committee, helen schmierer, explained that she had found two versions of the isad bylaws extant, and there was a question as to which was current. minutes of the division did not reveal that any actual vote by the membership concerning various changes in the bylaws ever took place. she suggested that her committee use the original ( 1968) version of the bylaws as the basis on which to present all subsequent changes to the membership for a vote. she told the board she would have a new version ready by midwinter, and that it could then be published in ]ola and voted on at the san francisco annual conference. in answer to a question from ms. schmierer, mr. kilgour explained that it was the intent of the board that the bylaws and organization committees be combined, and the resulting committee should provide guidelines for each new committee established subsequently within isad. he also stated that it was necessary that a change be made in the present bylaws so that if a president did not complete a term, there would be a special election in order to elect another vice-president to take over the following year. 
for other charges to the bylaws and organization committee, mr. kilgour referred ms. schmierer to the minutes of the 197 4 midwinter meeting. telecommunications committee report. mr. kilgour annouced that david waite had resigned as chairman of the telecommunications committee and that he had appointed philip long as new chairman. mr. long presented a report of the committee (exhibit 1). he said that the areas of interest of the committee were networking, protocol, and standards. the following resolution was passed at their meeting: "that ala, via isad, join the committee of corporate telephone users ( cctu) and thus support the effort to combat the at&t attempt to adversely modify the current w ats tariff; should it not be legally or financially feasible for isad i ala to join cctu the committee will nonetheless attempt to follow and rep01t on this and related regulatory items." mr. kilgour called for a motion recommending that ala become a member of the committee of corporate telephone users, an organization to combat the cunent revision of the w ats tariff, providing money is available and no legal problems are connected with ala's so doing. however, several members of the board wanted further information as to what would be the position of ala with regard to the organization and in what sense would that position be an advantage to the members of ala. copies of the document produced by the cctu were also requested. mr. long said 218 journal of libmry automation vol. 7/3 september 1974 that he would contact a member of that committee in new york and get copies to the board members. mr. long requested that his committee be enlarged. mr. kilgour told him to appoint as many members as he needed. program planning committee report. chairman brett butler reported on the new orleans institute on networking which he felt was very successful both topically and financially. he said smaller libraries are beginning to consider automation, and therefore are sending staff to these institutes. he reported that $9,300 was received from registration fees, and expenses were approximately $6,100. in addition, $1,800 in expenses were paid by slice. mr. hammer will send a report to the board. mr. butler told of the committee's meeting in may in chicago. the minutes of that meeting, written by mr. hammer, had been approved by the committee and could be distributed to the board. he further related that the program at the new york annual conference had gone well, with approximately 400 in attendance. there were no plans for publication of the proceedings of the program, although it had been taped for sale by ala. mr. butler said the program planning committee desired liaison with each isad operating committee. they had appointed someone to tesla and hoped to do likewise with the telecommunications committee. at the suggestion of ms. avram, a serials institute has been planned for atlanta in october, preceding the asis meeting. josh smith (asis) and mr. hammer are the coordinators. mr. butler also announced that another institute on networking would be held in the spring in new orleans. with more advance publicity he felt there would be a greater response than the institute of march 197 4, which had an attendance of over 125. the 1975 institute will be a basic tutorial; james rizzolo is responsible for the content. plans for a series of cooperative programs with asis were laid out by the committee. this had been discussed with josh smith and had received his approval. mr. 
butler said he would prepare a statement which would describe the fiscal organization to be sent to the board for a mail vote. mr. kilgour expressed his opinion that with the new dues structure, the board must look at the financial gain involved in the institutes. in fact, any money-making venture must be considered at this time due to the dues structure change. plans for the cable tv preconference at san francisco ( 1975) were dropped. a program for san francisco would center around reactions to the document produced by the national commission on libraries and information science, the final draft of which is to be published in january 1975. this program is to be analytical in nature. mr. butler explained that there is possible cosponsorship interest. also at the san francisco annual conference, the office of intellectual highlights of meetings 219 freedom will cosponsor with isad a panel on various aspects of privacy and data file security. mr. butler announced that fifteen deans of library schools had attended that morning's meeting of the committee. there is interest in cosponsorship of continuing education programs, but nothing has been made definite at this point. the committee will explore this further. committee on representation in machine readable form of bibliographic information (marbi) report. (exhibit 2). velma veneziano requested that mr. fasana report to isad, as he had prepared a summary of the meeting for the rtsd board of directors. she asked mr. kilgour if the board would approve her writing the canadian library association to grant permission to send an official observer to marbi, as requested. mr. kilgour suggested that the letter definitely state that this representative would be a nonvoting participant. cola. (exhibit 3 ). thursday, july 11, 1974 the meeting was called to order by president frederick kilgour at 4:30p.m. those present were: board-frederick g. kilgour, lawrence w. s. auld, paul p. fasana, susan k. martin, ralph m. shoffner, donald p. hammer ( isad executive secretary), and berniece coulter, secretary, isad. guests-henriette d. avram, william summers. report of lola editor. copies of the ]ola annual report were distributed to the board (exhibit 4). ms. martin requested board reaction to changes suggested by the isad editorial board: ( 1) incorporate the issn on the cover of the journal, and drop the coden; (2) change the color of the cover of lola for each volume, beginning with the march 1975 issue; and ( 3) consider changing the title of the journal, in the light of possible incorporation of information technologies into isad, to the journal of library technology (jolt). the consensus of the board was that: ( 1) coden should remain on the cover; ( 2) a change in cover stock was quite appropriate; and (3) ]ola is a long-established title, and should remain. committee reports. mr. kilgour suggested that committee reports to the board be discontinued to save time and that written reports be submitted in the future. motion. it was moved by ralph shoffner that all isad committee reports be submitted to the board in writing and that the chairman appear before the board only if the committee desired some board action, and that the board has previously received this request in writing. seconded by larry auld. carried. mr. kilgour suggested that committee appointments be sent to the board by carbon copies of letters rather than reported as an agenda item. 220 ]oumal of library automation vol. 7/3 september 1974 representative to ansi x-4. motion. 
it was moved by ralph shoffner that mr. hammer explore and obtain, if possible, ala representation to ansi x-4 committee and that the board conditionally appoint arthur brody to be that representative. seconded by larry auld. carried. committee on technical standards for library automation (tesla) report. (exhibit 5). motion. it was moved by paul fasana to turn over to helen schmierer, chairman of the bylaws and organization committee, the matter of a revised charge to tesla. seconded by susan martin. carried. membership survey committee report. the final report of the membership survey ad hoc committee was distributed to the board members. this completed the work of the committee and the committee was therefore disbanded. mr. william summers, a member of the committee, appeared before the board. he stated that they had the computer capability to run any data correlations desired by the board. mr. kilgour asked the board members to request any correlations they would want from don hammer by october 15. he will forward them to ms. pope by mid-november, and she will have the correlations ready by the midwinter meeting in 1975. the board noted that the survey showed that 25 percent of !sad members are library directors, and that the most frequent age is over fifty. the number of people belonging to !sad who have no contact with library automation was surprising to some. a significant number of !sad members responded to the questionnaire. misleading claims. it was the sense of the board that the establishment of a committee in !sad to investigate misleading claims be referred to the bylaws and organization committee. the chairman is to contact william north, the ala attorney, concerning legal implications, and also steve salmon, who had shown interest in these problems, should be approached concerning the chairmanship. general discussion. most of the discussion centered around the new dues structure of the association. there was a question of how funds would be distributed to the divisions from institutional membership dues. ms. martin said that she would send the board an analysis of the expenditures and income for ]ola. the need for cash capital should be considered for the continuing publication of the journal despite advertising fluctuations. mr. shoffner stated that he favored using lola funds to sponsor !sad institutes and that there is a need for more introductory and elementary education in the !sad institutes. participants in the institutes had shown interest in basic knowledge of automation in order to make decisions in their work even though not necessarily involved directly with automation. highlights of meetings 221 exhibit 1 telecommunications committee report progress report of activities to date: 1. committee decided to maintain an awareness of future possibilities of two-way cable for data transmission, but not to continue active role in broadcast cable area in view of ongoing work in the area elsewhere. 2. committee members extensively debated the directions to which its future efforts would be actively directed; these included education, network protocol standards, etc. 3. committee accepted reports from messrs. randel and long on current suppliers of bibliographic services via star networking, and on current ansi and eia (plus iso) standards activities related to present and future bibliographic data transmission. 4. 
committee resolved: to attempt to formulate methods for computer-to-computer interaction (protocols) by telecommunication links, such that a single terminal of arbitrary characteristics could access a variety of host services in a "user-transparent" fashion. 5. various members of the committee accepted assignments in gathering data and protocols in use in such networks as arpa, tym-share, ncic, etc. it was recognized that the membership of the committee must be enlarged and that more than two ala meeting forums yearly are needed for the task. recommendations for division board action: the committee moved and unanimously passed a resolution that ala, via !sad, join the committee of corporate telephone users (cctu) and thus support the effort to combat the at&t attempt to adversely modify the current wats tariff; should it not be legally or financially feasible for !sad/ ala to join cctu the committee will nonetheless attempt to follow and report on this and related regulatory items. exhibit 2 committee on representation in machine readable form of bibliographic information (marbi) report following is a summary of deliberations and actions of the committee: 1. jola editorial, vol. 7, no. 2, 1974. the chairperson was asked to send a letter to the editor correcting the erroneous/ ambiguous reference to marbi and its relationship (formal and otherwise) to lc, clr cembi, etc. 2. conser. the committee took note of and discussed recent developments of the conser project. the committee will review and comment on formal recommendations of conser affecting marc serials format when they are submitted through lc. 3. clrjnsf sponsored conference on national bibliographic control. formal "conclusions and recommendations" of the conference have been distributed. the committee decided to take note of this document, ask each member to comment on the substance, and to prepare a formal critique/reaction of the conclusions for clr. 4. character sets. a progress report (by h. avram) was presented of international activities. extended character sets for latin (i.e., roman), cyrillic, and greek have been agreed to by iso working group on character sets. a draft standard i's being prepared. further work is being done on character sets for mathematical symbols and african languages. 5. iso 2709, format structure for marc records. progress report given. no action taken. 6. content designators. a progress report on international activities was given (by 222 i oumal of library automation vol. 7 i 3 september 197 4 h. avram) as well as a summary of some working papers prepared to date. copies will be submitted to the marbi members. 7. iso filing standards. a progress report was given. discussion but no action. 8. authority record formats. copies of the lc proposal for "authorities: a marc format" were distributed. a description of the work in progress at lc was given. lc tentatively plans to initiate a service for authorities in machine-readable form in 1975. the service probably will include names new to lc with cross-references and names new to marc with cross-references. 9. microform experiment. lc representative described a com microform experiment currently being defined/ set up at lc. the experiment will focus on lcsh 8th ed. in com format. 10. isbd-serials. the first formal publication of isbd-s was available at this conference. it was decided that each member would review the document and send comments to mr. fasana by august 15. mr. fasana was instructed to prepare a summary of the comments supplied. 11. 
catalog code revision committee. the need to establish liaison and input to this committee was discussed. arrangements were made with the chairperson of the ccrc (j. byrum) to establish input and liaison between the two committees. exhibit 3 cola discussion group report the isad cola discussion group met on july 7, 1974. brian aveney, chairman, mentioned that the subject of merger of cola with the marc users' discussion group had been informally raised, and invited comment from any members of the groups. discussion centered around the time needs and a suggestion was made that cola and mudg meet back to back. further discussion was deferred for later informal contacts. the program divided into two different sessions. the first consisted of a series of independent presentations on library automation activities around the counhy. those who reported were: helena rivoire (bucknell university); ron miller and bill mathews (nelinet); ann ekstrom (oclc); richard de gennaro (university of pennsylvania); james sokoloski (university of massachusetts); james dolby (r&d associates); howard harris (university of chicago); and stephen silberstein (university of california, berkeley). the second half of the program consisted of a panel presentation about the use of microform catalogs in libraries. richard jensen (university of texas, permian basin) described the use during the last year of a divided microfiche catalog produced under contract by richard abel co. no other form of access to the collection is provided for public use. a brief questionnaire about patron response indicated no great difficulties in use. some complaints about readers and filing were noted. mary fischer (los angeles public library) discussed the transition to com fiche for internal reports, for reasons of cost. a variety of reports can now be distributed to all branches which formerly did not have access to this information except at the central library. james rizzolo (new york public library) mentioned the dance collection catalog now available on film. user response has been very positive, but the fact that this is the first time any form of catalog has been available is probably a large factor in this response. a com marc character set has been developed with a new york vendor for use in internal fiche files, and samples were made available to the group. highlights of meetings 223 exhibit 4 journal of library automation annual report this report covers the eighteen months between january 1973 and june 1974. during this period nine issues of ]ola appeared, from the june 1972 to the june 1974 issues. these issues contained thirty-nine articles and twenty-three book reviews. in addition, lola/technical communications was incorporated into the journal with the march 1973 issue. with volume 7 (1974), an editorial or guest editorial appears in each issue. in january 1973 the journal was eight months behind. ala's central production unit was to have taken over the technical editing with the 1973 volume, but due to the unforeseen delay in publication the staff was not familiar with the journal, or the printer. by march all the major problems had been sorted out, and the june 1972 issue was sent to the printer. at that time there was a bacldog of thirty-five manuscripts, of which twenty were eventually published, nine were rejected, and six are still pending (either sent back to the author for revision, or still in the process of locating or identifying the author). 
with volume 6 (1973), the contract for printing was given to the ovid bell press, inc. spencer-walker did not bid on a contract renewal. because of the increasing cost of paper and the narrower selection offered by paper manufacturers, the editorial board determined that at the same time it would be reasonable to change from use of permalife to another cheaper but acid-free stock. warren old english was selected; at the time (june 1973) it was $25.10 per hundredweight. since february 1973, ]ola has received fifty manuscripts for consideration: published 18 rejected 11 accepted 7 in review 4 pending 9 sent to tc editor 1 it is difficult to summarize the content of these nine issues. when categorized very broadly, the thirty-nine articles covered the following topics: aspects of cataloging 7 search keys and file structure 7 national automation and standards 7 isad topics 5 circulation 5 acquisitions 2 serials 2 information reh·ieval 2 administration 1 other 1 don bosseau continues, i am pleased to say, as editor of technical communications. peter simmons (university of british columbia) accepted the position of book review editor, and is also doing an excellent job. he reports that, in addition to the reviews already published, eleven reviews have been submitted and are awaiting publication, and six books are in the hands of reviewers. the central production unit has been of invaluable assistance in bringing the journal up to date, in negotiating with the post office on our behalf, and in continuing to provide technical editing support. lola is now completely up to date; i hope that we shall continue to improve the 224 journal of libmry automation vol. 7/3 september 197 4 standards for acceptance of articles, and that time will now permit us to examine the journal critically to determine where improvements could or should be made. exhibit 5 committee on technical standards for library automation report recommendations for division board action: i. nominate mr. arthur brody as isad representative to ansi-x4. 2. approve revised charge to tesla. the tesla met in three sessions. i. minutes of previous meeting. approved. 2. charge to the committee. the charge to the committee had been revised and the · reasons for each revision documented, and the revisions reviewed. it was voted that the charge as revised be approved by the isad board. 3. draft procedure. the tesla procedure for handling standards proposals was reviewed and the following changes recommended: a. proposal outline item viii be made optional. b. reactor ballot include three responses, e.g., for/ against-need for standard; for/against-specification of standard; yes/no-available to work on specification. these changes will be made and published in the next issue of jola-tc. 4. publication of materials relating to standards. the article describing the committee's procedures and role and outlining the standards organization potentially impacting libraries was published in jola. the committee discerned that standards exist which would be of importance to the library community and that these be identified and reviewed in terms of their impact on libraries. as a first step, a listing of those standards will be drawn up and, on review of the committee, published in jola-tc. 5. representative to ansi-x4. the ala is currently not represented on ansi-x4. the committee recommends that the isad board nominate mr. arthur brody as the ala representative to ansi-x4. 6. metrication. the current movement to metric measure may impact libraries. 
a subcommittee of ms. madeline henderson (chairperson) and dr. ed bowles was formed to develop a position paper on the impact of metrication. 7. standards program at san francisco. the committee will present a 1½-hour program on standards at the next annual convention, in san francisco. 8. open meeting. reactor ballot responses to the potential standards areas and a general review of the committee's activities were held in its third session. 9. next meeting. tentatively the committee will meet at the asis conference in atlanta. time and date to be announced.

exhibit 6
isad/led committee on education for information science report
discussion: directions of committee. need for visibility at ala and follow-up to the denver (1971) meeting. possible tutorial or institute topics-cosponsors.
action: 1. plan program for san francisco 1975. speaker: ph.d. student from syracuse to design guidelines for module development. panel: two to three modules presented. reactors: discussion. 2. work out subject outline based on questionnaire for distribution at san francisco for possible module development, ready for committee approval by midwinter.
recommendations for division board action: program slot for san francisco.
highlights: serious concern about lack of member participation. isad and led may want to reexamine purpose-need for committee and/or reorganization.

mobile website use and advanced researchers: understanding library users at a university marine sciences branch campus
mary j. markland, hannah gascho rempel, and laurie bridges
mary j. markland (mary.markland@oregonstate.edu) is head, guin library; hannah gascho rempel (hannah.rempel@oregonstate.edu) is science librarian and coordinator of graduate student success services; and laurie bridges (laurie.bridges@oregonstate.edu) is instruction and outreach librarian, oregon state university libraries and press.

abstract
this exploratory study examined the use of the oregon state university libraries website via mobile devices by advanced researchers at an off-campus branch location. branch campus–affiliated faculty, staff, and graduate students were invited to participate in a survey to determine what their research behaviors are via mobile devices, including the frequency of their mobile library website use and the tasks they were attempting to complete. findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. results of this survey will be used to address knowledge gaps around library resources and research tools and to generate more ways to study advanced researchers' use of library services via mobile devices.

introduction
as use of mobile devices has expanded in the academic environment, so has the practice of gathering data from multiple sources about what mobile resources are and are not being used. this data informs the design decisions and resource investments libraries make in mobile tools. web analytics is one tool that allows researchers to discover which devices patrons use to access library webpages. but web analytics data do not show what patrons want to do and what hurdles they face when using the library website via a mobile device. web analytics also lacks nuance in that it cannot distinguish user characteristics, such as whether users are novice or advanced researchers, which may affect how these users interact with a mobile device.
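to make this limitation concrete, the sketch below (invented for illustration, not code from the study or from any particular analytics package) shows the kind of device classification web analytics performs on user-agent strings taken from server logs. the classify function, the sample log hits, and the deliberately naive matching rules are all hypothetical; real analytics tools use far more complete user-agent databases. the point is that a log line reveals the device and the page requested but says nothing about the visitor's expertise or the task they were trying to complete.

    # hypothetical device classification from web server log user agents.
    from collections import Counter

    def classify(user_agent):
        ua = user_agent.lower()
        if "ipad" in ua or "tablet" in ua:
            return "tablet"
        if "mobi" in ua or "iphone" in ua or "android" in ua:
            return "phone"
        return "desktop"

    # invented sample hits: (page requested, user-agent string)
    hits = [
        ("/library/hours", "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)"),
        ("/search?q=salmon+migration", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
        ("/databases", "Mozilla/5.0 (Linux; Android 11; Pixel 5) Mobile Safari"),
    ]
    print(Counter(classify(agent) for _, agent in hits))
    # -> Counter({'phone': 2, 'desktop': 1})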
user surveys are another tool for gathering data on mobile behaviors. user surveys help overcome some of the limitations of web analytics data by directly asking users about their perceived research skills and the resources they use on a mobile device. as is the case at most libraries, oregon state university libraries serves a diverse range of users. we were interested in learning whether advanced researchers—particularly advanced researchers who work at a branch campus—use the library's resources differently than main campus users. we were chiefly interested in these advanced researchers because of the mobile nature of their work. they are graduate students and faculty in the field of marine science who work in a variety of locations, including their offices, labs, and in the field (which can include rivers, lakes, and the ocean). we focused on the use of the library website via mobile devices as one way to determine whether specific library services should be adapted to best meet the needs of this targeted user community. oregon state university (osu) is oregon's land-grant university; its home campus is in corvallis, oregon. hatfield marine science center (hmsc) in newport is a branch campus that includes a branch library. guin library at hmsc serves osu students and faculty from across the osu colleges along with the co-located federal and state agencies of the national oceanic and atmospheric administration (noaa), us fish and wildlife service, environmental protection agency (epa), united states geological survey (usgs), united states department of agriculture (usda), and the oregon department of fish and wildlife. the guin library is in newport, which is forty-five miles from the main campus. like many other branch libraries, guin library was established at a time when providing a print collection close to where researchers and students work was paramount, but today it must adapt its services to meet the changing information needs of its user base. branch libraries are typically designed to serve a clientele or subject area, which can create a different institutional culture from the main library. guin library serves advanced undergraduates, graduate students, and scientific researchers. hmsc's distance from corvallis, the small size of the researcher community, and the shared focus on a research area—marine sciences—create a distinct culture. while guin library is often referred to as the "heart of hmsc," the number of in-person library users is decreasing. this decline is not unexpected, as numerous studies have shown that faculty and graduate students have fewer needs that require an in-person trip to the library.1 studies have also shown that faculty and graduate students can be unaware of the services and resources that libraries provide, thereby continuing the cycle of underuse.2 to learn more about the needs of hmsc's advanced researchers, this exploratory study examined their research behaviors via mobile devices.
the goals of this study were to
• determine if and with what frequency advanced researchers at hmsc use the osu libraries website via mobile devices;
• gather a list of tasks advanced users attempt to accomplish when they visit the osu libraries website on a mobile device; and
• determine whether the mobile behaviors of these advanced researchers are different from those of researchers from the main osu campus (including undergraduate students), and if so, whether these differences warrant alternative modes of design or service delivery.

literature review
the conversation about how best to design mobile library websites has shifted over the past decade. early in the mobile-adoption process some libraries focused on creating special websites or apps that worked with mobile devices.3 while libraries globally might still be creating mobile-specific websites and apps,4 us libraries are trending toward responsively designed websites as a more user-friendly option and a simpler solution for most libraries with limited staff and budgets.5 most of the literature on mobile-device use in higher education is focused on undergraduates across a wide range of majors who are using a standard academic library.6 to help provide context for how libraries have designed their websites for mobile users, some of those specific findings will be shared later. but because our study focused on graduate students and faculty in a science-focused branch library, we will begin with a discussion of what is known about more advanced researchers' use of library services and their mobile-device habits. several themes emerged from the literature on graduate students' relationships with libraries. in an ironic twist, faculty think graduate students are being assisted by the library, while librarians think faculty are providing graduate students with the help they need to be successful.7 as a result, many graduate students end up using their library's resources in an entirely disintermediated way. graduate students, especially those in the sciences, visit the physical library less often and use online resources more than undergraduate students.8 most graduate students start their research process with assistance from academic staff, such as advisors and committee members,9 and are unaware of many library services and resources.10 as frequent virtual-library users who receive little guidance on how to use the library's tools, graduate students need a library website that is clear in scope and purpose, offers help, and has targeted services.11 compared to reports on undergraduate use of mobile devices to access their library's website, relatively few studies have focused on graduate-student or faculty mobile behaviors. a recent survey of japanese library and information science (lis) students compared undergraduate and graduate students' usage of mobile devices to access library services and found slight differences.
however, both groups reported accessing libraries as last on their list of preferred smartphone uses.12 aharony examined the mobile use behaviors of israeli lis graduate students and found approximately half of these graduate students used smartphones and perceived them to be useful and easy tools for use in their everyday life, and could transfer those habits to library searching behaviors.13 when looking specifically at how patrons use library services via a mobile device, rempel and bridges found the top reason graduate students at their main campus used the osu libraries website via mobile devices was to find information on library hours, followed by finding a book and researching a topic.14 barnett-ellis and vann surveyed their small university and found that both undergraduate and graduate students were more than twice as likely to use mobile devices as are their faculty and staff; a majority of students also indicated they were likely to use mobile devices to conduct research.15 finally, survey results showed graduate students in hofstra university’s college of education reported accessing library materials via a mobile device twice as often as other student groups. in addition, these graduate students reported being comfortabl e mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 10 reading articles up to five pages long on their mobile devices. graduate students were also more likely to be at home when using their mobile device to access the library, a finding the authors attributed to education graduate students frequently being employed as full-time teachers.16 research on how faculty members use library resources characterizes a population that is confident in their literature-searching skills, prefers to search on their own, and has little direct contact with the library.17 faculty researchers highly value convenience;18 they rely primarily on electronic access to journal articles but prefer print access to monographs.19 faculty tend to be self-trained at using search tools, such as pubmed or other online databases, and therefore are not always aware of the more in-depth functionality of these tools.20 in contrast to graduate students, rempel and bridges found that faculty using the library website via mobile devices were less interested in information about the physical library, such as library hours, and were more likely to be researching a topic.21 medical faculty are one of the few faculty groups whose mobile-research behaviors have been specifically examined. a survey administered by bushhousen et al. at a medical university revealed that a third of respondents used mobile apps for research-related activities.22 findings by boruff and storie indicate that one of the biggest barriers to mobile use in health-related academic settings was wireless access.23 thus apps that did not require the user to be connected to the internet were highly desired. faculty and graduate students in health-related academic settings saw a role for the library in advocating for better wireless infrastructure, providing access to a targeted set of heavily used resources, and providing online guides or in-person tutorials on mobile apps or procedures specific to their institution. 24 according to the literature, most design decisions for library mobile sites have been made on the basis of information collected about undergraduate students’ behavior at main-branch campuses. 
to help inform our understanding of how recent decisions have been made, the remainder of the literature review focuses on what is known about undergraduate students’ mobile behavior. undergraduate students are very comfortable using mobile technologies and perceive themselves to be skilled with these devices. according to the 2015 educause center for research and analysis’ (ecar) study of undergraduate students and information technology, most undergraduate students consider themselves sophisticated technology users who are engaged with information technologies.25 undergraduate students mainly use their smartphones for nonclass activities. but students indicate they could be more effective technology users if they were more skilled at tools such as the learning management system, online collaboration tools, e-books, or laptops and smartphones in class. of interest to libraries is the ecar participants’ top area of reported interest, “search tools to find reference or other information online for class work.”26 however, when a mobile library site is in place, usage rates have been found to be lower than anticipated. in a study of undergraduate science students, salisbury et al. found only 2 percent of respondents reported using their cell phones to access library databases or the library’s catalog every hour or daily, despite 66 percent of the students browsing the internet using their mobile information technology and libraries | december 2017 11 phone hourly or daily. salisbury et al. speculated that users need to be told about mobileoptimized library resources if libraries want to increase usage. 27 rempel and bridges used a pop-up interrupt survey while users were accessing the osu libraries mobile site.28 this approach allowed a larger cross-section of library users to be surveyed. it also reduced memory errors by capturing their activities in real time. activities that had been included in the mobile site because of their perceived usefulness in a mobile environment, such as directions, asking a librarian a question, and the coffee shop webcam, were rarely cited as a reason for visiting the mobile site. the osu libraries branch at hmsc is entering a new era. a marine studies initiative will result in the building of a new multidisciplinary research campus at hmsc that aims to serve five hundred undergraduate students. the change in demographics and the increase in students who will need to be served has prompted guin library staff to explore how the current population of advanced researchers interact with library resources. in addition, examining the ways undergraduate students at the main campus use these tools will help with planning for the upcoming changes in the user community. methods this study used an online qualtrics survey to gather information about how frequently advanced researchers (graduate students, faculty, and affiliated scientists at a branch library for marine science) use the osu libraries website via mobile devices, what they search for, and other ways they use mobile devices to support their research behaviors. a recruitment email with a link to the survey was sent to three discussion lists used by hmsc community in spring 2016. the survey was available for four weeks, and a reminder email was sent one week before the survey closed. the invitation email included a link to an informedconsent document. once the consent document had been reviewed, users were taken to the survey via a second link. 
respondents could provide an email address to receive a three-dollar coffee card for participating in the study, but their email address was recorded in a separate survey location to preserve their anonymity. the invitation email indicated that this survey was about using the website via a mobile device, and the first survey question asked users if they had ever accessed the library website on a mobile device. if they answered “no,” they were immediately taken to the end of the survey and were not recorded as a participant in the study. a similar survey was conducted with users from osu’s main campus in 2012–13 and again in 2015. the results from 2012–13 have been published previously,29 but the results from 2015 have not. while the focus of the present study is on the mobile behaviors of advanced researchers in the hmsc community, data from the 2015 main-campus study is used to provide a comparison to the broader osu community. osu main-campus respondents in 2015 and hmsc participants in 2016 both answered closedand open-ended questions that explored participants’ general mobiledevice behaviors and behaviors specific to using the osu libraries website via mobile devices. mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 12 however, the hmsc survey also asked questions about behaviors related to using the osu (nonlibrary) website via a mobile device and participants’ mobile scholarly reading and writing behaviors. the survey concluded with several demographic questions. the survey data was analyzed using qualtrics’ cross-tab functionality and microsoft excel to observe trends and potential differences between user groups. open-ended responses were examined for common themes. twenty-three members of the hmsc community completed the survey, whereas one hundred participants responded to the 2015 main campus survey. participation in the 2015 survey was capped at one hundred respondents because limited incentives were available. the participation difference between the two surveys reflects several differences between the two sampled communities. the most obvious difference is size. the osu community comprises more than thirty-six thousand students, faculty, and staff; the hmsc community is approximately five hundred students, researchers, and faculty—some of whom are also included as part of the larger osu community. the second factor influencing response rates relates to the difference in size between the two communities, but is more striking in the hmsc community: the survey relied on a self-selected group of users who indicated they had a history using the library website via a mobile device. therefore, it is not possible to estimate the population size of mobile-device library-website users specific to the branch library or the main campus library. this limitation means that the results from this study cannot be used to generalize findings to all users who visit a library website via mobile devices; instead the results are intended to present a case that other libraries may compare with behaviors observed on their own campuses. sharing the behaviors of advanced researchers at a branch campus is particularly valuable as this population has historically been understudied. 
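the cross-tabulations described above can also be reproduced outside of qualtrics with a few lines of pandas; the sketch below is illustrative only, and the data frame it builds contains invented responses rather than the study's actual data.

    # illustrative cross-tab of survey responses; the responses are invented.
    import pandas as pd

    responses = pd.DataFrame({
        "affiliation": ["graduate student", "faculty", "graduate student",
                        "faculty", "graduate student"],
        "visit_frequency": ["less than once a month", "at least once a month",
                            "at least once a week", "less than once a month",
                            "at least once a month"],
    })

    # counts of visit frequency broken out by affiliation, the same shape of
    # table that qualtrics' cross-tab feature produces.
    print(pd.crosstab(responses["affiliation"], responses["visit_frequency"]))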
results and discussion

participant demographics and devices used

of the twenty-three respondents to the hmsc mobile behaviors survey, 13 (62 percent) were graduate students, 7 (34 percent) were faculty (this category includes faculty researchers and courtesy faculty), and one respondent was a noaa employee. two participants declined to declare their affiliation. of the 97 respondents to the 2015 osu main-campus survey who shared their affiliation, 16 (16 percent) were graduate students, 5 (5 percent) were faculty members, and 69 (71 percent) were undergraduates. respondents varied in the types of mobile devices they used when doing library research. smartphones were used by 78 percent (18 respondents), and 22 percent (5 respondents) used a tablet. apple (15 respondents) was the most common device brand used, although six of the respondents used an android phone or tablet. compared to the general population's device ownership, these respondents are more likely to own apple devices, but the two major device types owned (apple and android) match market trends.30

frequency of library site use on mobile devices

most of the hmsc respondents are infrequent users of the library website via mobile devices: 50 percent (11 respondents) did so less than once a month, 41 percent (9 respondents) did so at least once a month, and 9 percent (2 respondents) did so at least once a week. the low level of library website usage via mobile devices was especially notable as this population reports being heavy users of the library website via laptops or desktop computers, with 82 percent (18 respondents) visiting the library website via those tools at least once a week. researchers at hmsc used the library website via mobile devices much less often than the 2015 main-campus respondents (undergraduates, graduate students, and faculty). no hmsc respondents visited the mobile site daily compared to 10 percent of main-campus users, and only 9 percent of hmsc respondents visited weekly compared to 28 percent of main-campus users (see figure 1).

figure 1. 2016 hmsc participants vs. 2015 osu main-campus participants reported frequency of library website visits via a mobile device by percent of responses.

while hmsc advanced researchers share some mobile behaviors with main-campus students, this exploratory study demonstrates they do not use the library website via mobile devices as frequently. some possible reasons for this are that researchers rarely spend time coming and going to and from classes and therefore do not have small gaps of time to fill throughout their day. instead, their daily schedule involves being in the field or in the lab collecting and analyzing data. alternatively, they are frequently involved in writing-intensive projects such as drafting journal articles or grant proposals. they carve out specific periods to do research and do not appear to be filling time with short bursts of literature searching. they can work on laptops and do not need to multitask on a phone or tablet between classes or in other situations.
mobile-device ownership among hmsc graduate students might also be limited because of personal budgets that do not allow for owning multiple mobile devices or for having the most recent model. in addition, this group of scientists may not be on the front edge of personal technologies, especially compared to medical researchers, because few mobile apps are designed specifically for the research needs of marine scientists.

where researchers are when using mobile devices for library tasks

because mobile devices facilitate connecting to resources from many locations, and because advanced researchers conduct research in a range of settings, including the field, the office, and home, we asked respondents where they were most likely to use the library website via a mobile device. thirty-two percent were most likely to be at home, 27 percent in transit, 18 percent at work, and 9 percent in the field. the popularity of using the library website via mobile devices while in transit was somewhat unexpected, but perhaps should not have been, because many people try to maximize their travel time by multitasking on mobile devices. the distance from the main campus might explain this finding: a local bus service provides an easy way to travel to and from the main campus, and the hour-long trip would provide opportunities for multitasking via a mobile device.

relatively few respondents used mobile devices to access the library website while at work. previous studies show that a lack of reliable campus wireless internet access can affect students' ability to use mobile technology.31 hmsc also struggles to provide consistent wireless access, and signals are spotty in many areas of our campus. despite signal boosters in guin library, wireless access is still limited at times. in addition, cell phone service is equally spotty both at hmsc and up and down the coast of oregon. it is much less frustrating to work on a device that has a wired connection to the internet while at hmsc. these respondents did use mobile devices while at home, which might indicate they had a better wireless signal there. alternatively, working from home on a mobile device might indicate that they compartmentalize their library-research time as an activity to do at home instead of in the office. researchers used their mobile devices to access the library while in the field less than originally expected, but upon further reflection it made sense that researchers would be less likely to use library resources during periods of data collection for oceanic or other water-based research projects because of their focused involvement during that stage. the water-based research also increases the risk of losing mobile devices.

library resources accessed via mobile devices

to learn more about how these respondents used the library website, we asked them to choose what they were searching for from a list of options. respondents could choose as many options as applied to their searching behaviors. hmsc respondents' primary reason for visiting the library's site via a mobile device was to find a specific source: 68 percent looked for an article, 45 percent for a journal, 36 percent for a book, and 14 percent for a thesis.
many of the hmsc respondents also looked for procedural or library-specific information: 36 percent looked for hours, 32 percent for my account information, 18 percent for interlibrary loan, 14 percent for contact information, 9 percent for how to borrow and request books, 9 percent for workshop information, and 9 percent for oregon estuaries bibliographies, a unique resource provided by the hmsc library. fifty-five percent of searches were for a specific source and 43 percent were for procedural or library-specific information. notably missing from this list were respondents who reported searching via their mobile device for directions to the library.

compared to the 2015 osu libraries main-campus survey respondents, hmsc respondents were much more likely to visit the library website via a mobile device to look for an article (68 percent vs. 37 percent), find a journal (45 percent vs. 23 percent), access my account information (32 percent vs. 7 percent), use interlibrary loan (18 percent vs. 5 percent), or find contact information (14 percent vs. 1 percent). however, unlike hmsc participants, who do not have access to course reserves at the branch library, 7 percent of osu main-campus respondents used their mobile devices to find course reserves on the library website. see figure 2.

figure 2. 2016 hmsc vs. 2015 osu main-campus participants reported searches while visiting the library website via a mobile device by percent of responses.

it is possible that hmsc users with different affiliations might use the library site via a mobile device differently. these exploratory findings show that graduate students used the greatest variety of content via mobile devices. graduate students as a group reported using 11 of the 14 provided content choices via a mobile device, while faculty reported using 8 of the 14. graduate students were the largest group (62 percent of respondents), which might explain why as a group they searched for more types of content via mobile devices. interestingly, faculty members and faculty researchers reported looking for a thesis via a mobile device, but no graduate students did. perhaps these graduate students had not yet learned about the usefulness of referencing past theses as a starting point for their own thesis writing, or perhaps they were only familiar with searching for journal articles on a topic. in contrast, faculty members might have been searching for specific theses for which they had provided advising or mentoring support.

to help us make decisions about how to best direct users to library content via mobile devices, we asked respondents to indicate their searching behaviors and preferences. of the 16 hmsc respondents who answered this question, 12 (75 percent) used our web-scale discovery search box via mobile devices; 4 (25 percent) reported that they did not. presumably these latter searchers were navigating to another database to find their sources. of 16 respondents, only 6 (38 percent) indicated that they looked for a specific library database (as opposed to the discovery tool) when using a mobile device. those respondents who were looking for a database tended to be looking for the web of science database, which makes sense for their field of study.
when conducting searches for sources on their mobile devices, hmsc respondents employed a variety of search strategies: the 12 respondents who replied used a combination of author (75 percent), journal title (67 percent), keyword (67 percent), and book title (50 percent) searches when starting at the mobile version of the discovery tool. when asked about their preferred way to find sources, a majority of hmsc respondents reported that they tended to prefer a combination of searching and menu navigation while using the library website from mobile devices, while the remainder were evenly divided between preferring menu-driven and search-driven discovery.

while osu libraries does not currently provide links to any specific apps for source discovery, such as pubmed mobile or jstor browser, 13 (62 percent) of the hmsc respondents indicated they would be somewhat or very likely to use an app to access and use library services. this finding connects to the issue of reliable wireless access. medical graduate students had a wider array of apps available to them, but the primary reason they wanted to use these apps was that they provided a better searching experience in hospitals that had intermittent wireless access, an experience to which researchers at hmsc could relate.32

university website use behaviors on mobile devices

to help situate respondents' library use behaviors on mobile devices in comparison to the way they use other academic resources on mobile devices, we asked hmsc respondents to describe their visits to resources on the osu (nonlibrary) website via mobile devices. compared to their use of the library site on a mobile device, respondents' use of university services was higher: 43 percent (9 respondents) visited the university's website via a mobile device at least once a week compared to only 9 percent (2 respondents) who visited the library site with that frequency. this makes sense because of the integral function many of these university services play in most university employees' regular workflow. respondents indicated visiting key university sites including myosu (a portal webpage, visited by 60 percent of respondents), the hmsc webpage (55 percent), canvas (the university's learning management system, visited by 50 percent of respondents), and webmail (45 percent). see figure 3.

figure 3. university webpages hmsc respondents access on a mobile device by percent of responses.

university resources such as campus maps, parking locations, and the graduate school website were frequently used by this population. the use of the first two makes sense, as hmsc users are located off-site and need to use maps and parking guidance when they visit the main campus. the use of the graduate school website makes sense because the respondents were primarily graduate students and graduate school guidelines are a necessary source of information. interestingly, our advanced users are similar to undergraduates in that they primarily read email, information from social networking sites, and news on their mobile devices.33

other research behaviors on mobile devices

we wanted to know what other research-related behaviors the hmsc respondents are engaged in via mobile devices to determine if there might be additional ways to support researchers' workflows.
we specifically asked about respondents' reading, writing, and note-taking behaviors to learn how well these respondents have integrated them with their mobile usage behaviors. all respondents reported reading on their mobile device (see figure 4). email represented the most common reading activity (95 percent), followed by "quick reading" activities, such as reading social networking posts (81 percent), current news (81 percent), and blog posts (62 percent). smaller numbers used their mobile devices for academic or long-form reading, such as reading scholarly articles (33 percent) or books (19 percent). of those respondents who read articles and books on their mobile devices, only a few highlighted or took notes using their mobile device. seven respondents used a citation manager on their mobile device: three used endnote, one used mendeley, one used pages, and one used zotero. one respondent used evernote on their mobile device, and one advanced user reported using specific data and database management software, websites, and apps related to their projects. more advanced and interactive mobile-reading features, such as online spatial landmarks, might be needed before reading scholarly articles on mobile devices becomes more common.34

figure 4. what hmsc respondents reported reading on a mobile device by percent of responses.

limitations

this exploratory study had several limitations, most of which reflect the nature of doing research with a small population at a branch campus. this study had a small sample size, which limited observations of this population; however, future studies could use research techniques such as interviews or ethnographic studies to gather deep qualitative information about mobile-use behaviors in this population. a second limitation was that previous studies of the osu libraries mobile website used google analytics to compare survey results with what users were actually doing on the library website. unfortunately, this was not possible for this study. because of how hmsc's network was set up, anyone at hmsc using the osu internet connections is assigned an ip address that shows a corvallis, oregon, location rather than a newport, oregon, location, which rendered parsing hmsc-specific users in google analytics impossible. the research behaviors of advanced researchers at a branch campus have not been well examined; despite its limitations, this study provides beneficial insights into the behaviors of this user population.

conclusion

focusing on how advanced researchers at a branch campus use mobile devices while accessing library and other campus information provides a snapshot of key trends among this user group. these exploratory findings show that these advanced researchers are infrequent users of library resources via mobile devices and, contrary to our initial expectations, are not using mobile devices as a research resource while conducting field-based research. findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit.
the results of this survey will be used to address the hmsc knowledge gaps around use of library resources and research tools via mobile devices. both graduate students and faculty lack awareness of library resources and services and have unsophisticated library research skills.35 while the osu main campus has library workshops for graduate students and faculty, these workshops have been inconsistently duplicated at the guin library. because the people working at hmsc come from such a wide variety of departments across osu that focus on marine sciences, hmsc has never had a library orientation. the results indicate possible value in devising ways to promote guin library's resources and services locally, which could include highlighting the availability of mobile library access. while several participants mentioned using research tools like evernote, pages, or zotero on their mobile devices, most participants did not report enhancing their mobile research experience with these mobile-friendly tools. workshops specifically modeling how to use mobile-friendly tools and apps such as dropbox, evernote, goodreader, or browzine could help introduce the benefits of these tools to these advanced researchers. because wireless access is even more of a concern for researchers at this branch location than for researchers at the main campus, database-specific apps will be explored to determine whether searching apps could help alleviate inconsistent wireless access. if database apps that are appropriate for marine science researchers are available, these will be promoted to this user population.

future research might involve follow-up interviews, focus groups, or ethnographic studies, which could expand our knowledge of these researchers' mobile-device behaviors and their perceptions of mobile devices. exploring the technology usage by these advanced researchers in their labs, including electronic lab notebooks or other tools, might be an interesting contrast to their use of mobile devices. in addition, as the hmsc campus grows with the expansion of the marine studies initiative, increasing numbers of undergraduates will use guin library. the ecar 2015 statistics show that current undergraduates own multiple internet-capable devices.36 presumably, these hmsc undergraduates will be likely to follow the trends seen in the ecar data. certainly, the plans to expand hmsc's internet and wireless infrastructure will affect all its users. our mobile survey gave us insights into how a sample of the hmsc population uses the library's resources and services. these observations will allow guin library to expand its services for the hmsc campus. we encourage other librarians to explore their unique user populations when evaluating services and resources.

references

1. maria anna jankowska, "identifying university professors' information needs in the challenging environment of information and communication technologies," journal of academic librarianship 30, no. 1 (2004): 51–66, https://doi.org/10.1016/j.jal.2003.11.007; pali u. kuruppu and anne marie gruber, "understanding the information needs of academic scholars in agricultural and biological sciences," journal of academic librarianship 32, no. 6 (2006): 609–23; lotta haglund and per olsson, "the impact on university libraries of changes in information behavior among academic researchers: a multiple case study," journal of academic librarianship 34, no. 1 (2008): 52–59, https://doi.org/10.1016/j.acalib.2007.11.010; nirmala gunapala, "meeting the needs of the 'invisible university': identifying information needs of postdoctoral scholars in the sciences," issues in science and technology librarianship, no. 77 (summer 2014), https://doi.org/10.5062/f4b8563p.
2. tina chrzastowski and lura joseph, "surveying graduate and professional students' perspectives on library services, facilities and collections at the university of illinois at urbana-champaign: does subject discipline continue to influence library use?," issues in science and technology librarianship no. 45 (winter 2006), https://doi.org/10.5062/f4dz068j; kuruppu and gruber, "understanding the information needs of academic scholars in agricultural and biological sciences"; haglund and olsson, "the impact on university libraries of changes in information behavior among academic researchers."

3. ellyssa kroski, "on the move with the mobile web: libraries and mobile technologies," library technology reports 44, no. 5 (2008): 1–48, https://doi.org/10.5860/ltr.44n5.

4. paula torres-pérez, eva méndez-rodríguez, and enrique orduna-malea, "mobile web adoption in top ranked university libraries: a preliminary study," journal of academic librarianship 42, no. 4 (2016): 329–39, https://doi.org/10.1016/j.acalib.2016.05.011.

5. david j. comeaux, "web design trends in academic libraries—a longitudinal study," journal of web librarianship 11, no. 1 (2017), 1–15, https://doi.org/10.1080/19322909.2016.1230031; zebulin evelhoch, "mobile web site ease of use: an analysis of orbis cascade alliance member web sites," journal of web librarianship 10, no. 2 (2016): 101–23, https://doi.org/10.1080/19322909.2016.1167649.

6. barbara blummer and jeffrey m. kenton, "academic libraries' mobile initiatives and research from 2010 to the present: identifying themes in the literature," in handbook of research on mobile devices and applications in higher education settings, ed. laura briz-ponce, juan juanes méndez, and josé francisco garcía-peñalvo (hershey, pa: igi global, 2016), 118–39.

7. jankowska, "identifying university professors' information needs in the challenging environment of information and communication technologies."

8. chrzastowski and joseph, "surveying graduate and professional students' perspectives on library services, facilities and collections at the university of illinois at urbana-champaign."

9. carole a. george et al., "scholarly use of information: graduate students' information seeking behaviour," information research 11, no. 4 (2006), http://www.informationr.net/ir/11-4/paper272.html.
10. kristin hoffman et al., "library research skills: a needs assessment for graduate student workshops," issues in science and technology librarianship 53 (winter-spring 2008), https://doi.org/10.5062/f48p5xfc; hannah gascho rempel and jeanne davidson, "providing information literacy instruction to graduate students through literature review workshops," issues in science and technology librarianship 53 (winter-spring 2008), https://doi.org/10.5062/f44x55rg.

11. jankowska, "identifying university professors' information needs in the challenging environment of information and communication technologies."

12. ka po lau et al., "educational usage of mobile devices: differences between postgraduate and undergraduate students," journal of academic librarianship 43, no. 3 (may 2017), 201–8, https://doi.org/10.1016/j.acalib.2017.03.004.

13. noa aharony, "mobile libraries: librarians' and students' perspectives," college & research libraries 75, no. 2 (2014): 202–17, https://doi.org/10.5860/crl12-415.

14. hannah gascho rempel and laurie m. bridges, "that was then, this is now: replacing the mobile-optimized site with responsive design," information technology and libraries 32, no. 4 (2013): 8–24, https://doi.org/10.6017/ital.v32i4.4636.

15. paula barnett-ellis and charlcie pettway vann, "the library right there in my hand: determining user needs for mobile services at a medium-sized regional university," southeastern librarian 62, no. 2 (2014): 10–15.

16. william t. caniano and amy catalano, "academic libraries and mobile devices: user and reader preferences," reference librarian 55, no. 4 (2014), 298–317, https://doi.org/10.1080/02763877.2014.929910.

17. haglund and olsson, "the impact on university libraries of changes in information behavior among academic researchers."

18. kuruppu and gruber, "understanding the information needs of academic scholars in agricultural and biological sciences."

19. christine wolff, alisa b. rod, and roger c. schonfeld, "ithaka s+r us faculty survey 2015," ithaka s+r, april 4, 2016, http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/.

20. m. macedo-rouet et al., "how do scientists select articles in the pubmed database? an empirical study of criteria and strategies," revue européenne de psychologie appliquée/european review of applied psychology 62, no. 2 (2012): 63–72.

21. rempel and bridges, "that was then, this is now."

22. ellie bushhousen et al., "smartphone use at a university health science center," medical reference services quarterly 32, no. 1 (2013): 52–72, https://doi.org/10.1080/02763869.2013.749134.

23. jill t. boruff and dale storie, "mobile devices in medicine: a survey of how medical students, residents, and faculty use smartphones and other mobile devices to find information," journal of the medical library association 102, no. 1 (2014): 22–30, https://doi.org/10.3163/1536-5050.102.1.006.
24. bushhousen et al., "smartphone use at a university health science center"; boruff and storie, "mobile devices in medicine."

25. eden dahlstrom et al., "ecar study of students and information technology, 2015," research report, educause center for analysis and research, 2015, https://library.educause.edu/~/media/files/library/2015/8/ers1510ss.pdf?la=en.

26. ibid., 24.

27. lutishoor salisbury, jozef laincz, and jeremy j. smith, "science and technology undergraduate students' use of the internet, cell phones and social networking sites to access library information," issues in science and technology librarianship 69 (spring 2012), https://doi.org/10.5062/f4sb43pd.

28. rempel and bridges, "that was then, this is now."

29. ibid.

30. "mobile/tablet operating system market share," netmarketshare, march 2017, https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8&qpcustomd=1.

31. boruff and storie, "mobile devices in medicine"; patrick lo et al., "use of smartphones by art and design students for accessing library services and learning," library hi tech 34, no. 2 (2016): 224–38, https://doi.org/10.1108/lht-02-2016-0015.

32. boruff and storie, "mobile devices in medicine."

33. dahlstrom et al., "ecar study of students and information technology, 2015."

34. caroline myrberg and ninna wiberg, "screen vs. paper: what is the difference for reading and learning?" insights 28, no. 2 (2015): 49–54, https://doi.org/10.1629/uksg.236.

35. barnett-ellis and vann, "the library right there in my hand"; haglund and olsson, "the impact on university libraries of changes in information behavior among academic researchers"; hoffman et al., "library research skills"; kuruppu and gruber, "understanding the information needs of academic scholars in agricultural and biological sciences"; lau et al., "educational usage of mobile devices"; macedo-rouet et al., "how do scientists select articles in the pubmed database?"

36. dahlstrom et al., "ecar study of students and information technology, 2015."

monitoring network and service availability with open-source software

t. michael silver (michael.silver@ualberta.ca) is an mlis student, school of library and information studies, university of alberta, edmonton, alberta, canada.

silver describes the implementation of a monitoring system using an open-source software package to improve the availability of services and reduce the response time when troubles occur.
he provides a brief overview of the literature available on monitoring library systems, and then describes the implementation of nagios, an open-source network monitoring system, to monitor a regional library system's servers and wide area network. particular attention is paid to using the plug-in architecture to monitor library services effectively. the author includes example displays and configuration files.

editor's note: this article is the winner of the lita/ex libris writing award, 2009.

library it departments have an obligation to provide reliable services both during and after normal business hours. the it industry has developed guidelines for the management of it services, but the library community has been slow to adopt these practices. the delay may be attributed to a number of factors, including a dependence on vendors and consultants for technical expertise, a reliance on librarians who have little formal training in it best practices, and a focus on automation systems instead of infrastructure. larger systems that employ dedicated it professionals to manage the organization's technology resources likely implement best practices as a matter of course and see no need to discuss them within the library community.

in the practice of system and network administration, thomas a. limoncelli, christina j. hogan, and strata r. chalup present a comprehensive look at best practices in managing systems and networks. early in the book they provide a short list of first steps toward improving it services, one of which is the implementation of some form of monitoring. they point out that without monitoring, systems can be down for extended periods before administrators notice or users report the problem.1 they dedicate an entire chapter to monitoring services. in it, they discuss the two primary types of monitoring: real-time monitoring, which provides information on the current state of services, and historical monitoring, which provides long-term data on uptime, use, and performance.2 while the software discussed in this article provides both types of monitoring, i focus on real-time monitoring and the value of problem identification and notification.

service monitoring does not appear frequently in library literature, and what is written often relates to single-purpose custom monitoring. an article in the september 2008 issue of ital describes the development and deployment of a wireless network, including a perl script written to monitor the wireless network and associated services.3 the script updates a webpage to display the results and sends an e-mail notifying staff of problems. an enterprise monitoring system could perform these tasks and present the results within the context of the complete infrastructure. it would require using advanced features because of the segregation of networks discussed in their article, but would require little more effort than it took to write the single-purpose script. dave pattern at the university of huddersfield shared another perl script that monitors opac functionality.4 again, the script provided a single-purpose monitoring solution that could be integrated within a larger model. below, i discuss how i modified his script to provide more meaningful monitoring of our opac than the stock webpage monitoring plug-in included with our open-source network monitoring system, nagios.

service monitoring can consist of a variety of tests.
in its simplest form, a ping test will verify that a host (server or device) is powered on and successfully connected to the network. feher and sondag used ping tests to monitor the availability of the routers and access points on their network, as do i for monitoring connectivity to remote locations.5 a slightly more meaningful check would test for the establishment of a connection on a port. feher and sondag used this method to check the daemons in their network.6 a step further would be to evaluate a service response, for example, checking the status code returned by a web server. evaluating content forms the next level of meaning. limoncelli, hogan, and chalup discuss end-to-end monitoring, where the monitoring system actually performs meaningful transactions and evaluates the results.7 pattern's script, mentioned above, tests opac functionality by submitting a known keyword search and evaluating the response.8 i implemented this after an incident where nagios failed to alert me to a problem with the opac. the web server returned a status code of 200 to the request for the search page. users, however, want more from an opac, and attempts to search were unsuccessful because of problems with the index server. modifying pattern's original script, i was able to put together a custom check command that verifies a greater level of functionality by evaluating the number of results for the known search.

software selection

limoncelli, hogan, and chalup do not address specific how-to issues and rarely mention specific products. their book provides the foundational knowledge necessary to identify what must be done. in terms of monitoring, they leave the selection of an appropriate tool to the reader.9 myriad monitoring tools exist, both commercial and open-source. some focus on network analysis, and some even target specific brands or model lines. the selection of a specific software package should depend on the services being monitored and the goals for the monitoring. wikipedia lists thirty-five different products, of which eighteen are commercial (some with free versions with reduced functionality or features); fourteen are open-source projects under a general public license or similar license (some with commercial support available but without different feature sets or licenses); and three offer different versions under different licenses.10 von hagen and jones suggest two of them: nagios and zabbix.11

i selected the nagios open-source product (http://www.nagios.org). the software has an established history of active development, a large and active user community, a significant number of included and user-contributed extensions, and multiple books published on its use. commercial support is available from a company founded by the creator and lead developer as well as other authorized solution providers. monitoring appliances based on nagios are available, as are sensors designed to interoperate with nagios. because of the flexibility of a software design that uses a plug-in architecture, service checks for library-specific applications can be implemented. if a check or action can be scripted using practically any protocol or programming language, nagios can monitor it. nagios also provides a variety of information displays, as shown in appendixes a–e.
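to make the idea of a content-level check concrete, the listing below is a minimal sketch in the spirit of pattern's approach and of the custom check described above. it is not the check_hip_search plug-in from appendix k; the url, the search pattern, and the expected result count are hypothetical placeholders that would need to be adapted to a local opac. the exit codes follow the standard nagios plug-in convention (0 = ok, 1 = warning, 2 = critical, 3 = unknown).

#!/usr/bin/perl
# minimal sketch of a content-level opac check (hypothetical values).
# nagios plug-in exit codes: 0 = ok, 1 = warning, 2 = critical, 3 = unknown.
use strict;
use warnings;
use LWP::UserAgent;

my $url      = 'http://opac.example.org/search?term=salmon';   # hypothetical known-search url
my $expected = 42;                                              # known result count for that search

my $ua       = LWP::UserAgent->new( timeout => 10 );
my $response = $ua->get($url);

if ( !$response->is_success ) {
    print "OPAC CRITICAL: search page not returned (", $response->status_line, ")\n";
    exit 2;
}

# assume the result page reports "NNN titles matched" somewhere in the html
if ( $response->decoded_content =~ /(\d+)\s+titles matched/i ) {
    my $hits = $1;
    if ( $hits == $expected ) {
        print "OPAC OK: known search returned $hits titles\n";
        exit 0;
    }
    print "OPAC WARNING: known search returned $hits titles, expected $expected\n";
    exit 1;
}

print "OPAC CRITICAL: page returned but no result count found (index server down?)\n";
exit 2;

a command definition pointing at a script like this can then be referenced by a service definition in the same way as the stock plug-ins shown in the appendixes.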
installation

the nagios system provides an extremely flexible solution for monitoring hosts and services. the object orientation and use of plug-ins allow administrators to monitor any aspect of their infrastructure or services using standard plug-ins, user-contributed plug-ins, or custom scripts. additionally, the open-source nature of the package allows independent development of extensions to add features or integrate the software with other tools. community sites such as monitoringexchange (formerly nagios exchange), nagios community, and nagios wiki provide repositories of documentation, plug-ins, extensions, and other tools designed to work with nagios.12 but that flexibility comes at a cost: nagios has a steep learning curve, and user-contributed plug-ins often require the installation of other software, most notably perl modules.

nagios runs on a variety of linux, unix, and berkeley software distribution (bsd) operating systems. for testing, i used a standard linux server distribution installed on a virtual machine. virtualization provides an easy way to test software, especially if an alternate operating system is needed. if given sufficient resources, a virtual machine is capable of running the production instance of nagios. after installing and updating the operating system, i installed the following packages:

- apache web server
- perl
- gd development library, needed to produce graphs and status maps
- libpng-devel and libjpeg-devel, both needed by the gd library
- gcc and gnu make, which are needed to compile some plug-ins and perl modules

most major linux and bsd distributions include nagios in their software repositories for easy installation using the native package management system. although the software in the repositories is often not the most recent version, using these repositories simplifies the installation process. if a reasonably recent version of the software is available from a repository, i will install from there. some software packages are either outdated or not available, and i manually install these. detailed installation instructions are available on the nagios website, in several books, and on the previously mentioned websites.13 the documentation for version 3 includes a number of quick-start guides.14 most package managers will take care of some of the setup, including modifying the apache configuration file to create an alias available at http://server.name/nagios. i prepared the remainder of this article using the latest stable versions of nagios (3.0.6) and the plug-ins (1.4.13) at the time of writing.

configuration

nagios configuration relies on an object model, which allows a great deal of flexibility but can be complex. planning your configuration beforehand is highly recommended. nagios has two main configuration files, cgi.cfg and nagios.cfg. the former is primarily used by the web interface to authenticate users and control access, and it defines whether authentication is used and which users can access what functions. the latter is the main configuration file and controls all other program operations. the cfg_file and cfg_dir directives allow the configuration to be split into manageable groups using additional resource files and the object definition files (see figure 1). the flexibility offered allows a variety of different structures. i group network devices into groups but create individual files for each server. nagios uses an object-oriented design.

figure 1. nagios configuration relationships. copyright © 2009 ethan galstead, nagios enterprises. used with permission.
the objects in nagios are displayed in table 1.

table 1. nagios objects

object        | used for
hosts         | servers or devices being monitored
hostgroups    | groups of hosts
services      | services being monitored
servicegroups | groups of services
timeperiods   | scheduling of checks and notifications
commands      | checking hosts and services; notifying contacts; processing performance data; event handling
contacts      | individuals to alert
contactgroups | groups of contacts

a complete review of nagios configuration is beyond the scope of this article. the documentation installed with nagios covers it in great detail. special attention should be paid to the concepts of templates and object inheritance, as they are vital to creating a manageable configuration. the discussion below provides a brief introduction, while appendixes f–j provide concrete examples of working configuration files.

cgi.cfg

the cgi.cfg file controls the web interface and its associated cgi (common gateway interface) programs. during testing, i often turn off authentication by setting use_authentication to 0 if the web interface is not accessible from the internet. there also are various configuration directives that provide greater control over which users can access which features. the users are defined in the /etc/nagios/htpasswd.users file. a summary of commands to control entries is presented in table 2.

table 2. sample commands for managing the htpasswd.users file

create or modify an entry, with password entered at a prompt: htpasswd /etc/nagios/htpasswd.users
create or modify an entry using a password from the command line: htpasswd -b /etc/nagios/htpasswd.users
delete an entry from the file: htpasswd -d /etc/nagios/htpasswd.users

the web interface includes other features, such as sounds, status map displays, and integration with other products. discussion of these directives is beyond the scope of this article. the cgi.cfg file provided with the software is well commented, and the nagios documentation provides additional information. a number of screenshots from the web interface are provided in the appendixes, including status displays and reporting.

nagios.cfg

the nagios.cfg file controls the operation of everything except the web interface. although it is possible to have a single monolithic configuration file, organizing the configuration into manageable files works better. the two main directives of note are cfg_file, which defines a single file that should be included, and cfg_dir, which includes all files in the specified directory with a .cfg extension. a third type of file that gets included is resource.cfg, which defines various macros for use in commands. organizing the object files takes some thought. i monitor more than one hundred services on roughly seventy hosts, so the method of organizing the files was of more than academic interest. i use the following configuration files:

- commands.cfg, containing command definitions
- contacts.cfg, containing the list of contacts and associated information, such as e-mail addresses (see appendix h)
- groups.cfg, containing all groups: hostgroups, servicegroups, and contactgroups (see appendix g)
- templates.cfg, containing all object templates (see appendix f)
- timeperiods.cfg, containing the time ranges for checks and notifications

all devices and servers that i monitor are placed in directories using the cfg_dir directive:

- servers, containing server configurations. each file includes the host and service configurations for a physical or virtual server.
- devices, containing device information. i create individual files for devices with service monitoring that goes beyond simple ping tests for connectivity.
devices monitored solely for connectivity are grouped logically into a single file. for example, we monitor connectivity with fifty remote locations, and all fifty of them are placed in a single file.

the resource.cfg file uses two macros to define the path to plug-ins and event handlers. thirty other macros are available. because the cgi programs do not read the resource file, restrictive permissions can be applied to it, enabling some of the macros to be used for usernames and passwords needed in check commands. placing sensitive information in service configurations exposes that information to the web server, creating a security issue.

configuration

the appendixes include the object configuration files for a simple monitoring situation. a switch is monitored using a simple ping test (see appendix j), while an opac server on the other side of the switch is monitored for both web and z39.50 operations (see appendix i). note that the opac configuration includes a parents directive that tells nagios that a problem with the gateway switch will affect connectivity with the opac server. i monitor fifty remote sites. if my router is down, a single notification regarding my router provides more information if it is not buried in a storm of notifications about the remote sites.

the web port, web service, and opac search services demonstrate different levels of monitoring. the web port check simply attempts to establish a connection to port 80 without evaluating anything beyond a successful connection. the web service check requests a specific page from the web server and evaluates only the status code returned by the server. it displays a warning because i configured the check to download a file that does not exist; the web server is running because it returns an error code, hence the warning status. the opac search uses a known search to evaluate the result content, specifically whether the correct number of results is returned for a known search.

i used a number of templates in the creation of this configuration. templates reduce the amount of repetitive typing by allowing the reuse of directives. templates can be chained, as seen in the host templates. the opac definition uses the linux-server template, which in turn uses the generic-host template. the host definition inherits the directives of the template it uses, overriding any elements in both and adding new elements. in practical terms, generic-host directives are read first. linux-server directives are applied next. if there is a conflict, the linux-server directive takes precedence. finally, opac is read. again, any conflicts are resolved in favor of the last configuration read, in this case opac.

plug-ins and service checks

the nagios plugins package provides numerous plug-ins, including the check-host-alive, check_ping, check_tcp, and check_http commands. using the plug-ins is straightforward, as demonstrated in the appendixes. most plug-ins will provide some information on use if executed with --help supplied as an argument to the command. by default, the plug-ins are installed in /usr/lib/nagios/plugins. some distributions may install them in a different directory. the plugins folder contains a subfolder with user-contributed scripts that have proven useful. most of these plug-ins are perl scripts, many of which require additional perl modules available from the comprehensive perl archive network (cpan). the check_hip_search plug-in (appendix k) used in the examples requires additional modules.
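as an aside, contributed plug-ins sometimes fail with an opaque perl error when one of those modules is missing. the short fragment below is a generic illustration, not part of check_hip_search, of how a plug-in can instead report the problem through the nagios unknown state; the module names are examples only.

#!/usr/bin/perl
# generic illustration: report a missing perl module as the nagios
# unknown state (exit code 3) instead of dying with a raw perl error.
use strict;
use warnings;

my @required = ( 'LWP::UserAgent', 'XML::Simple' );    # example module names only

for my $module (@required) {
    eval "require $module";
    if ($@) {
        print "UNKNOWN: required perl module $module is not installed\n";
        exit 3;
    }
}

print "OK: all required modules are available\n";
exit 0;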
installing perl modules is best accomplished using the cpan perl module. detailed instructions on module installation are available online.15 some general tips:

- gcc and make should be installed before trying to install perl modules, regardless of whether you are installing manually or using cpan. most modules are provided as source code, which may require compiling before use. cpan automates this process but requires the presence of these packages.
- alternatively, many linux distributions provide perl module packages. using repositories to install usually works well, assuming the repository has all the needed modules. in my experience, that is rarely the case.
- many modules depend on other modules, sometimes requiring multiple install steps. both cpan and distribution package managers usually satisfy dependencies automatically. manual installation requires the installer to satisfy the dependencies one by one.
- most plug-ins provide information on required software, including modules, in a readme file or in the source code for the script. in the absence of such documentation, running the script on the command line usually produces an error containing the name of the missing module.
- testing should be done using the nagios user. using another user account, especially the root user, to create directories, copy files, and run programs creates folders and files that are not accessible to the nagios user. the best practice is to use the nagios user for as much of the configuration and testing as possible. the lists and forums frequently include questions from new users who have successfully installed, configured, and tested nagios as the root user and are confused when nagios fails to start or function properly.

advanced topics

once the system is running, more advanced features can be explored. the documentation describes many such enhancements, but the following may be particularly useful depending on the situation.

- nagios provides access control through the combination of settings in the cgi.cfg and htpasswd.users files. library administration and staff, as well as patrons, may appreciate the ability to see the status of the various systems. however, care should be taken to avoid disclosing sensitive information regarding the network or passwords, or allowing access to cgi programs that perform actions.
- nagios permits the establishment of dependency relationships. host dependencies may be useful in some rare circumstances not covered by the parent–child relationships mentioned above, but service dependencies provide a method of connecting services in a meaningful manner. for example, certain opac functions are dependent on ils services. defining these relationships takes both time and thought, which may be worthwhile depending on any given situation.
- event handlers allow nagios to initiate certain actions after a state change. if nagios notices that a particular service is down, it can run a script or program to attempt to correct the problem.
care should be taken when creating these scripts as service restarts may delete or overwrite information critical to solving a problem, or worsen the actual situation if an attempt to restart a service or reboot a server fails. n nagios provides notification escalations, permitting the automatic notification of problems that last longer than a certain time. for example, a service escalation could send the first three alerts to the admin group. if properly configured, the fourth alert would be sent to the managers group as well as the admin group. in addition to escalating issues to management, this feature can be used to establish a series of responders for multiple on-call personnel. n nagios can work in tandem with remote machines. in addition to custom scripts using secure shell (ssh), the nagios remote plug-in executor (nrpe) add-on allows the execution of plug-ins on remote machines, while the nagios service check acceptor (nsca) add-on allows a remote host to submit check results to the nagios server for processing. implementing nagios on the feher and sondag wireless network mentioned earlier would require one of these options because the wireless network is not accessible from the external network. these add-ons also allow for distributed monitoring, sharing the load among a number of servers while still providing the administrators with a single interface to the entire monitored network. the nagios exchange (http://exchange.nagios .org/) contains similar user-contributed programs for windows. n nagios can be configured to provide redundant or failover monitoring. limoncelli, hogan, and chalup call this metamonitoring and describe when it is needed and how it can be implemented, suggesting self-monitoring by the host or having a second monitoring system that only monitors the main system.16 nagios permits more complex configurations, allowing for either two servers operating in parallel, only one of which sends notifications unless the main server fails, or two servers communicating to share the monitoring load. n alternative means of notification increase access to information on the status of the network. i implemented another open-source software package, quickpage, which allows nagios text messages to be sent from a computer to a pager or cell phone.17 appendix l shows a screenshot of a firefox extension that displays host and service problems in the status bar of my browser and provides optional audio alerts.18 the nagios community has developed a number of alternatives, including specialized web interfaces and rss feed generators.19 monitoring network and service availability with open-source software | silver 13 n appropriate use monitoring uses bandwidth and adds to the load of machines being monitored. accordingly, an it department should only monitor its own servers and devices, or those for which it has permission to do so. imagine what would happen if all the users of a service such as worldcat started monitoring it! the additional load would be noticeable and could conceivably disrupt service. aside from reasons connected with being a good “netizen,” monitoring appears similar to port-scanning, a technique used to discover network vulnerabilities. an organization that blithely monitors devices without the owner’s permission may find their traffic is throttled back or blocked entirely. if a library has a definite need to monitor another service, obtaining permission to do so is a vital first step. 
if permission is withheld, the service level agreement between the library and its service provider or vendor should be reevaluated to ensure that the provider has an appropriate system in place to respond to problems. n benefits the system-administration books provide an accurate overview of the benefits of monitoring, but personally reaping those benefits provides a qualitative background to the experience. i was able to justify the time spent on setting up monitoring the first day of production. one of the available plug-ins monitors sybase database servers. it was one of the first contributed plug-ins i implemented because of past experiences with our production database running out of free space, causing the system to become nonfunctional. this happened twice, approximately a year apart. each time, the integrated library system was down while the vendor addressed the issue. when i enabled the sybase service checks, nagios immediately returned a warning for the free space. the advance warning allowed me to work with the vendor to extend the database volume with no downtime for our users. that single event convinced the library director of the value of the system. since that time, nagios has proven its worth in alerting it staff to problem situations, providing information on outage patterns both for in-house troubleshooting and discussions with service providers. n conclusion monitoring systems and services provides it staff with a vital tool in providing quality customer service and managing systems. installing and configuring such a system involves a learning curve and takes both time and computing resources. my experiences with nagios have convinced me that the return on investment more than justifies the costs. references 1. thomas a. limoncelli, christina j. hogan, and strata r. chalup, the practice of system and network administration, 2nd ed. (upper saddle river, n.j.: addison-wesley, 2007): 36. 2. ibid., 523–42. 3. james feher and tyler sondag, “administering an opensource wireless network,” information technology & libraries 27, no. 3 (sept. 2008): 44–54. 4. dave pattern, “keeping an eye on your hip,” online posting, jan. 23, 2007, self-plagiarism is style, http://www.daveyp .com/blog/archives/164 (accessed nov. 20, 2008). 5. feher and sondag, “administering an open-source wireless network,” 45–54. 6. ibid., 48, 53–54. 7. limoncelli, hogan, and chalup, the practice of system and network administration, 539–40. 8. pattern, “keeping an eye on your hip.” 9. limoncelli, hogan, and chalup, the practice of system and network administration, xxv. 10. “comparison of network monitoring systems,” wikipedia, the free encyclopedia, dec. 9, 2008, http://en.wikipedia .org/wiki/comparison_of_network_monitoring_systems (accessed dec. 10, 2008). 11. william von hagen and brian k. jones, linux server hacks, vol. 2 (sebastopol, calif.: o’reilly, 2005): 371–74 (zabbix), 382–87 (nagios). 12. monitoringexchange, http://www.monitoringexchange. org/ (accessed dec. 23, 2009); nagios community, http:// community.nagios.org (accessed dec. 23, 2009); nagios wiki, http://www.nagioswiki.org/ (accessed dec. 23, 2009). 13. “nagios documentation,” nagios, mar. 4, 2008, http:// www.nagios.org/docs/ (accessed dec. 8, 2008); david josephsen, building a monitoring infrastructure with nagios (upper saddle river, n.j.: prentice hall, 2007); wolfgang barth, nagios: system and network monitoring, u.s. ed. (san francisco: open source press; no starch press, 2006). 14. 
ethan galstead, “nagios quickstart installation guides,” nagios 3.x documentation, nov. 30, 2008, http://nagios.source forge.net/docs/3_0/quickstart.html (accessed dec. 3, 2008). 15. the perl directory, (http://www.perl.org/) contains complete information on perl. specific information on using cpan is available in “how do i install a module from cpan?” perlfaq8, nov. 7, 2007, http://perldoc.perl.org/perlfaq8.html (accessed dec. 4, 2008). 16. limoncelli, hogan, and chalup, the practice of system and network administration, 539–40. 17. thomas dwyer iii, qpage solutions, http://www.qpage .org/ (accessed dec. 9, 2008). 18. petr šimek, “nagioschecker,” google code, aug. 12, 2008, http://code.google.com/p/nagioschecker/ (accessed dec. 8, 2008). 19. “notifications,” monitoringexchange, http://www .monitoringexchange.org/inventory/utilities/addon-projects/notifications (accessed dec. 23, 2009). 14 information technology and libraries | march 2010 appendix a. service detail display from test system appendix b. service details for opac (hip) and ils (horizon) servers from production system appendix c. sybase freespace trends for a specified period appendix d. connectivity history for a specified period appendix e. availability report for host shown in appendix d appendix f. templates.cfg file ############################################################################ # templates.cfg sample object templates ############################################################################ ############################################################################ # contact templates ############################################################################ monitoring network and service availability with open-source software | silver 15 # generic contact definition template this is not a real contact, just # a template! define contact{ name generic-contact service_notification_period 24x7 host_notification_period 24x7 service_notification_options w,u,c,r,f,s host_notification_options d,u,r,f,s service_notification_commands notify-service-by-email host_notification_commands notify-host-by-email register 0 } ############################################################################ # host templates ############################################################################ # generic host definition template this is not a real host, just # a template! define host{ name generic-host notifications_enabled 1 event_handler_enabled 1 flap_detection_enabled 1 failure_prediction_enabled 1 process_perf_data 1 retain_status_information 1 retain_nonstatus_information 1 notification_period 24x7 register 0 } # linux host definition template this is not a real host, just a template! define host{ name linux-server use generic-host check_period 24x7 check_interval 5 retry_interval 1 max_check_attempts 10 check_command check-host-alive notification_period workhours notification_interval 120 notification_options d,u,r contact_groups admins register 0 } appendix f. templates.cfg file (cont.) 
16 information technology and libraries | march 2010 # define a template for switches that we can reuse define host{ name generic-switch use generic-host check_period 24x7 check_interval 5 retry_interval 1 max_check_attempts 10 check_command check-host-alive notification_period 24x7 notification_interval 30 notification_options d,r contact_groups admins register 0 } ############################################################################ # service templates ############################################################################ # generic service definition template this is not a real service, # just a template! define service{ name generic-service active_checks_enabled 1 passive_checks_enabled 1 parallelize_check 1 obsess_over_service 1 check_freshness 0 notifications_enabled 1 event_handler_enabled 1 flap_detection_enabled 1 failure_prediction_enabled 1 process_perf_data 1 retain_status_information 1 retain_nonstatus_information 1 is_volatile 0 check_period 24x7 max_check_attempts 3 normal_check_interval 10 retry_check_interval 2 contact_groups admins notification_options w,u,c,r notification_interval 60 notification_period 24x7 register 0 } appendix f. templates.cfg file (cont.) monitoring network and service availability with open-source software | silver 17 # define a ping service. this is not a real service, just a template! define service{ use generic-service name ping-service notification_options n check_command check_ping!1000.0,20%!2000.0,60% register 0 } appendix f. templates.cfg file (cont.) appendix g. groups.cfg file ############################################################################ # contact group definitions ############################################################################ # we only have one contact in this simple configuration file, so there is # no need to create more than one contact group. define contactgroup{ contactgroup_name admins alias nagios administrators members nagiosadmin } ############################################################################ # host group definitions ############################################################################ # define an optional hostgroup for linux machines define hostgroup{ hostgroup_name linux-servers ; the name of the hostgroup alias linux servers ; long name of the group } # create a new hostgroup for ils servers define hostgroup{ hostgroup_name ils-servers ; the name of the hostgroup alias ils servers ; long name of the group } # create a new hostgroup for switches define hostgroup{ hostgroup_name switches ; the name of the hostgroup alias network switches ; long name of the group } ############################################################################ # service group definitions ############################################################################ 18 information technology and libraries | march 2010 # define a service group for network connectivity define servicegroup{ servicegroup_name network alias network infrastructure services } # define a servicegroup for ils define servicegroup{ servicegroup_name ils-services alias ils related services } appendix g. groups.cfg file (cont.) appendix h. 
contacts.cfg ############################################################################ # contacts.cfg sample contact/contactgroup definitions ############################################################################ # just one contact defined by default the nagios admin (that’s you) # this contact definition inherits a lot of default values from the # ‘generic-contact’ template which is defined elsewhere. define contact{ contact_name nagiosadmin use generic-contact alias nagios admin email nagios@localhost } appendix i. opac.cfg ############################################################################ # opac server ############################################################################ ############################################################################ # host definition ############################################################################ # define a host for the server we’ll be monitoring # change the host_name, alias, and address to fit your situation define host{ use linux-server host_name opac parents gateway-switch alias opac server monitoring network and service availability with open-source software | silver 19 appendix i. opac.cfg (cont.) address 192.168.1.123 } ############################################################################ # service definitions ############################################################################ # create a service for monitoring the http port define service{ use generic-service host_name opac service_description web port check_command check_tcp!80 } # create a service for monitoring the web service define service{ use generic-service host_name opac service_description web service check_command check_http!-u/bogusfilethatdoesnotexist.html } # create a service for monitoring the opac search define service{ use generic-service host_name opac service_description opac search check_command check_hip_search } # create a service for monitoring the z39.50 port define service{ use generic-service host_name opac service_description z3950 port check_command check_tcp!210 } appendix j. switches.cfg ############################################################################ # switch.cfg sample config file for monitoring switches ############################################################################ ############################################################################ # host definitions ############################################################################ 20 information technology and libraries | march 2010 appendix k. check_hip_search script #!/usr/bin/perl -w ######################### # check horizon information portal (hip) status. # hip is the web-based interface for dynix and horizon # ils systems by sirsidynix corporation. # # this plugin is based on a standalone perl script written # by dave pattern. please see # http://www.daveyp.com/blog/index.php/archives/164/ # for the original script. # # the original script and this derived work are covered by # http://creativecommons.org/licenses/by-nc-sa/2.5/ ######################### use strict; use lwp::useragent; # note the requirement for perl module lwp::useragent! 
use lib “/usr/lib/nagios/plugins”; use utils qw($timeout %errors); # define the switch that we’ll be monitoring define host{ use generic-switch host_name gateway-switch alias gateway switch address 192.168.0.1 hostgroups switches } ############################################################################ ### # service definitions ############################################################################ ### # create a service to ping to switches # note this entry will ping every host in the switches hostgroup define service{ use ping-service hostgroups switches service_description ping normal_check_interval 5 retry_check_interval 1 } appendix j. switches.cfg monitoring network and service availability with open-source software | silver 21 ### some configuration options my $hipserverhome = “http://ipac.prl.ab.ca/ipac20/ipac. jsp?profile=alap”; my $hipserversearch = “http://ipac.prl.ab.ca/ipac20/ipac.jsp?menu=se arch&aspect=subtab132&npp=10&ipp=20&spp=20&profile=alap&ri=&index=.gw&term=li nux&x=18&y=13&aspect=subtab132&getxml=true”; my $hipsearchtype = “xml”; my $httpproxy = ‘’; ### check home page is available... { my $ua = lwp::useragent->new; $ua->timeout( 10 ); if( $httpproxy ) { $ua->proxy( ‘http’, $httpproxy ) } my $response = $ua->get( $hipserverhome ); my $status = $response->status_line; if( $response->is_success ) { } else { print “hip_search critical: $status\n”; exit $errors{‘critical’}; } } ### check search page is returning results... { my $ua = lwp::useragent->new; $ua->timeout( 10 ); if( $httpproxy ) { $ua->proxy( ‘http’, $httpproxy ) } my $response = $ua->get( $hipserversearch ); my $status = $response->status_line; if( $response->is_success ) { my $results = 0; my $content = $response->content; if( lc( $hipsearchtype ) eq ‘html’ ) { if ( $content =~ /\(\d+?)\<\/b\>\ \;titles matched/ ) { $results = $1; appendix k. check_hip_search script (cont.) 22 information technology and libraries | march 2010 } } if( lc( $hipsearchtype ) eq ‘xml’ ) { if( $content =~ /\(\d+?)\<\/hits\>/ ) { $results = $1; } } ### modified section original script triggered another function to ### save results to a temp file and email an administrator. unless( $results ) { print “hip_search critical: no results returned|results=0\n”; exit $errors{‘critical’}; } if ( $results ) { print “hip_search ok: $results results returned|results=$results\n”; exit $errors{‘ok’}; } } } appendix k. check_hip_search script (cont.) appendix l. nagios checker display persistent urls and citations offered for digital objects by digital libraries article persistent urls and citations offered for digital objects by digital libraries nicholas homenda information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12987 abstract as libraries, archives, and museums make unique digital collections openly available via digital library platforms, they expose these resources to users who may wish to cite them. often several urls are available for a single digital object, depending on which route a user took to find it, but the chosen citation url should be the one most likely to persist over time. catalyzed by recent digital collections migration initiatives at indiana university libraries, this study investigates the prevalence of persistent urls for digital objects at peer institutions and examines the ways their platforms instruct users to cite them. 
this study reviewed institutional websites from the digital library federation’s (dlf) published list of 195 members and identified representative digital objects from unique digital collections navigable from each institution’s main web page in order to determine persistent url formats and citation options. findings indicate an equal split between offering and not offering discernible persistent urls with four major methods used: handle, doi, ark, and purl. significant variation in labeling persistent urls and inclusion in item-specific citations uncovered areas where the user experience could be improved for more reliable citation of these unique resources.
nicholas homenda (nhomenda@indiana.edu) is digital initiatives librarian, indiana university bloomington. © 2021.
introduction
libraries, archives, and museums often make their unique digital collections openly available in digital library services and in different contexts, such as digital library aggregators like the digital public library of america (dpla, https://dp.la/) and hathitrust digital library (https://www.hathitrust.org/). as a result, there can be many urls available that point to digital objects within these collections. take, for example, image collections online (http://dlib.indiana.edu/collections/images) at indiana university (iu), a service launched in 2007 featuring open access iu image collections. users discover images on the site through searching and browsing and its collections are also shared with dpla. the following urls exist for the digital object shown in figure 1, an image from the building a nation: indiana limestone photograph collection:
• the url as it appears in the browser in image collections online: https://webapp1.dlib.indiana.edu/images/item.htm?id=http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446
• the persistent url on that page (“bookmark this page at”): http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446
• the url pasted from the browser for the image in dpla: https://dp.la/item/eb83ff0a6ae507e2ba441634f7eb0f18?q=indiana%20limestone
as a digital library or collection manager, which url would you prefer to see cited for this object?
figure 1. an example of a digital object with multiple urls. mcmillan mill, ilco id in2288_1. courtesy, indiana geological and water survey, indiana university, bloomington, indiana. retrieved from image collections online at http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446.
citation instructions given to authors in major style guides explicitly mention using the best possible form of a resource’s url: “[i]t is important to choose the version of the url that is most likely to continue to point to the source cited.”1 of the three urls above, the second is a purl, or persistent url (https://archive.org/services/purl/), which is why both image collections online and dpla instruct users to bookmark or cite it.
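to make the preference concrete: a citation helper can favor the persistent url whenever one is recorded for an object and fall back to the browser url only when it is not. the short python sketch below is purely illustrative and is not the behavior of image collections online, dpla, or any other platform; the record fields, the function name, and the fallback rule are assumptions made for the example, and the sample values are taken from the figure 1 object above.

from dataclasses import dataclass
from typing import Optional

@dataclass
class DigitalObjectRecord:
    # hypothetical item record; the field names are not taken from any platform
    title: str
    collection: str
    institution: str
    display_url: str
    persistent_url: Optional[str] = None

def recommended_citation(obj: DigitalObjectRecord) -> str:
    """build a simple citation string, preferring the persistent url when one exists."""
    url = obj.persistent_url or obj.display_url
    return f"{obj.title}. {obj.collection}, {obj.institution}. retrieved from {url}"

item = DigitalObjectRecord(
    title="mcmillan mill, ilco id in2288_1",
    collection="image collections online",
    institution="indiana university",
    display_url="https://webapp1.dlib.indiana.edu/images/item.htm?id=...",  # truncated browser url
    persistent_url="http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446",
)
print(recommended_citation(item))

running the sketch prints a citation ending in the purl rather than the longer browser url, which is the outcome the style-guide advice above asks for.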
other common methods for issuing and maintaining persistent urls include digital object identifiers (doi, https://www.doi.org/), handles (http://handle.net/), and archival resource keys (ark, https://n2t.net/e/ark_ids.html). all of those have been around since the late 1990s to early 2000s. at indiana university libraries, recent efforts have focused on migrating digital collections to new digital library platforms, mainly based on the open source samvera repository software (https://samvera.org/). as part of these efforts, we wanted to survey how peer institutions were employing persistent, citable urls for digital objects to determine if a prevailing approach had emerged since indiana university libraries’ previous generation of digital library services were developed in the early- to mid-2000s. besides having the capability of creating and reliably serving these urls, our digital library platforms need to make these urls easily accessible to users, preferably along with some assertion that the urls should be used when citing digital objects and collections instead of the many non-persistent urls also directing to those same digital objects and collections. although libraries, archives, and museums have digitized and made digital objects in digital collections openly accessible for decades using several methods for providing persistent, citable urls, how do institutions now present digital object urls to people who encounter, use, and cite them? by examining digital collections within a large population of digital library institutions’ websites, this study aims to discover
1. what methods of url persistence are being employed for digital objects by digital library institutions?
2. how do these institutions’ websites instruct users to cite these digital objects?
literature review
the study of digital objects in the literature often takes a philosophical perspective in attempting to define them. moreover, practical accounts of digital object use and reuse note the challenges associated with infrastructure, retrieval, and provenance. much of the literature about common methods of persistent url resolution comes from individuals and entities who developed and maintain these standards, as well as overviews of the persistent url resolution methods available. finally, several studies have investigated the problem of “link rot” by tracking the availability of web-hosted resources over time.
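the link-rot studies reviewed later in this section all come down to one operational question: does a previously published url still resolve, and if so, where does it land after redirects? a minimal availability check can be scripted with python's standard library alone, as in the sketch below; the function name, the ten-second timeout, and the decision to record the final redirect target are assumptions of the example rather than the method of any study cited here.

import urllib.error
import urllib.request

def check_url(url: str, timeout: float = 10.0) -> dict:
    """report whether a url still resolves and the final location after redirects."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # urllib follows http redirects automatically; geturl() reports the final address
            return {"url": url, "ok": True, "status": response.status, "resolved_to": response.geturl()}
    except (urllib.error.URLError, ValueError) as exc:
        return {"url": url, "ok": False, "error": str(exc)}

# the purl from the introduction; whether it still resolves depends on the live service
print(check_url("http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446"))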
allison notes the generations of philosophical thought that it took to recognize common characteristics of physical objects and the difficulty in understanding an authentic version of a digital object, especially with different computer hardware and software changing the way digital objects appear.2 hui also investigates the philosophical history of physical objects to begin to define digital objects through his methods of datafication of objects and objectification of data, noting that digital objects can be approached in three phases: objects, data, and networks, in order to define them.3 lynch is also concerned with determining the authenticity of digital objects and challenges inherent in the digital realm. in describing digital objects, he creates a hierarchy with raw data at the bottom, elevated to interactive experiential works at the top which elicit the fullest emotional connection contributing to the authentic experience of the work.4 the literature often examines digital objects from the practitioner’s perspective, such as the publishing industry’s difficulty in repurposing digital objects for new publishing products. publishers in benoit and hussey’s 2011 case study note the tension between managers and technical staff concerning assumptions about what their computer system could automatically do with their digital objects; their digital objects always require some human labor and intervention to be accurately described and retrievable later. 5 dappert et al. note the need to describe a digital object’s environment in order to be able to reproduce it in their work with the premis data dictionary for preservation metadata (https://www.loc.gov/standards/premis/).6 strubulis et al. provide a model for digital object provenance using inference and resource description framework (rdf) triples (https://w3.org/rdf/) since storing full provenance information for https://www.loc.gov/standards/premis/ https://w3.org/rdf/ information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 4 complex digital objects, such as the large amount of mars rover data they offer as an example, would be cost prohibitive.7 in 2001, arms describes the landscape of persistent uniform resource names (urn) of handles, purls, and dois near the latter’s inception.8 recent work by koster explains the persistent identifier methods most in use today and examines current infrastructure practices for maintaining them.9 the persistent link resolution method most prominently featured in the literature is the digital object identifier (doi). beginning in 1999, those behind developing and implementing doi have explained its inception, development, and trajectory, continuing with paskin’s deep explanation in 2002 of the reasons why doi exist and the technology behind the service. 10 discipline-specific research notes the utility of doi. sidman and davidson and weissberg studied doi for the purposes of automating the supply chain in the publishing industry.11 derisi, kennison, and twyman, on behalf of the public library of science (plos) announced their 2003 decision to broadly implement doi, followed by additional disciplinespecific encouragement of the practice by skiba in nursing education and neumann and brase in molecular design.12 the archival resource key (ark) is an alternative permanent link resolution scheme. 
since 2001, the open-source ark identifier offers a self-hosted solution for providing persistent access to digital objects, their metadata, and a maintenance commitment.13 recently, duraspace working groups have planned for further development and expansion of ark with the arks in the open project (https://wiki.lyrasis.org/display/arks/arks+in+the+open+project). persistent urls (purls) have been used to provide persistent access to digital objects for nearly 20 years, and their use in the library community is well documented. shafer, weibel, and jul anticipate uniform resource names becoming a web standard and offer purls as an intermediate step to aid in urn development.14 shafer also explained how oclc uses purls and alternate routing methods (arms) to properly direct global users to oclc resources.15 purls are also used to provide persistent access to government information and were seen by the cendi persistent identification task group as essential to their early efforts to implement the federal enterprise architecture (fea) and a theoretical federal persistent identification resolver.16 digital objects and collections should ideally be accessible via urls that work beyond the life of any one platform, lest the materials be subjected to “link rot,” or the process of decay when previously working links no longer correctly resolve. ducut et al. investigated 1994–2006 medline abstracts for the presence of persistent link resolution services such as handle, purl, doi, and webcite and found 20% of the links were inaccessible in 2008.17 mcmurry et al. investigated link rot in life sciences data and suggested practices for formatting links for increased persistence and approaches for versioning.18 the topic of link rot has been examined as early as 2003, in markwell and brooke’s “broken links: just how rapidly do science education hyperlinks go extinct,” cited by multiple link rot studies. ironically, this article is no longer accessible at the cited url.19 methodology this study sought a set of digital objects within library institutions’ digital collections websites. to locate examples of publicly accessible digital objects in digital collections, this study collected institutional websites from the digital library federation’s (dlf) published list of 195 members https://wiki.lyrasis.org/display/arks/arks+in+the+open+project information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 5 as of august 2019.20 subsequent investigation aimed to find one representative digital object from unique digital collections navigable from each institution’s main web page. this study aimed to locate digital collections that met the following criteria: 1. collections are openly available. 2. collections are in a repository service, as opposed to highlighted content visible on an informational web page or blog. 3. collections are gathered within a site or service that contains multiple collections, as opposed to individual digital project websites, when possible. 4. collections are unique to an institution, as opposed to duplicated or licensed content. these criteria were developed in an effort to find unique, publicly accessible digital objects within each institution’s digital collections. to be sure, users search for and discover materials in a variety of ways and in numerous services, but studying the information-seeking behavior of users looking for digital objects or digital collections is outside the scope of this study. 
ultimately, digital collections indexed by search engines or available in aggregator services like dpla often contain links to collections and objects in their institutionally hosted platforms. users who discover these materials are likely to be directed to the sites this study investigated. for the purposes of this study, at least one digital collection was investigated from each dlf institution. multiple sites for an institution were investigated when more than one publicly accessible site or service met the above criteria. when digital collections at an institution were delivered only through the library catalog discovery service, reasonable attempts were made to delimit discoverable digital collections content. in total, 183 digital collections were identified for this study. once digital collections were located, subsequent investigation aimed to locate individual digital objects within them. while digital objects represent diverse materials available in a variety of formats, for ease of comparing approaches between institutions, a mixture of individual digital images, multipage digital items, and audiovisual materials were examined. objects for this study were primarily available in websites containing a variety of collections and format types with common display characteristics despite format differences, and no additional efforts were made to locate equal or proportional digital object formats at each institution. one representative digital object was identified per digital collection, totaling 183 digital objects. once a digital object was located at an institution, the object’s unique identifier, format, persistent url, persistent url label, method of link resolution (if identifiable), and citation were collected with particular focus on the object’s persistent url, if available. commonly used persistent url types and their url components can be identified, as seen in table 1; however, any means of persistence was collected if clearly identified. after examining initial results, the object’s provided citation, if available, was added to the list of data collected since many digital collection platforms provide recommended citations for individual objects.
table 1. commonly used persistent url methods and corresponding url components (persistent url type: url component)
archival resource key (ark): ark:/
digital object identifier (doi): doi.org/ (or doi:)
handle: hdl.handle.net
persistent url (purl): purl.
results
most institutions have a single digital collection site or service that met the selection criteria for this study. some appear to have multiple digital collection repositories, often separated by digital object format or library department, and many institutions have collections that are only publicly accessible through discrete project web sites, such as digital exhibits or focused digital humanities research projects. out of 195 dlf member institutions, 171 had publicly accessible digital collections. of these 171 institutions, 153 had digital collections services/sites that adhered to the criteria of this study, while 21 had only project-focused digital collections sites. since several institutions had more than one digital collection platform accessible via their main institutional website, a population of 183 digital collections was investigated.
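the url components in table 1 are what make this part of the data collection scriptable. the sketch below shows one way the scheme behind a collected url might be guessed automatically; the component strings come directly from table 1, while the function name, the handling of the bare doi: form, and the 'unknown' fallback are assumptions of the example, not a description of how the study actually coded its data.

# substring patterns taken from table 1; everything else here is illustrative
PERSISTENT_URL_PATTERNS = {
    "ark": "ark:/",
    "doi": "doi.org/",
    "handle": "hdl.handle.net",
    "purl": "purl.",
}

def classify_persistent_url(url: str) -> str:
    """return the persistent-identifier scheme suggested by a url, or 'unknown'."""
    lowered = url.lower()
    if lowered.startswith("doi:"):  # table 1 also allows the bare doi: form
        return "doi"
    for scheme, component in PERSISTENT_URL_PATTERNS.items():
        if component in lowered:
            return scheme
    return "unknown"

print(classify_persistent_url("http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446"))  # purl
print(classify_persistent_url("https://hdl.handle.net/1234/example"))  # handle; the identifier is made up
print(classify_persistent_url("https://www.example.edu/items/42"))  # unknown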
one representative digital object from each collection was gathered, consisting of 107 digital images, 73 multipage items, and 3 audiovisual items (totaling 183).
table 2. number of instances of digital collection platforms identified (platform: number, percentage of total of 183)
custom or unidentifiable: 53 (29%)
contentdm: 46 (25%)
islandora: 19 (10%)
dspace: 11 (6%)
samvera: 11 (6%)
omeka: 10 (5%)
internet archive: 7 (4%)
digital commons: 6 (3%)
fedora custom: 4 (2%)
luna: 3 (2%)
xtf: 3 (2%)
artstor: 2 (1%)
iiif server: 2 (1%)
primo: 2 (1%)
aspace: 1 (1%)
elevator: 1 (1%)
knowvation: 1 (1%)
veridian: 1 (1%)
as seen in table 2, almost a third of digital collection platforms encountered appear to be custom-developed or customized to not reveal the software platform upon which they were based. of the platform-based services encountered where software was identifiable, 17 different platforms were used and the top five were contentdm, islandora, dspace, samvera (hyrax, avalon, curation concerns, etc.), and omeka.
table 3. occurrence of persistent links in surveyed digital collections, method of link persistence, and persistent link labels
persistent links? (number, percentage of total of 183)
no/unknown: 93 (51%)
yes / persistence claimed: 90 (49%)
persistent link method (number, percentage of total of 90)
unknown: 33 (37%)
handle: 27 (30%)
ark: 19 (21%)
doi: 6 (7%)
purl: 5 (6%)
persistent link label (number, percentage of total of 90)
other (a): 24 (26.7%)
permalink: 22 (24.4%)
identifier: 13 (14.4%)
[no label given]: 10 (11.1%)
permanent link: 7 (7.8%)
uri: 5 (6%)
persistent link: 3 (3.3%)
handle: 2 (2.2%)
link to the book: 2 (2.2%)
persistent url: 2 (2.2%)
(a) twenty-four other persistent link labels were reported,21 each occurring only once.
as seen in table 3, the numbers of digital objects with and without publicly accessible persistent (or seemingly persistent) links were nearly equal. among the digital objects with persistent links, the majority claimed persistence without a discernible resolution method, with the rest divided between handle, ark, doi, and purl. these objects also had 33 different labels for these links in the public-facing interface. the top five labels were: permalink (22), identifier (13), permanent link (7), uri (5), and persistent link (3). as seen in table 4, the majority of digital objects surveyed had a unique item identifier in their publicly viewable item record. the majority did not offer a citation in the item’s publicly viewable record. among items that offered citations, the majority contained a link to the item, and three offered downloadable citation formats only, such as endnote, zotero, and mendeley.
table 4. various digital object characteristics surveyed
unique item identifier in item record (percentage of total of 183): yes: 132 (72%); no: 51 (28%)
citation in item record (percentage of total of 183): yes: 65 (36%); no: 118 (64%)
citations containing links to item (percentage of total of 65): yes: 39 (60%); downloadable citation format only: 3 (5%); no: 23 (35%)
discussion
since proper citation practice dictates choosing the url most likely to provide continuing access to a resource, it follows that providing persistent urls to resources such as digital objects or digital collections is also a good practice.
it is encouraging to see a large number of institutions surveyed providing urls that persist (or claim to persist). providing persistent access to a unique digital resource implies a level of commitment to maintaining its url into the future, requiring policies, technology, and labor resources, further augmented by costs associated with registering certain types of identifiers like doi.22 it is likely that institutions not providing persistent (or not obviously persistent) urls are either internally committing to preserving their objects, collections, and services through means not known to end users; are constrained by technological limitations of their digital collection platforms; hope to develop or adopt new digital library services that offer these capabilities; or lack the resources to offer persistent urls. the four commonly used methods of persistent link resolution—doi, handle, ark, and purl— have been used for nearly 20 years, and it is not surprising that alternative observable methods were seldom encountered in this study. handles were the most common persistent url method, which seems related to the digital library platform used by an institution. dspace distributions are pre-bundled with handle server software, for example, and 12 out of 27 platforms serving digital objects with handles were based on dspace (https://duraspace.org/dspace/). when choosing to implement or upgrade a digital library platform, institutions often consider several available options. choosing a platform that offers the ability to easily create and maintain persistent urls might be less burdensome than making urls persist via independent or alternative means. thirty-three digital objects offered links that had labels implying some sort of persistence but lacked information describing the methods used or url components consistent with commonly used methods, as seen in table 1. to achieve persistence, there might be a combination of url rewriting, locally implemented solutions, or nonpublic persistent urls existing. it would benefit users, increasingly aware of the need to cite digital objects using persistent links, for digital object platforms that offer persistent linking to explicitly state that fact and ideally offer some evidence of the resolution method used. researchers will be looking for citable persistent links that offer https://duraspace.org/dspace/ information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 9 some cues signifying their persistence, whether it is clearly indicated language on the website or a url pattern consistent with the four major methods commonly used. the amount of variation in labeling persistent links was surprising. commonly used digital library software platforms have default ways of labeling these fields. nearly all of the “reference url” labels encountered are in contentdm sites, for example. since the concept of offering a persistent link to a digital object is not uncommon, perhaps there can be a more consistent approach to choosing the label for this content. when a researcher finds a digital object in an institutional digital library service, they might want to cite that object. accurately citing resources in all formats is an essential research skill, and digital library platforms often try to aid users by providing dynamically generated or pre-populated citations based on unique metadata associated with that object. 
it was somewhat surprising to encounter these types of citation helpers that did not include persistent links. since a digital object’s preferred persistent link is often different than the url visible in the browser, efforts should be made to make citations available containing persistent links. there are institutions with digital collections that were not examined in this study due to a number of factors. first, this study examined the 195 institutions who were members of the digital library federation, and there are 2,828 four-year postsecondary institutions in the united states as of 2018.23 additional study could expand perceptions about persistent links for digital objects when looking beyond the dlf member institutions, which are predominantly four-year postsecondary institutions but also contain museums, public libraries, and other cultural heritage organizations. an alternative approach to collecting this data would be to conduct user testing focused on finding and citing digital objects from a number of institutions. this approach was not used, however, since the initial goal of this study was to see how peer digital library institutions have employed persistent links and citations across a broad yet contained spectrum. as one librarian with extensive digital library experience, my approach to locating these platforms and resources is subject to subconscious bias i may have accumulated over my professional career, but i would hope that my experience makes me more able to locate these platforms and materials than the average user. digital library platforms are numerous, and often institutions have several of them with varying degrees of public visibility or connectivity to their institution’s main library website. this study’s findings for any particular institution are not as authoritative as self-reported information from the institution itself. while a survey aimed at collecting direct responses from institutions might have yielded more accuracy, a potentially low response rate would also make it difficult to truly know what methods of persistent linking peer institutions are employing, especially with the majority of these resources being openly findable and accessible. still, further study with self reported information could shed more light on the decisions to provide certain methods of persistent links to objects within their chosen digital collection platforms. moreover, it is possible that some digital object formats are more likely to have persistent urls than others. newer formats such as three-dimensional digital objects, commonly cited resources like data sets, and scholarship held in institutional repositories could be available in digital library services similar to those surveyed in this study with different persistent url characteristics. additional study could aim to survey populations of digital objects by format across multiple institutions to investigate any correlation between persistent urls and object format. information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 10 conclusion unique digital collections at digital library institutions are made openly accessible to the pu blic in a variety of ways, including digital library software platforms and digital library aggregator services. regardless of how users find these materials, best practices require users to cite urls for these materials that are most likely to continue to provide access to them. 
persistent urls are a common way to ensure cited urls to digital objects remain accessible. commonly used methods of issuing and maintaining persistent urls can be identified in digital object records within digital collection platforms available at these institutions. this study identified characteristics about these digital objects, their platforms, prevalence of persistent urls in their records, and the way these urls are presented to users. findings indicate that dlf member institutions are split evenly between providing and not providing publicly discernible persistent urls with wide variation on how these urls are presented and explained to users. decisions made in developing and maintaining digital collection platforms and the types of urls made available to users impact which urls users cite and the possibility of others encountering these resources through these citations. embarking on this study also was prompted by digital collection migrations at indiana university, and these findings provide us interesting examples of persistent url usage at other institutions and ways to improve the user experience in digital collection platforms. endnotes 1 the chicago manual of style online (chicago: university of chicago press, 2017), ch. 14, sec. 7. 2 arthur allison et al., “digital identity matters,” journal of the american society for information science & technology 56, no. 4 (2005): 364–72, https://doi.org/10.1002/asi.20112. 3 yuk hui, “what is a digital object?” metaphilosophy 43, no. 4 (2012): 380–95, https://doi.org/10.1111/j.1467-9973.2012.01761.x. 4 clifford lynch, “authenticity and integrity in the digital environment: an exploratory analysis of the central role of trust” council on library and information resources (clir), 2000, https://www.clir.org/pubs/reports/pub92/lynch/. 5 g. benoit and lisa hussey, “repurposing digital objects: case studies across the publishing industry,” journal of the american society for information science & technology 62, no. 2 (2011): 363–74, https://doi.org/10.1002/asi.21465. 6 angela dappert et al., “describing and preserving digital object environments,” new review of information networking 18, no. 2 (2013): 106–73, https://doi.org/10.1080/13614576.2013.842494. 7 christos strubulis et al., “a case study on propagating and updating provenance information using the cidoc crm,” international journal on digital libraries 15, no. 1 (2014): 27–51, https://doi.org/10.1007/s00799-014-0125-z. 8 william y. arms, “uniform resource names: handles, purls, and digital object identifiers,” communications of the acm 44, no. 5 (2001): 68, https://doi.org/10.1145/374308.375358. https://doi.org/10.1111/j.1467-9973.2012.01761.x https://www.clir.org/pubs/reports/pub92/lynch/ https://doi.org/10.1002/asi.21465 https://doi.org/10.1080/13614576.2013.842494 https://doi.org/10.1007/s00799-014-0125-z https://doi.org/10.1145/374308.375358 information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 11 9 lukas koster, “persistent identifiers for heritage objects,” code4lib journal 47 (2020), https://journal.code4lib.org/articles/14978. 10 albert w. simmonds, “the digital object identifier (doi),” publishing research quarterly 15, no. 2 (1999): 10, https://doi.org/10.1007/s12109-999-0022-2; norman paskin, “digital object identifiers,” information services & use 22, no. 2/3 (2002): 97, https://doi.org/10.3233/isu2002-222-309. 
11 david sidman and tom davidson, “a practical guide to automating the digital supply chain with the digital object identifier (doi),” publishing research quarterly 17, no. 2 (2001): 9, https://doi.org/10.1007/s12109-001-0019-y; andy weissberg, “the identification of digital book content,” publishing research quarterly 24, no.4 (2008): 255–60, https://doi.org/10.1007/s12109-008-9093-8. 12 susanne derisi, rebecca kennison, and nick twyman, “the what and whys of dois,” plos biology 1, no. 2 (2003): 133–34, https://doi.org/10.1371/journal.pbio.0000057; diane j. skiba, “digital object identifiers: are they important to me?,” nursing education perspectives 30, no. 6 (2009): 394–95, https://doi.org/10.1016/j.lookout.2008.06.012; janna neumann and jan brase, “datacite and doi names for research data,” journal of computer-aided molecular design 28, no. 10 (2014): 1035–41, https://doi.org/10.1007/s10822-014-9776-5. 13 john kunze, “towards electronic persistence using ark identifiers,” california digital library, 2003, https://escholarship.org/uc/item/3bg2w3vs. 14 keith e. shafer, stuart l. weibel, and erik jul, “the purl project,” journal of library administration 34, no. 1–2 (2001): 123, https://doi.org/10.1300/j111v34n01_19. 15 keith e. shafer, “arms, oclc internet services, and purls,” journal of library administration 34, no. 3–4 (2001): 385, https://doi.org/10.1300/j111v34n03_19. 16 cendi persistent identification task group, “persistent identification: a key component of an egovernment infrastructure,” new review of information networking 10, no. 1 (2004): 97–106, https://doi-org/10.1080/13614570412331312021. 17 erick ducut, fang liu, and paul fontelo, “an update on uniform resource locator (url) decay in medline abstracts and measures for its mitigation,” bmc medical informatics & decision making 8, no. 1 (2008): 1–8, https://doi.org/10.1186/1472-6947-8-23. 18 julie a. mcmurry et al., “identifiers for the 21st century: how to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data,” plos biology 15, no. 6 (2017): 1–18, https://doi.org/10.1371/journal.pbio.2001414. 19 john markwell and david brooks, “broken links: just how rapidly do science education hyperlinks go extinct?” (2003), cited by many and previously available from: http://wwwclass.unl.edu/biochem/url/broken_links.html [currently non-functional]. 20 “our member institutions,” digital library federation (2020), https://www.diglib.org/about/members/. 
https://journal.code4lib.org/articles/14978 https://doi.org/10.1007/s12109-999-0022-2 https://doi.org/10.3233/isu-2002-222-309 https://doi.org/10.3233/isu-2002-222-309 https://doi.org/10.1007/s12109-001-0019-y https://doi.org/10.1007/s12109-008-9093-8 https://doi.org/10.1371/journal.pbio.0000057 https://doi.org/10.1016/j.lookout.2008.06.012 https://doi.org/10.1007/s10822-014-9776-5 https://escholarship.org/uc/item/3bg2w3vs https://doi.org/10.1300/j111v34n01_19 https://doi.org/10.1300/j111v34n03_19 https://doi-org/10.1080/13614570412331312021 https://doi.org/10.1186/1472-6947-8-23 https://doi.org/10.1371/journal.pbio.2001414 http://www-class.unl.edu/biochem/url/broken_links.html http://www-class.unl.edu/biochem/url/broken_links.html https://www.diglib.org/about/members/ information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 12 21 twenty-four labels used only once: archival resource key; ark; bookmark this page at; citable link; citable link to this page; citable uri; copy; copy and paste this url; digital object url; doi; identifier (hdl); item; link; local identifier; permanent url; permanently link to this resource; persistent link to this item; persistent link to this record; please use this identifier to cite or link to this item; related resources; resource identifier; share; share link/location; to cite or link to this item, use this identifier. 22 one of the frequently asked questions (https://www.doi.org/faq.html) states that doi registration fees vary. 23 national center for education statistics, “table 317.10. degree-granting postsecondary institutions, by control and level of institution: selected years, 1949–50 through 2017–18,” in digest of education statistics, 2018, https://nces.ed.gov/programs/digest/d18/tables/dt18_317.10.asp. https://www.doi.org/faq.html https://nces.ed.gov/programs/digest/d18/tables/dt18_317.10.asp abstract introduction literature review methodology results discussion conclusion endnotes let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects tanya m. johnson information technology and libraries | june 2016 39 abstract three-dimensional objects are important sources of information that should not be ignored in the increasing trend towards digitization. previous research has not addressed the evaluation of digitized versions of three-dimensional objects. this paper first reviews research concerning such digitization, in both two and three dimensions, as well as public access in this context. next, evaluation criteria for websites incorporating digital versions of three-dimensional objects are extrapolated from previous research. finally, five websites are evaluated, and suggestions for best practices to provide public access to digital versions of three-dimensional objects are proposed. introduction much of the literature surrounding the increased efforts of libraries and museums to digitize content has focused on two-dimensional forms, such as books, photographs, or paintings. however, information does not only come in two dimensions; there are sculptures, artifacts, and other three-dimensional objects that have been unfortunately neglected by this digital revolution. as one author stated, “while researchers do not refer to three-dimensional objects as commonly as books, manuscripts, and journal articles, they are still important sources of information and should not be taken for granted” (jarrell 1998, 32). 
the importance of three-dimensional objects as information that can and should be shared is not a new phenomenon; indeed, as early as 1887, museologists and educators forwarded the view that “museums were in effect libraries of objects” that provided information not supplied by books alone (given and mctavish 2010, 11). however, it is only recently, with the advent of newer technological mechanisms, that such objects could be shared with the public on a larger scale. no longer do people need to physically visit museums to experience and learn from threedimensional objects. rather, various techniques have been utilized to place digital versions of such objects on the websites of museums and archives, and projects have been created by various universities in order to enhance that digital experience. nevertheless, as newell (2012) states: collections-holding institutions increasingly regard digital resources as additional objects of significance, not as complete replacements for the original. digital technologies work best when they enable people who feel connected to museum objects to have the freedom to deepen these tanya m. johnson (tmjohnso@gmail.com), a recent mlis degree graduate from the school of communication & information, rutgers, the state university of new jersey, is winner of the 2016 lita/ex libris student writing award. mailto:tmjohnso@gmail.com let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 40 relationships and, where appropriate, to extend outsiders’ understandings of the objects’ cultural contexts. the raison d’être of museums and other cultural institutions remains centred on the primacy of the object and in this sense continues to privilege material authenticity. (303) in this regard, three-dimensional visualization of physical objects can be seen as the next step for museums and cultural heritage institutions that seek to further patrons’ connection to such objects via the internet. indeed, in this digital age, the goals of museums and archives are changing, converging with those of libraries to focus more efforts on providing information to the public, and, along with the growing trend to digitize information contained within libraries, there has been a concomitant trend to digitize the contents of museums in order to provide greater public access to collections (given and mctavish 2010). in light of this progress, this paper will review various methods of presenting three-dimensional objects to the public on the internet and, based on an evaluation of five digital collections, attempt to provide some advice as to best practices for museums or institutions seeking to digitize such objects and present them to the public via a digital collection. literature review two-dimensional digitization there are many ways to present digital versions of three-dimensional objects on a webpage, ranging from simple two-dimensional photography to complicated three-dimensional scanning and rendering. beginning on the simpler end of the scale, bincsik, maezaki, and hattori (2012) describe the process of photographing japanese decorative art objects in order to create an image database of objects from multiple museums. specifically, the researchers explain that they need high quality photographs showing each object in all directions, as well as close-up images of fine details, in order to recreate the physical research experience as closely as possible. 
they also note that, for the same reason, the context of each object must be recorded, including photographs of any wrapping or storage materials and accompanying documentation. for this project, the researchers utilized nikon professional or semi-professional cameras, with zoom and macro lenses, and often used small apertures to increase depth-of-field. at times, they also took measurements of the objects in order to assist museums in maintaining accurate records. the raw image files were then processed with programs such as adobe photoshop, saved as original tif files, and converted into jpeg format for upload. despite the success of the project, the researchers also noted the limitations of digitizing three-dimensional objects: with decorative art objects some information is inevitably lost, such as the weight of the object, the feeling of its surface texture or the sense of its functionality in terms of proportions and balance. digital images clearly can fulfill many research objectives, but in some cases they can only be used as references. one objective of the decorative arts database is to advise the researcher in selecting which objects should be examined in person. (bincsik, maezaki, and hattori 2012, 46) one difficulty with photography, particularly when digitizing artwork, is that color is a function of light. thus, a single object will often appear to be different colors when photographed in different lighting conditions using conventional digital cameras, which process images using rgb filters. information technology and libraries | june 2016 41 more accurate representations of objects can be acquired using multispectral imaging, which uses a higher number of parameters (the international standard is 31, compared to rgb’s 3) in order to obtain more information about the reflectance of an object at any particular point in space (novati, pellegri, and schettini 2005). multispectral imaging, however, is very expensive and, despite some researchers’ attempts to create affordable systems (e.g., novati, pellegri, and schettini 2005), the acquisition of multispectral images is generally limited to large institutions with considerable funding (chane et al. 2013). the use of two-dimensional photography to digitize objects is not limited to the arts; in the natural sciences, different types of photographic equipment have been developed to document existing collections and enhance scientific observation. gigapixel imaging, for example, has been utilized to allow museum visitors to virtually explore large petroglyphs located in remote locations as well as for documentation and viewing of dinosaur bone specimens that are not on public display (louw and crowley 2013). this technology consists of taking many, very high resolution photographs that are then, via computer software, “aligned, blended, and stitched” together to create one extremely detailed composite image (louw and crowley 2013, 89–90). robotic systems, such as gigapan, have been developed to speed up the process and permit rapid recording and processing of the necessary area. once the gigapixel image is created, it can then be uploaded and displayed on the web in dynamic form, including spatial navigation of the image with embedded text, audio, or video at specific locations and zoom levels to provide further information (louw and crowley 2013). 
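gigapixel composites like these are normally delivered to the browser as pyramids of small tiles at several zoom levels, so a viewer only fetches the region and resolution currently on screen. the sketch below is a generic, minimal tiler written with the pillow imaging library; it is not the gigapan toolchain or any particular viewer's tile format, and the tile size, number of levels, file naming, and jpeg quality are arbitrary choices made for illustration.

import os
from PIL import Image  # pillow

Image.MAX_IMAGE_PIXELS = None  # lift pillow's size guard; assumes the composite is a trusted file

def build_tile_pyramid(src_path: str, out_dir: str, tile: int = 256, levels: int = 4) -> None:
    """cut a large composite into jpeg tiles at several zoom levels for web delivery."""
    full = Image.open(src_path).convert("RGB")
    for level in range(levels):
        scale = 2 ** (levels - 1 - level)  # level 0 is the most reduced overview
        img = full if scale == 1 else full.resize((full.width // scale, full.height // scale))
        for y in range(0, img.height, tile):
            for x in range(0, img.width, tile):
                patch = img.crop((x, y, min(x + tile, img.width), min(y + tile, img.height)))
                path = os.path.join(out_dir, str(level), f"{x // tile}_{y // tile}.jpg")
                os.makedirs(os.path.dirname(path), exist_ok=True)
                patch.save(path, "JPEG", quality=85)

# build_tile_pyramid("stitched_composite.tif", "tiles")  # file names are placeholders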
various types of gigapixel imaging, including the gigapan system, have also been used to digitize important collections of biological specimens, particularly insects, which are often stored in large drawers. one study examined the documentation of entomological specimens by “whole-drawer imaging” using various gigapixel imaging technologies (holovachov, zatushevsky, and shydlovsky 2014). the researchers explained that different gigapixel imaging systems (many of which are commercial and proprietary) utilize different types of cameras and lenses, as well as different types of software for processing. however, despite the expensive cost of some commercially available systems, it is possible for museums and other institutions to create their own, economically viable versions. the system created by holovachov, zatushevsky, and shydlovsky utilized a standard slr camera, fitted with a macro lens and attached to an immovable stand. the researchers manually set up lighting, focus, aperture, and other settings, and moved the insect drawer along a pre-determined grid pattern in order to obtain the multiple overlapping photographs necessary to create a large gigapixel image. they used a freely available stitching software program and manually corrected stitching artifacts and color balance issues that resulted from the use of a non-telecentric lens.1 despite the lower cost of their individualized system, however, the researchers noted that the process was much more time-consuming and necessitated more labor from workers digitizing the collection. moreover, technologically speaking, the researchers emphasized the limits of two-dimensional imaging, given that the 1the difference between telecentric and non-telecentric lenses is explained by the researchers: “contrary to ordinary photographic lenses, object-space telecentric lenses provide the same object magnification at all possible focusing distances. an object that is too close or too far from the focus plane and not in focus, will be the same size as if it were in focus. there is no perspective error and the image projection is parallel. therefore, when such a lens is used to take images of pinned insects in a box, all vertical pins will appear strictly vertical, independent of their position within the camera’s field of view” (holovachov, zatushevsky, and shydlovsky 2014, 7). let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 42 “diagnostic characteristics of three-dimensional insects,” as well as the accompanying labels, are often invisible when a drawer is only photographed from the top. thus, the researchers concluded that, ultimately, “the whole-drawer digitizing of insect collections needs to be transformed from two-dimensions to three-dimensions by employing complex imaging techniques (simultaneous use of multiple cameras positioned at different angles) and a digital workflow” (holovachov, zatushevsky, and shydlovsky 2014, 7). three-dimensional digitization given the goal of obtaining as accurate a representation as possible when digitizing objects, many researchers have turned to the use of various techniques in order to obtain three-dimensional data. acquiring a three-dimensional image of an object takes place in three steps: 1. 
preparation, during which certain preliminary activities take place that involve the decision about the technique and methodology to be adopted as well as the place of digitization, security planning issues, etc. 2. digital recording, which is the main digitization process according to the plan from phase 1. 3. data processing, which involves the modeling of the digitized object through the unification of partial scans, geometric data processing, texture data processing, texture mapping, etc. (pavlidis et al. 2007, 94) steps 2 and 3 have been more technically described as (2) obtaining data from an object to create point clouds (from thousands to billions of x,y,z coordinates representing loci on the object); and (3) processing point clouds into polygon models (creating a surface on top of the points), which can then be mapped with textures and colors (metallo and rossi 2011). there are several techniques that can be utilized to acquire three-dimensional data from a physical object. table 1 explains the four general methods most commonly used by museums. information technology and libraries | june 2016 43 type description positives negatives approx. price range laser scanning a laser source emits light onto the object’s surface, which is detected by a digital camera; geometry of the object is extracted by triangulation or time of flight calculations high accuracy in capturing geometry; can capture small objects and entire buildings (using different hardware) limited texture and color captured; shiny surfaces refract the laser $3,000– $200,000 white light (structured light) scanning a pattern of light is projected onto the object’s surface, and deformations in that pattern are detected by a digital camera; geometry is extracted by triangulation from deformations captures texture details, making it very accurate; can capture color dark, shiny, or translucent objects are problematic $15,000– $250,000 photogrammetry three-dimensional data is extracted from multiple twodimensional pictures can capture small objects and mountain ranges; good color information need either precise placement of cameras or more precise software to obtain accurate data cameras: $500– $50,000; software: free– $40,000 volumetric scanning magnetic resonance imaging (mri) uses a strong magnetic field and radio waves to detect geometric, density, volume and location information; computed tomography (ct) uses rotating x-rays to create twodimensional slices, which can then be reconstructed into three-dimensional images both types can view the interior and exterior of an object; ct can be used for reflective or translucent objects; mri can image soft tissues no color information; mri requires object to have high water content $200,000– $2,000,000 table 1. description of four general methods of acquiring three-dimensional data about physical objects (table information compiled by reference to pavlidis et al. 2007; metallo and rossi 2011; abel et al. 2011; and berquist et al. 2012). the type of three-dimensional digitization used can ultimately depend upon the types of objects to be imaged or the type of data needed. for example, in digitizing human skeletal collections, one study explained that three-dimensional laser scanning was an advantageous technique to create models of bones for preservation and analysis, but cautioned that ct scans would be needed to examine the internal structures of such specimens (kuzminsky and gardiner 2012). 
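as a concrete illustration of the geometry extraction named in table 1, the short sketch below shows how a depth measurement might be computed by triangulation or by a time-of-flight calculation. it uses a simplified, hypothetical setup (a single laser spot, a known laser-to-camera baseline, angles measured from that baseline, consistent units) and is not the internal algorithm of any particular scanner.

```python
# illustrative sketch of the two geometry calculations named in table 1.
# the flat 2d setup and all values are simplifying assumptions.
import math

def triangulate(baseline_m, laser_angle_deg, camera_angle_deg):
    """locate a laser spot by triangulation.

    the laser sits at the origin and the camera sits baseline_m to its right;
    each measures the angle (from the baseline) at which it sees the spot.
    returns the (x, z) position of the spot in meters.
    """
    a = math.radians(laser_angle_deg)
    b = math.radians(camera_angle_deg)
    # law of sines in the laser-camera-spot triangle:
    # distance from laser to spot = baseline * sin(camera angle) / sin(a + b)
    d = baseline_m * math.sin(b) / math.sin(a + b)
    return d * math.cos(a), d * math.sin(a)

def time_of_flight(round_trip_seconds, c=299_792_458.0):
    """distance to a surface from the round-trip travel time of a light pulse."""
    return c * round_trip_seconds / 2.0

if __name__ == "__main__":
    print(triangulate(0.5, 70.0, 80.0))   # spot roughly 0.9 m in front of the rig
    print(time_of_flight(6.67e-9))        # about 1 m for a ~6.7 ns round trip
```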
another study utilized several techniques in an attempt to decipher graffiti inscriptions on ancient roman pottery shards, ultimately concluding that high-resolution photography (similar to gigapixel imaging) and three-dimensional laser scanning both provided detailed and helpful data (montani et al. 2012). additionally, sometimes multiple types of digitization can be used for the same objects with similar results. one study, for example, obtained virtually equivalent three-dimensional models of the same object using laser scanning and two types of photogrammetry (lerma and muir 2014). most recently, researchers have been utilizing combinations of digitization techniques to obtain the most accurate representations possible. chane et al. (2013), for example, examined methods of combining three-dimensional digitization with multispectral photography in order to obtain enhanced information concerning the physical object in question. the researchers explained that combining the two processes is difficult because, in order to obtain multispectral textural data that is mapped to geometric positions, the object must be imaged from identical locations by multiple scanners/cameras or else the data processing that combines the two types of data becomes extremely complex. as a compromise, the researchers created a system of optical tracking based on photogrammetry techniques that permits the collection and integration of geometric positioning data and multispectral textures utilizing precise targeting procedures. however, the researchers noted that most systems integrating multispectral photography with three-dimensional digitization tended to be quite bulky, did not adapt easily to different types of objects, and needed better processing algorithms for more complex three-dimensional objects (chane et al. 2013). public access to three-dimensionally digitized objects despite museums' growing focus on increasing public access to collections via digitization (given and mctavish 2010), there is very little literature addressing public access to three-dimensionally digitized objects. indeed, studies in this realm tend to focus on the technological aspects of either the modeling of specific objects or collections or website viewing of three-dimensional models. for example, abate et al. (2011) described the three-dimensional digitization of a particular statue from the scanning process to its ultimate depiction on a website. the researchers explained in detail the particular software architecture utilized in order to permit the remote rendering of the three-dimensional model on users' computers via a java applet without compromising quality or necessitating download of potentially copyrighted works. by contrast, literature concerning the digital michelangelo project, during which researchers three-dimensionally digitized various michelangelo works, focused on the method used to create an accurate three-dimensional model, complete with color and texture mapping, and a visualization tool (dellepiane et al. 2008). one study did describe a project that was designed to place three-dimensional data about various cultural artifacts in an online repository for curators and other professionals (hess et al. 2011).
this repository was contained within database management software, a web-based interface was designed for searching, and user access to three-dimensional images and models was provided via an activex plugin. despite the potential of the prototype, however, it appears that the project has ceased (see http://www.ucl.ac.uk/museums/petrie/research/research-projects/3dpetrie/3d_projects/3d-projects-past/e-curator), and the institution's current three-dimensional imaging project is focused on the design of a traveling exhibition incorporating, among other things, three-dimensional models of artifacts and physical replicas created from such models (see http://www.3dencounters.com). studies that do address public access directly tend to focus on the improvement of museum websites generally. for example, in terms of user expectations of museum websites, one study found that approximately 63 percent of visitors to a museum's website did so in order to search the digital collection (kravchyna and hastings 2002). another study found four types of museum website users, who each had different needs and expectations of sites. relevantly, educators sought collections that were "the more realistic the better," including suggestions like incorporating three-dimensional simulations of physical objects so that students could "explore the form, construction, texture and use of objects" (cameron 2003, 335). further, non-specialist users "value free choice learning" and "access online collections to explore and discover new things and build on their knowledge base as a form of entertainment" (cameron 2003, 335). similarly, some studies have addressed the incorporation of web 2.0 technologies into museum websites. srinivasan et al. (2009), for example, argue that web 2.0 technologies must be integrated into museum catalogs rather than simply layered over existing records because users' interest in objects is increased by participation in the descriptive practice. an implementation of this concept is found in hunter and gerber's (2010) system of social tagging attached to three-dimensional models. this paper is an effort to address the gap between the technical process of digitizing and presenting three-dimensional objects on the web and the user experience of such. through the evaluation of five websites, this paper will provide some guidance for the digitization of three-dimensional objects and their presentation in digital collections for public access. methodology and evaluative criteria evaluations of digital museums are not as prevalent as evaluations of digital libraries. however, given the similar purposes of digital museums and digital libraries, it is appropriate to utilize similar criteria.
for digital libraries, saracevic (2000) synthesized evaluation criteria into performance questions in two broad areas: (a) user-centered questions, including how well the digital library supports the society or community served, how well it supports institutional or organizational goals, how well it supports individual users' information needs, and how well the digital library's interface provides access and interaction; and (b) system-centered questions, including hardware and network performance, processing and algorithm performance, and how well the content of the collection is selected, represented, organized, and managed. xie (2008) focused on user-centered evaluation and found five general criteria that exemplified users' own evaluations of digital libraries: interface usability, collection quality, service quality, system performance, and user satisfaction. parandjuk (2010) used information architecture to construct criteria for the evaluation of a digital library, including the following:
• uniformity of standards, including consistency among webpages and individual records;
• findability, including ease of use and multiple ways to access the same information;
• sub-navigation, including indexes, sitemaps, and guides;
• contextual navigation, including simplified searching and co-location of different types of resources;
• language, including consistency in labeling across pages and records and appropriateness for the audience; and
• integration of searching and browsing.
this system is particularly appropriate in the context of digital museums, as it emphasizes the curatorial or organizational aspect of the collection in order to support learning objectives. in one comprehensive evaluation of the websites of art museums, pallas and economides (2008) created a framework for such evaluation, incorporating six dimensions: content, presentation, usability, interactivity and feedback, e-services, and technical. each dimension then contained several specific criteria. many of the criteria overlapped, however, and three-dimensional imaging, for example, was placed within the e-services dimension, under virtual tours, although it could have been placed within presentation, with other multimedia criteria, or even within interactivity, with interactive multimedia applications. the problem in trying to evaluate a particular part of a museum's website, namely, the way it presents three-dimensional objects in digital form, is that the level of specificity almost renders many of the evaluation criteria from previous studies irrelevant. as hariri and norouzi (2011) suggest, evaluation criteria should be based on the objective of the evaluation. hence, based on portions of the above-referenced studies, this author has created a more focused evaluation framework, concentrating on criteria that are particularly relevant to museums' digital presentations of three-dimensional objects. this framework is detailed in table 2, below.
functionality: what technology is used to display the object? how well does it work? must programs or files be downloaded? are the loading times of displays acceptable?
usability: how easy is the site to use? what is the navigation system? are there searching and browsing functions, and how well does each work? how findable are individual objects?
presentation: how does the display of the object look? what is the context in which the object is presented? are there multiple viewing options? is there any interactivity permitted?
content: does the site provide an adequate collection of objects? for individual objects, is there sufficient information provided? is there additional educational content?
table 2. summary of evaluative criteria
five digital collections, specified below, will be evaluated based on these criteria. this will be done in a case study manner, describing each website based on the above criteria and then using those evaluations to make suggestions for best practices. results it is difficult to compare different types of digital collections, particularly when the focus is on different types of technology utilized to display similar objects. however, because the goal here is to determine the best practices for the digital presentation of three-dimensional objects, it is important to evaluate a variety of techniques in a variety of fields. thus, the following digital collections have been chosen to illustrate different ways in which such objects can be displayed on a website. museum of fine arts, boston (mfa) (http://www.mfa.org/collections) the mfa, both in person and online, boasts a comprehensive and extensive collection of art and historical artifacts of varying forms. the website is very easy to navigate, with well-defined browsing options and easy search capabilities, allowing for refinement of results by collection or type of item. there are many collections, which are well organized and curated into separate exhibits and galleries. in addition, when viewing each gallery, suggestions are linked for related online exhibitions as well as tours and exhibits at the physical museum. each item record contains a detailed description of the item as well as its provenance. thus, the mfa website attains a very high rating for usability and content. however, individual items are represented by only single pictures of varying quality. some pictures are color, some are black and white, and no two pictures appear to have the same lighting. additionally, despite being slow to load, even the pictures that appear to be of the best quality cannot be of high resolution, as zooming in makes them slightly blurry. accordingly, the mfa website receives a medium rating for functionality and a low rating for presentation. digital fish library (dfl) (http://www.digitalfishlibrary.org/index.php) the dfl project is a comprehensive program that utilizes mri scanning to digitize preserved biological fish samples from a particular collection housed at the scripps institution of oceanography. after mri scans of a specimen are taken, the data is processed and translated into various views that are placed on the website, accompanied by information about each species (berquist et al. 2012). navigating the dfl website is very intuitive, as the individual specimen records are organized by taxonomy. it is easy to search for particular species or browse through the clickable, pictorial interface. records for each species include detailed information about the individual specimen, the specifics of the scans used to image each, and broader information about the species. individual records also provide links to other species within the taxonomic family. thus, the dfl website attains high ratings in both usability and content.
for functionality and presentation, however, the ratings are medium. although for each item there are videos and still images obtained from threedimensional volume renderings and mri scans, they are small in size and have low resolution. there is no interactive component, with the possible exception of the “digital fish viewer” that supposedly requires java, but this author could not get it to work despite best efforts. one nice feature, shown in figure 1 below, is that some of the specimen records have three-dimensional renderings showing and explaining the internal structures of the species. http://www.mfa.org/collections http://www.digitalfishlibrary.org/index.php let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 48 figure 1. annotated three-dimensional rendering of internal structures of hammerhead shark, from the digital fish library (http://www.digitalfishlibrary.org/library/viewimage.php?id=2851) the eton myers collection (http://etonmyers.bham.ac.uk/3d-models.html) the eton myers collection of ancient egyptian art is housed at eton college, and a project to threedimensionally digitize the items for public access was undertaken via collaboration between that institution and the university of birmingham. digitization was accomplished with threedimensional laser scanners, data was then processed with geomagic software to produce point cloud and mesh forms, and individual datasets were reduced in size and converted into an appropriate file type to allow for public access (chapman, gaffney, and moulden 2010). usability of the eton myers collection website is extremely low. the initial interface is simply a list of three-dimensional models by item number with a description of how to download the appropriate program and files. another website from the university of birmingham (http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=the+eton+myers+col lection) contains a more museum-like interface, but contains many more records for objects than are contained on the initial list of three-dimensional models. moreover, most of the records do not even include pictures of the items, let alone links to the three-dimensional models, and the records that do include pictures do not necessarily include such links. even when a record has a link to the three-dimensional model, it actually redirects to the full list of models rather than to the individual item. there is no search functionality from the initial list of three-dimensional models, and no way to browse other than to, colloquially speaking, poke and hope. individual items are only identified by item number, and, aside from the few records that have accompanying pictures on the university of birmingham site, there is no way to know to what item any given number refers. the http://www.digitalfishlibrary.org/library/viewimage.php?id=2851 http://etonmyers.bham.ac.uk/3d-models.html http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=the+eton+myers+collection http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=the+eton+myers+collection information technology and libraries | june 2016 49 website attains only a low rating for content; although it seems that there may be a decent number of items in the collection, it is impossible to know for certain given the problems with the interface and the fact that individual items are virtually unidentified. 
the eton myers collection website also receives a low rating for functionality. in order to access three-dimensional models of items, users must download and install a program called meshlab, then download individual folders of compressed files, then unzip those files, and finally open the appropriate file in meshlab. despite compression, some of the file folders are still quite large and take some time to download. presentation of the items is also rated low. even for the high resolution versions of the three-dimensional renderings, viewed in meshlab, the geometry of the objects seems underdeveloped (e.g., hieroglyphics are illegible) and surface textures are not well mapped (e.g., colors are completely off). this is evident from a comparison of the three-dimensional rendering with a two-dimensional photograph of the same item, as in figure 2, below. figure 2. comparison of original photograph (left) and three-dimensional rendering (right) of item number ecm 361, from the eton myers collection (http://mimsy.bham.ac.uk/detail.php?t=objects&type=ext&f=&s=&record=0&id_number=ecm+361&op-earliest_year=%3d&op-latest_year=%3d). notably, chapman, gaffney, and moulden (2010) indicate that the detailed three-dimensional imaging enabled them to identify tooling marks and read previously unclear hieroglyphics on certain items. thus, it is possible that the problems with the renderings may be a result of a loss in quality between the original models and the downloaded versions, particularly given that the files were reduced in size and converted prior to being made available for download. epigraphia 3d project (http://www.epigraphia3d.es) the epigraphia 3d project was created to present an online collection of various historical roman epigraphs (also known as inscriptions) that were discovered and excavated in spain and italy; the physical collection is housed at the museo arqueológico nacional (madrid). digital imaging was accomplished using photogrammetry, free software was utilized to create three-dimensional object models and renderings, and photoshop was used to obtain appropriate textures. finally, the three-dimensional model was published on the web using sketchfab, a web service similar to flickr that allows in-browser viewing of three-dimensional renderings in many different formats (ramírez-sánchez et al. 2014). the epigraphia 3d website is intuitive and informative. browsing is simple because there are not many records, but, although it is possible to search the website, there is no search function specifically directed to the collection. thus, usability is rated as medium. despite the fact that the website provides descriptions of the project and the collection, as well as information about epigraphs generally, the website attains a medium rating for content in light of the small size of the collection and the limited information given for each individual item. however, the epigraphia 3d website receives very high ratings for functionality and presentation. the individual three-dimensional models are detailed, legible, and interactive.
individual inscriptions are transcribed for each item. the use of sketchfab to display the models is effective; no downloading is necessary, and it takes an acceptable amount of time to load. when viewing the item, users can rotate the object in either "orbit" or "first person" mode, as well as view it full-screen or within the browser window. users can also display the wireframe model and the textured or surfaced rendering, as shown in figure 3 below. figure 3. three-dimensional textured (left) and wireframe (middle) renderings from the epigraphia 3d project (http://www.epigraphia3d.es/3d-01.html), as compared to an original two-dimensional photograph of the same object (right) (http://eda-bea.es/pub/record_card_1.php?refpage=%2fpub%2fsearch_select.php&quicksearch=dapynus&rec=19984). smithsonian x 3d (http://3d.si.edu) the smithsonian x 3d project, although affiliated with all of the smithsonian's varying divisions, was created to test the application of three-dimensional digitization techniques to "iconic collection objects" (http://3d.si.edu/about). the website provides significant detail concerning the project itself, mostly in the form of videos, and individual items, many of which are linked to "tours" that incorporate a story about the object. content is rated as medium because, despite the depth of information provided about individual items, there are still very few items within the collection. the website also receives a medium rating for usability, given the simple browsing structure, easy navigation, and lack of a search feature (all likely due at least in part to the limited content). functionality and presentation, however, are rated high. the x3d explorer in-browser software (powered by autodesk) does more than simply display a three-dimensional rendering of an object; it also permits users to edit the model by changing color, lighting, texture, and other variables as well as incorporates detailed information about each item, both as an overall description and as a slide show, where snippets of information are connected to specific views of the item. the individual three-dimensional models are high resolution, detailed, and well-rendered, with very good surface texture mapping. however, it must be noted that the x3d explorer tool is in beta and, as such, still has some bugs; for example, this author has observed a model disappear while zooming in on the rendering. table 3, below, summarizes the results of the evaluation.
mfa: functionality medium; usability very high; presentation low; content very high
dfl: functionality medium; usability high; presentation medium; content high
eton myers: functionality low; usability low; presentation low; content low
epigraphia 3d: functionality very high; usability medium; presentation very high; content medium
smithsonian x 3d: functionality high; usability medium; presentation high; content medium
table 3. summary of evaluation results for each website by individual criteria
discussion based on the evaluation of the five websites described above, some suggested best practices for the digitization and presentation of three-dimensional objects become apparent. when digitizing, the museum should utilize the method that best suits the object or collection.
for example, while mri scanning is likely the best method for three-dimensionally digitizing biological fish specimens, it is not going to be effective or feasible for digitizing artwork or artifacts (abel et al. 2011; berquist et al. 2012). regardless of the method of digitization used, however, the people conducting the imaging and processing should fully comprehend the hardware and software necessary to complete the task. additionally, although financial restraints must be considered, museums should note that some three-dimensional scanning equipment is just as economically feasible as standard digital cameras (metallo and rossi 2011). however, if a museum chooses to utilize only two-dimensional imaging, http://3d.si.edu/ http://3d.si.edu/about let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 52 each item should be photographed from multiple angles in high resolution, to avoid creating a website, like the mfa’s, on which everything other than the object itself is presented outstandingly. further, museums deciding on two-dimensional imaging should explore the possibility of utilizing photogrammetry to create three-dimensional models from their twodimensional photographs, like the epigraphia 3d project. there is free or inexpensive software that functions to permit the creation of three-dimensional object maps from very few photographs (ramírez-sánchez et al. 2014). finally, compatibility is a key issue when conducting threedimensional scans; the museum should ensure that the software used for rendering models is compatible with the way in which users will be viewing the models. in the context of public access to the museum’s digital collections, the website should be easy and intuitive to navigate. the mfa website is an excellent example; browsing and search functions should both be present, and reorganization of large numbers of objects into separate collections may be necessary. where searching is going to be the primary point of entry into the collection, it is important to have sufficient metadata and functional search algorithms to ensure that item records are findable. furthermore, remember that the website is simply a way to access the museum itself. hence, the collections on the website, like the collections in the physical museum, should be curated; there should be a logical flow to accessing object records. the museum may also want to have sections that are similar to virtual exhibitions, like the “tours” provided by the smithsonian x 3d project. finally, museums should ensure that no additional technological know-how (beyond being able to access the internet) is required to access the three-dimensional content in object records. users should not be required to download software or files to view records; epigraphia 3d’s use of sketchfab and the smithsonian’s x 3d explorer tool are both excellent examples of ways in which three-dimensional content can be viewed on the web without the need for extraneous software. museums and cultural heritage institutions are increasing the focus on providing public access to collections via digitization and display on websites (given and mctavish 2010). in order to do this effectively, this paper has attempted to provide some guidance as to best practices of presenting digital versions of three-dimensional objects. in closing, however, it must be noted that this author is not a technician. 
although this paper has tried to contend with the issues from the perspective of a librarian, there are complicated technical concerns behind any digitization project that have not been adequately addressed. in addition, this paper has not examined the role of budgetary constraints on digitization or the concomitant issues of creating and maintaining websites. moreover, because this paper has been treated as a broad overview of the digitization and presentation for public access of three-dimensional objects, the five websites evaluated were from varying fields of study. museums should look to more specific comparisons in order to appropriately digitize and present their collections on the web. conclusion there may not be a direct substitute for encountering an object in person, but for people who cannot obtain physical access to three-dimensional objects, the digital realm can serve as an adequate proxy. this paper has demonstrated, through an evaluation of five distinct digital collections, that utilizing three-dimensional imaging and presenting three-dimensional models of physical objects on the web can serve the important purpose of increasing public access to otherwise unavailable collections. information technology and libraries | june 2016 53 references abate, d., r. ciavarella, g. furini, g. guarnieri, s. migliori, and s. pierattini. “3d modeling and remote rendering technique of a high definition cultural heritage artefact.” procedia computer science 3 (2011): 848–52. http://dx.doi.org/10.1016/j.procs.2010.12.139. abel, r. l., s. parfitt, n. ashton, simon g. lewis, beccy scott, and c. stringer. “digital preservation and dissemination of ancient lithic technology with modern micro-ct.” computers and graphics 35, no. 4 (august 2011): 878–84. http://dx.doi.org/10.1016/j.cag.2011.03.001. berquist, rachel m., kristen m. gledhill, matthew w. peterson, allyson h. doan, gregory t. baxter, kara e. yopak, ning kang, h.j. walker, philip a. hastings, and lawrence r. frank. “the digital fish library: using mri to digitize, database, and document the morphological diversity of fish.” plos one 7, no. 4: (april 2012). http://dx.doi.org/10.1371/journal.pone.0034499. bincsik, monika, shinya maezaki, and kenji hattori. “digital archive project to catalogue exported japanese decorative arts.” international journal of humanities and arts computing 6, no. 1– 2 (march 2012): 42–56. http://dx.doi.org/10.3366/ijhac.2012.0037. cameron, fiona. “digital futures i: museum collections, digital technologies, and the cultural construction of knowledge.” curator: the museum journal 46, no. 3 (july 2003): 325–40. http://dx.doi.org/10.1111/j.2151-6952.2003.tb00098.x. chane, camille simon, alamin mansouri, franck s. marzani, and frank boochs. “integration of 3d and multispectral data for cultural heritage applications: survey and perspectives.” image and vision computing 31, no. 1 (january 2013): 91–102. http://dx.doi.org/10.1016/j.imavis.2012.10.006. chapman, henry p., vincent l. gaffney, and helen l. moulden. “the eton myers collection virtual museum.” international journal of humanities and arts computing 4, no. 1–2 (october 2010): 81–93. http://dx.doi.org/10.3366/ijhac.2011.0009. dellepiane, m., m. callieri, f. ponchio, and r. scopigno. “mapping highly detailed colour information on extremely dense 3d models: the case of david's restoration.” computer graphics forum 27, no. 8 (december 2008): 2178–87. http://dx.doi.org/10.1111/j.14678659.2008.01194.x. given, lisa m., and lianne mctavish. 
“what’s old is new again: the reconvergence of libraries, archives, and museums in the digital age.” library quarterly 80, no. 1 (january 2010): 7– 32. http://dx.doi.org/10.1086/648461. hariri, nadjla, and yaghoub norouzi. “determining evaluation criteria for digital libraries’ user interface: a review.” the electronic library 29, no. 5 (2011): 698–722. http://dx.doi.org/10.1108/02640471111177116. hess, mona, francesca simon millar, stuart robson, sally macdonald, graeme were, and ian brown. “well connected to your digital object? e-curator: a web-based e-science platform for museum artefacts.” literary and linguistic computing 26, no. 2 (2011): 193– 215. http://dx.doi.org/10.1093/llc/fqr006. http://dx.doi.org/10.1016/j.cag.2011.03.001 http://dx.doi.org/10.1371/journal.pone.0034499 http://dx.doi.org/10.3366/ijhac.2012.0037 http://dx.doi.org/10.1111/j.2151-6952.2003.tb00098.x http://dx.doi.org/10.1016/j.imavis.2012.10.006 http://dx.doi.org/10.3366/ijhac.2011.0009 http://dx.doi.org/10.1111/j.1467-8659.2008.01194.x http://dx.doi.org/10.1111/j.1467-8659.2008.01194.x http://dx.doi.org/10.1086/648461 http://dx.doi.org/10.1108/02640471111177116 http://dx.doi.org/10.1093/llc/fqr006 let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 54 holovachov, oleksandr, andriy zatushevsky, and ihor shydlovsky. “whole-drawer imaging of entomological collections: benefits, limitations and alternative applications.” journal of conservation and museum studies 12, no. 1 (2014): 1–13. http://dx.doi.org/10.5334/jcms.1021218. hunter, jane, and anna gerber. 2010. “harvesting community annotations on 3d models of museum artefacts to enhance knowledge, discovery and re-use.” journal of cultural heritage 11, no. 1 (2010): 81–90. http://dx.doi.org/10.1016/j.culher.2009.04.004. jarrell, michael c. “providing access to three-dimensional collections.” reference & user services quarterly 38, no. 1 (1998): 29–32. kravchyna, victoria, and sam k. hastings. “informational value of museum web sites.” first monday 7, no. 4 (february 2002). http://dx.doi.org/10.5210/fm.v7i2.929. kuzminsky, susan c. and megan s. gardiner. “three-dimensional laser scanning: potential uses for museum conservation and scientific research.” journal of archaeological science 39, no. 8 (august 2012): 2744–51. http://dx.doi.org/10.1016/j.jas.2012.04.020. lerma, josé luis, and colin muir. “evaluating the 3d documentation of an early christian upright stone with carvings from scotland with multiples images.” journal of archaeological science 46 (june 2014): 311–18. http://dx.doi.org/10.1016/j.jas.2014.02.026. louw, marti, and kevin crowley. “new ways of looking and learning in natural history museums: the use of gigapixel imaging to bring science and publics together.” curator: the museum journal 56, no. 1 (january 2013): 87–104. http://dx.doi.org/10.1111/cura.12009. metallo, adam, and vince rossi. “the future of three-dimensional imaging and museum applications.” curator: the museum journal 54, no. 1 (january 2011): 63–69. http://dx.doi.org/10.1111/j.2151-6952.2010.00067.x. montani, isabelle, eric sapin, richard sylvestre, and raymond marquis . “analysis of roman pottery graffiti by high resolution capture and 3d laser profilometry.” journal of archaeological science 39, no. 11 (2012): 3349–53. http://dx.doi.org/10.1016/j.jas.2012.06.011. newell, jenny. 
“old objects, new media: historical collections, digitization and affect.” journal of material culture 17, no. 3 (september 2012): 287–306. http://dx.doi.org/10.1177/1359183512453534. novati, gianluca, paolo pellegri, and raimondo schettini. “an affordable multispectral imaging system for the digital museum.” international journal on digital libraries 5, no. 3 (may 2005): 167–78. http://dx.doi.org/10.1007/s00799-004-0103-y. pallas, john, and anastasios a. economides. “evaluation of art museums' web sites worldwide.” information services and use 28, no. 1 (2008): 45–57. http://dx.doi.org/10.3233/isu2008-0554. parandjuk, joanne c. “using information architecture to evaluate digital libraries.” the reference librarian 51, no. 2 (2010): 124–34. http://dx.doi.org/10.1080/02763870903579737. http://dx.doi.org/10.5334/jcms.1021218 http://dx.doi.org/10.1016/j.culher.2009.04.004 http://dx.doi.org/10.5210/fm.v7i2.929 http://dx.doi.org/10.1016/j.jas.2012.04.020 http://dx.doi.org/10.1016/j.jas.2014.02.026 http://dx.doi.org/10.1111/cura.12009 http://dx.doi.org/10.1111/j.2151-6952.2010.00067.x http://dx.doi.org/10.1016/j.jas.2012.06.011 http://dx.doi.org/10.1177/1359183512453534 http://dx.doi.org/10.1007/s00799-004-0103-y http://dx.doi.org/10.3233/isu-2008-0554 http://dx.doi.org/10.3233/isu-2008-0554 http://dx.doi.org/10.1080/02763870903579737 information technology and libraries | june 2016 55 pavlidis, george, anestis koutsoudis, fotis arnaoutoglou, vassilios tsioukas, and christodoulos chamzas. “methods for 3d digitization of cultural heritage.” journal of cultural heritage 8, no. 1 (2007): 93–98, http://dx.doi.org/10.1016/j.culher.2006.10.007. ramírez-sánchez, manuel, josé-pablo suárez-rivero, and maría-ángeles castellano-hernández. “epigrafía digital: tecnología 3d de bajo coste para la digitalización de inscripciones y su acceso desde ordenadores y dispositivos móviles.” el profesional de la información 23, no. 5 (2014): 467–74. http://dx.doi.org/10.3145/epi.2014.sep.03. saracevic, tefko. “digital library evaluation: toward an evolution of concepts.” library trends 49, no. 3 (2000): 350–69. srinivasan, ramesh, robin boast, jonathan furner, and katherine m. becvar. “digital museums and diverse cultural knowledges: moving past the traditional catalog.” the information society 25, no. 4 (2009): 265–78, http://dx.doi.org/10.1080/01972240903028714. xie, hong iris. “users’ evaluation of digital libraries (dls): their uses, their criteria, and their assessment.” information processing and management 44, no. 3 (may 2008): 1346–73, http://dx.doi.org/10.1016/j.ipm.2007.10.003. http://dx.doi.org/10.1016/j.culher.2006.10.007 http://dx.doi.org/10.3145/epi.2014.sep.03 http://dx.doi.org/10.1080/01972240903028714 http://dx.doi.org/10.1016/j.ipm.2007.10.003 introduction current trends and goals in the development of makerspaces at new england college and research libraries ann marie l. davis information technology and libraries | june 2018 94 ann marie l. davis (davis.5257@osu.edu) is faculty librarian of japanese studies at the ohio state university. abstract this study investigates why and which types of college and research libraries (crls) are currently developing makerspaces (or an equivalent space) for their communities. based on an online survey and phone interviews with a sample population of crls in new england, i found that 26 crls had or were in the process of developing a makerspace in this region. in addition, several other crls were actively promoting and diffusing the maker ethos. 
of these libraries, most were motivated to promote open access to new technologies, literacies, and stem-related knowledge. introduction and overview makerspaces, alternatively known as hackerspaces, tech shops, and fab labs, are trendy new sites where people of all ages and backgrounds gather to experiment and learn. born of a global community movement, makerspaces bring the do-it-yourself (diy) approach to communities of tinkerers using technologies including 3d printers, robotics, metaland woodworking, and arts and crafts.1 building on this philosophy of shared discovery, public libraries have been creating free programs and open makerspaces since 2011.2 given their potential for community engagement, college and research libraries (crls) have also been joining the movement in growing numbers.3 in recent years, makerspaces in crls have generated positive press in popular and academic journals. despite the optimism, scholarly research that measures their impact is sparse. for example, current library and information science literature overlooks why and how various crls choose to create and maintain their respective makerspace. likewise, there is scant data on the institutional objectives, frameworks, and experiences that characterize current crl makerspace initiatives.4 this study begins to fill this gap by investigating why and which types of crls are creating makerspaces (or an equivalent room or space) for their library communities. specifically, it focuses on libraries at four-year colleges and research universities in new england. throughout this study, makerspace is used interchangeably with other terms, including maker labs and innovation spaces, to reflect the variation in names and objectives that underlie the current trends. in exploring their motives and experiences, this article provides a snapshot of the current makerspace movement in crls. mailto:davis.5257@osu.edu current trends and goals in the development of makerspaces | davis 95 https://doi.org/10.6017/ital.v37i2.9825 the study finds that the number of crls actively involved in the makerspace movement is growing. in addition to more than two dozen that have or are in the process of developing a makerspace, another dozen crls have staff who support the diffusion of maker technologies, such as 3d printing and crafting tools that support active learning and discovery, in the campus library and beyond.5 comprising research and liberal arts schools, public and private, and small and large, the crls involved with makerspaces are strikingly diverse. despite these differences, this population is united by common objectives to promote new literacies, provide open access to new technologies, and foster a cooperative ethos of making. literature review the body of literature on library makerspaces is brief, descriptive, and often didactic. given the newness of the maker movement in public and academic libraries, many articles focus on early success stories and defining the movement vis-à-vis the mission of the library. for instance, laura britton, known for having created the first makerspace in a public library (the fayetteville free library’s fabulous laboratory), defines a makerspace as “a place where people come together to create and collaborate, to share resources, knowledge, and stuff.”6 this definition, she determines, is strikingly similar to that of the library. most literature on makerspaces appears in academic blogs, professional websites, and popular magazines. 
among the most frequently cited is tj mccue’s article, which celebrates britton’s (née smedley) fablab while distilling the intellectual underpinnings of the makerspace ethos.7 phillip torrone, editor of make: magazine, supports smedley’s project as an example of “rebuilding” or “retooling” our public spaces.8 within this camp, david lankes, professor of information studies at syracuse university, applauds such work as activist and community-oriented librarianship.9 many authors emphasize the philosophical “fit,” or intersection, of public makerspaces with the principles of librarianship. building on torrone’s work, j. l. balas claims that creating access to resources for learning and making is in keeping with the “library’s historical role of providing access to the ‘tools of knowledge.’”10 others emphasize the hands-on, participatory, and intergenerational features of the maker movement, which has the potential to bridge the digital divide.11 still others identify areas of literacy, innovation, and ste(a)m skills where library makerspaces can have a broad impact. while public libraries often focus on early childhood or adult education, crls adopt separate frameworks for information literacy. like public libraries, they aim to build (meta)literacies and ste(a)m skills. nevertheless, their programs often tailor to curricular goals in the arts and sciences or specialized degrees in engineering, education, and business. this is especially true of crls situated within large, research-intensive universities. considering their specific missions and aims, this study seeks to identify the goals and challenges that reinforce the development of makerspaces in undergraduate and research environments. research design and method data presented in this study was gathered from library directors (or their designees) through an online survey and oral telephone interviews. after choosing a sampling frame of crls in new england, i developed a three-path survey, sent invitations, and collected and analyzed data using the online platform surveymonkey. the survey was distributed following review by the information technology and libraries | june 2018 96 institutional review board (irb) at southern connecticut state university, where i completed a master of library science (mls) degree. survey population to assess generalized findings for the larger population in north america, i chose a clustersampling approach that limited the survey population to the crls in new england. in generating the sampling frame, i included four-year and advanced-degree institutions based on the assumption that libraries at these schools supported specialized, research, or field-specific degrees. i omitted for-profit and two-year institutions, based on the assumption that they are driven by separate business models. this process generated a contact list of 182 library directors at the designated crls in connecticut, maine, massachusetts, new hampshire, rhode island, and vermont. survey design the purpose of the survey was to gather basic data about the size and structure of the respondents’ institutions and to gain insights on their views and practices regarding makerspaces (the survey is reproduced in the appendix). the first page of the survey contained a statement of consent, including my contact information and that of my irb. after a short set of preliminary questions, the survey branched into one of three paths based on respondents’ answers about makerspaces. 
the respondents were thus categorized into one of three groups: path one (p1) for those with no makerspace and no plans to create one, path two (p2) for those with plans to develop a makerspace in the near future, and path three (p3) for those already running a makerspace in their libraries. p3 was the longest section of the survey, containing several questions about p3 experiences with makerspaces such as staffing, programing, and objectives. data collection in summer 2015, brief email invitations and two reminders were sent to the targeted population.12 to increase the participation rate, i sometimes wrote personal emails and made direct phone calls to crls known to have makerspace. for cold-call interviews, i developed a script explaining the nature of the online survey. after obtaining informed consent, i proceeded to ask the questions in the online survey and manually enter the participants’ responses at the time of the interview. on a few occasions, online respondents followed up with personal emails volunteering to discuss their library’s experiences in more detail. i took advantage of these invitations, which often provided unique and welcome insights. in analyzing the responses, i used tabulated frequencies for quantitative results and sorted qualitative data into two different categories. the first category was identified as “short and objective” and coded and analyzed numerically. the longer, more “subjective and value-driven” data was analyzed for common trends, relationships, and patterns. within this second category, i also identified outlier responses that suggested possible exceptions to common experiences. results the survey closed after one month of data collection. at this time, 55 of 182 potential respondents had participated, yielding a response rate of 30.2%. among these participants, the survey achieved a 100.0% response rate (9 completed surveys of 9 targeted crls) among libraries that were current trends and goals in the development of makerspaces | davis 97 https://doi.org/10.6017/ital.v37i2.9825 currently operating makerspaces. i created a list of all known crl makerspaces in new england based on an exhaustive website search of all crls in this region. subsequent interviews with the managers of the makerspaces on this list revealed no other hidden or unknown makerspaces in this region. of the 55 respondents, 29 (52.7%) were in p1, 17 (30.9%) were in p2, and 9 (16.4%) were in p3. (see figure 1.) figure 1. survey participants’ (n = 55) current crl efforts and plans to develop and operate a makerspace. among respondents in p2 and p3, the majority (13 of 23) indicated that they were from libraries that served a student population of 4,999 people or fewer, while only one library served a population of 30,000 or more (see figure 2). in terms of sheer numbers, makerspaces might seem to be gaining traction at smaller crls, but proportionally, one cannot say that smaller crls are adopting makerspaces at a higher rate because the majority of survey participants have student populations of 19,999 or less (51, or 91.1%). the number of institutions with populations over 20,000 were in a clear minority (5, or 8.9%). (see figure 3.) information technology and libraries | june 2018 98 figure 2. p2 and p3 crls with makerspaces or concrete plans to develop a makerspace. figure 3. the majority of crls (67.2%) that participated in the survey had a population of 4,999 students or less. only 1.8% of schools that participated had a population of 30,000 students or more. 
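the path percentages reported in this section can be reproduced with a few lines of code; the sketch below is only an illustration of the tabulation described under data collection, and the list literal stands in for the actual surveymonkey export, which is not reproduced here.

```python
# minimal sketch of tabulating survey paths; the list literal stands in for
# the real exported responses.
from collections import Counter

responses = ["p1"] * 29 + ["p2"] * 17 + ["p3"] * 9   # 55 completed surveys
invited = 182

counts = Counter(responses)
print(f"response rate: {len(responses) / invited:.1%}")   # 30.2%
for path in ("p1", "p2", "p3"):
    share = counts[path] / len(responses)
    print(f"{path}: {counts[path]} ({share:.1%})")        # 52.7%, 30.9%, 16.4%
```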
current trends and goals in the development of makerspaces | davis 99 https://doi.org/10.6017/ital.v37i2.9825 crls with no makerspace (p1 = 29) in the first part of the survey, the majority of p1 respondents demonstrated positive views toward makerspaces despite having no plans to create one in the near future. budgetary and space limitations aside, many were relatively open to the possibility of developing a makerspace in a more distant future. in the words of one respondent, “we have several areas within the library that present a heavy demand on our budget. in [the] future, we would love to consider a makerspace, and whether it would be a sensible and appropriate investment that would benefit our students.” when asked what their reasons were for not having a makerspace, some respondents (8, or 27.6%) said they had not given it much thought, but most (21, or 72.4%) offered specific answers. among these, the most frequently cited reason (11, or 37.8%) was that a library makerspace would be redundant: such spaces and labs were already offered in other departments within the institution or in the broader community. at one crl, for example, the respondent said the library did not want to compete with faculty initiatives elsewhere on campus. other reasons included that makerspaces were expensive and not a priority. some (5, or 17.2%) libraries preferred to allocate their funds to different types of spaces such as “a very good book arts studio/workshop” or “simulation labs.” some (6, or 20.6%) shared concerns about a lack of space, staff, or simply “a good culture of collaboration [on campus].” merging these sentiments, one respondent concluded, “people still need the library to be fairly quiet. . . . having makerspace equipment in our library would be too distracting.” while some were skeptical (sharing concerns about potential hazards or that makerspaces were simply “the flavor of the month”), the majority (roughly 60%) were open and enthusiastic. one respondent, in fact, held a leadership position in a community makerspace beyond campus. according to this librarian, 3d printers, scanners, and laser cutters were sure to become more common, and crls would no doubt eventually develop “a formal space for making stuff.” crls with plans for a makerspace in the near future (p2 = 17) the second section of the survey (p2) focused primarily on the motivations and means by which this cohort planned to develop a makerspace. when asked why they were creating a makerspace, the most common response was to promote learning and literacy (15 respondents, or 88.2%). in addition, a large majority (12 respondents, or 70.6%) felt that makerspaces helped to promote the library as relevant, particularly in the digital age. three more reasons that earned top scores (10 respondents each, or 58.2%) were being inspired by the ethos of making, creating a complement to digital repositories and scholarship initiatives, and providing access to expensive machines or tools. additional reasons included building outreach and responding to community requests.13 (see figure 4.) information technology and libraries | june 2018 100 figure 4. rationale behind p2 respondents’ decision to plan a makerspace (n = 17). while p2 respondents indicated a clear decision to create a makerspace, their timeframes were noticeably different. 
i categorized their open responses into one of six timeframes: “within six months,” “within one year,” “within two years,” “within four years,” “within six years,” and “unknown.” the result presented a clear trimodal distribution with three subgroups: six crls with plans to open within 18 months, five with plans to open within the next two years, and six with plans to open after three or more years (see figure 5). in addition to their timeframe, p2 respondents were also asked about their plans for financing their future makerspaces. based on their open responses, the following six funding sources emerged: • the library budget, including surplus moneys or capital project funds • internal funding, including from campus constituents • donations and gifts • external grants • cost recovery plans, including small charges to users • not sure/in progress current trends and goals in the development of makerspaces | davis 101 https://doi.org/10.6017/ital.v37i2.9825 figure 5. p2 respondents’ timeframe for developing the makerspace (n = 17). with seven mentions, the most common of the above funding was the “library budget.” with two mentions each, the least common sources were “cost recovery” and “not sure/in progress.” among those who mentioned external grant applications, one respondent mentioned a focus on women and stem opportunities, and another specifically discussed attempts at grants from the institute of museum and library services. (see figure 6.) figure 6. p2respondents’ plans for gathering and financing makerspace (n = 17). regarding target user groups, some respondents focused on opportunities to enhance specific disciplinary knowledge, while others emphasized a general need for creating a free and open environment. one respondent mentioned that at her state-funded library, the space would be “geared to younger [primary and secondary school] ages,” “student teachers,” and “librarians on practicum assignments.” by contrast, another respondent at a large, private, carnegie r1 information technology and libraries | june 2018 102 university emphasized that the space was earmarked for the undergraduate and graduate students. in contrast to the cohort in p1, a notable number in p2 chose to create a makerspace despite the existence of maker-oriented research labs elsewhere on campus. as one respondent noted, the university was still “lacking a physical space where people could transition between technologies” and an open environment “where students doing projects for faculty” could come, especially later in the evenings. another respondent at a similarly large, private institution explained that his colleagues recognized that most labs at their university were earmarked for specific professional schools. as a result, his colleagues came up with a strategy to provide self-service 3d printing stations at the media center, located in the library at the heart of campus. crls with operating makerspaces (p3 = 9) the final section of the survey (p3) focused on the motivations and means by which crls with makerspaces already in operation chose to develop and maintain their sites. in addition, this section gathered information on p3 crl funding decisions, service models, and types of users in their makerspaces. of the nine respondents in this path, all had makerspaces that had opened within the last three years. among these, roughly a third (4) had been in operation from one to two years; another third (3) had operated for two to three years; and two had opened within the last year. (see table 1.) 
table 1. length of time the crl makerspace has been in operation for p3 respondents (n = 9). age of crl makerspace or lab—p3 answer options responses % less than 6 months 1 11.1 6–12 months 1 11.1 1–2 years 4 44.4 2–3 years 3 33.3 more than 3 years 0 0.0 total responses 9 100.0 priorities and rationale the reasons behind p3 decisions to make a makerspace were slightly different from those of p2. while “promoting literacy and learning” was still a top priority, two other reasons, “promoting the maker culture of making” and “providing access to expensive machinery,” were deemed equally important (6 respondents, or 66.7%, for each). other significant priorities included “promoting community outreach” (4 respondents, or 44.4%), “promoting the library as relevant” and in “direct response to community requests” (3 respondents, or 33.3%, for each). (see figure 7.) current trends and goals in the development of makerspaces | davis 103 https://doi.org/10.6017/ital.v37i2.9825 figure 7. rationale behind p3 respondents’ decision to develop and maintain a makerspace (n = 9). the answer of “other” was also given top priority (5 respondents, or 55.6%). i conclude that this indicated a strong desire among respondents to express in their own words their library’s unique decisions and circumstances. (their free responses to this question are discussed below.) a familiar theme in the responses of the five respondents who elaborated on their choice of “other” was the desire to situate a makerspace in the central and open environment of the campus library. as one participant noted, there were “other access points and labs on campus,” but those labs were “more siloed” or cut off from the general population. by contrast, the campus library aimed to serve a broader population and anticipated a general “student need.” later, the same respondent added that the makerspace was an opportunity to promote social justice, cultivate student clubs, and encourage engagement at the hub of the campus community. this type of ecumenical thinking was manifested in a similar remark that the library’s role was to reinforce other learning environments on campus. one respondent saw the makerspace as an additional resource “that complemented the maker opportunities that we have had in our curriculum resource center for decades.” likewise, the library makerspace was intended to offer opportunities to a range of users on campus and beyond. funding, staffing, and service models when prompted to discuss how they gathered the resources for their makerspaces, the largest group (4 respondents) stated that a significant means for funding was through gifts and donations. thus, the majority of crl makerspaces in new england depended primarily on contributions from friends of the library, university/college alumni, and donors. the second most common source (3 respondents) was through the library budget, including surplus money at the end of the year. making use of grant money and cost recovery were mentioned by two library participants, and internal and constituent support was useful for two libraries. (see figure 8.) information technology and libraries | june 2018 104 figure 8. p3 methods for gathering and financing a makerspace (n = 9). among these, a particularly noteworthy case was a makerspace that had originated from a new student club focused on 3d printing. originally based in a student dorm, the club was funded by a campus student union, which allocated grant money to students through a budget derived from the college tuition. 
as the club quickly grew, it found significant support in the library, which subsequently provided space (on the top floor of the library), staff, and financial support from surplus funds in the library budget. as this example would suggest, the sum of the responses showed that financing the makerspaces depended on a combination of strategies. one participant summarized it best: “we’ve slowly accumulated resources over time, using different funding for different pieces. some grant funding. mostly annual budget.” regarding service models, more than half of these libraries (five) currently offer a combination of programming and open lab time where users could make appointments or just drop in. by contrast, two of the libraries offered programs only, and did not offer an open lab; another two did the opposite, offering no programming but an open makerspace at designated times. of the latter, one is open monday to friday from 8 a.m. to 4 p.m., and the other is open during regular hours, with spaces that “can be booked ahead for classes or projects.” most labs supported drop-in visitors and were open evenings and weekends. at one makerspace, where there was increasingly heavy demand, the staff required students to submit proposals with project goals. (see table 2.) while some libraries brought in community experts, others held faculty programs, and some scheduled lab time for individual classes. one makerspace prioritized not only the campus, but also the broader community, and thus featured programs for local high schools and seniors. responses from this library emphasized the social justice thread that inspired their work and the community culture that they aimed to foster. current trends and goals in the development of makerspaces | davis 105 https://doi.org/10.6017/ital.v37i2.9825 table 2. model for services offered in the crl makerspace or 3d printing lab do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use? answer options responses % yes, we offer the following types of programs. 2 22.2 no, we simply leave the makerspace/lab open at the specific times. 2 22.2 we do both. we offer the programs and leave the makerspace/lab open at specific times. 5 55.6 as this data would suggest, most makerspaces were used by students (undergraduates and graduates) and faculty, in addition to local experts and generational groups. survey responses showed that undergraduate students were the most common users (9 of 9 respondents checked this group as the most frequent type of user), and faculty and graduate students were the second and third most common (8 of 9 respondents checked these groups as most frequent) user groups in the labs. local entrepreneurs, artists, designers, craftspeople, and campus and library staff also use the makerspaces. (see figure 9.) when prompted to identify “other” categories, one respondent specifically listed “learners, makers, sharers, studiers, [and] clubs.” figure 9. of the different types of users listed above, p3 respondents ranked them in order of who used the makerspace or equivalent lab most often (n = 9). the number and type of staff that managed and operated the makerspaces also varied widely at the nine crls in p3. seven of the crls employed full-time, dedicated staff, among whom four participants checked off the “dedicated staff”–only options. 
of the remaining two crls, one information technology and libraries | june 2018 106 reported staffing the makerspace with only one student, and one reported not having any staff working in the makerspace. i assume that the makerspace with no employees is managed by staff and students who are assigned to other, unspecified library departments or work groups. (see figure 10.) figure 10. the staffing situations at the p3 respondents (n = 9), where each respondent is assigned a letter from “a” to “i.” library programing was also diverse in terms of targeted audiences, speakers, and learning objectives. instructional workshops varied from 3d scanning and printing to soldering, felt making, sewing, knitting, robotics, and programming (e.g., raspberry pi.) the type of equipment contained in each lab is likely correlated to the range in programming; however, investigating these links was beyond the scope of this study. regarding this equipment, the size and activity of the participant crls varied considerably. some responses were more specific than others, and thus the resulting dataset was incomplete (see table 3.) challenges and philosophies of crl makerspaces the final portion of the survey invited participants to freely offer their thoughts about operating a crl makerspace. what follows below is a summary of the two most prominent themes that emerged: the challenges of building the lab and the social philosophies that framed these initiatives. in terms of challenges, the most common hurdle noted was the tremendous learning curve involved in establishing, maintaining, and promoting a makerspace. setting up some of the 3d printers, for example, required knowledge about electrical networks, computer systems, and safety policies at a federal and local level. once the hardware was running, lab managers needed to know how the machines interfaced with different and challenging software applications. communication skills were also critical, as one respondent reported, “printing anything and everything takes knowledge, experience.” communicating with stakeholders and users in accessible and proactive ways required strong teaching and customer service skills. current trends and goals in the development of makerspaces | davis 107 https://doi.org/10.6017/ital.v37i2.9825 table 3. the types of tools and equipment used at p3 crl respondents (n = 8), which are assigned letters from a to h. major equipment offered by individual library makerspaces or equivalent labs—path 3 crl label response text a die cut machine, 3d printer, 3d pens, raspberry pi, arduino, makey makey, art supplies, sewing supplies, pretty much anything anyone asks for we will try to get. b 2 makerbot replicators, 1 digital scanner, 1 othermill c 3d printing, 3d scanning, and laser cutting. d 3d printing, 3d scanning, laser cutting, vinyl cutting, large format printing, cnc machine, media production/postproduction. e no response f 3 creatorx, 1 powerspec, 3 m3d, 2 replicator 2, 1 replicator2x, 1 makergear, 1 leapfrogxl, 1 ultimaker, 1type a,1 deltaprinter, 1 delta maker, 2 printrbot, 2 filabots, 2x-box kinect for scanning, 2 oculus rifts, embedded systems cabinet with soldering stations, solar panels and micro controllers etc, 1 formlabs sla, 1 muve sla, rova 5, a bunch of quadcopters g 3d printers (4 printers, 3 models), 3d scanning/digitizing equipment (3 models), raspberry pi, arduino, a laser cutter and engraving system, poster printer, digital drawing tablets, gopro, a variety of editing and design software, a number of tools (e.g. 
dremel, soldering iron, wrenches, pliers, hammers, etc.), and a number of consumable or misc. items (e.g. paint, electrical tape, acetone, safety equipment, led lights, screws and nails, etc.) h 48 printers (all makerbot brand), 35 replicator 5th gen (a moderate size printer, 5 replicator z18 printers (larger built size), and 5 replicator minis, 3 replicator 2x) 5 makerbot digitzers (turntable scanners 8" by 8") 1 cubify sense hand scanner 7 still cameras for photogrammetry 21 i-mac computers 2 mac pros 2 wacom graphics tablets (thinking about complementing other resources at other labs on campus) another challenge that often came up was that of managing resources. as one respondent warned, crls should beware the “early adoption of certain technologies,” which can become “quickly information technology and libraries | june 2018 108 outdated by a rapidly growing field.” for others, it was a challenge to recruit the right staff that could run and fix machines in constant need of repair. in addition to hiring people with manufacturing and teaching skills, a successful lab required individuals who were savvy about outreach and community needs. despite such challenges, many respondents were eager to discuss the aspirations and rewards of crl makerspaces. above all, respondents focused on the pedagogical opportunities on the one hand, and the potential for outreach and social justice on the other. one participant conceded that measuring advances in literacy and education was “intangible,” but he saw great value in “giving students the experience of seeing their ideas come to fruition.” the excitement that this created for one student manifested in a buzz, and subsequently a “fever” or groundswell, in which more users came in to tinker and learn. meanwhile, the learning that took place among future professionals on campus was “critical,” even when results did not “go viral.” the aspiration to create human connections within and beyond campus was another striking theme. according to one respondent, the makerspace had “enabled some incredibly fruitful collaborations with different departments on campus.” this “fantastic outcome” was becoming more and more visible as the maker community grew. other crl makerspaces took pride in fostering a type of learning that was explicitly collaborative, exciting, and even “fun” for users. this in turn meant that some libraries were becoming “very popular,” generating a lot of “good pr,” and becoming central in the lives of new types of library users. along these lines, some respondents aimed to leverage the power of the makerspace to achieve social justice goals that resonated with core values of librarianship. according to one enthusiastic participant, the ethos of sharing was alive and strong among the staff and the many students who saw their participation in the lab as a lifestyle and culture of collaborating. in another initiative, the respondent looked forward to eventually offering grants to those users who proposed meaningful ways to use the makerspace to create practical value for the community. from this perspective, there was added value in having the 3d printing lab situated specifically on a college or university campus. 
according to this respondent, the unique quality of the crl makerspace was that by virtue of its location amid numerous and energetic young people, it was ripe for exploitation by those “who had great ideas and time and energy to do good.” discussion the aim of this study was to explore why and which types of crls had developed makerspaces (or an equivalent space) for their communities. of the 56 respondents, roughly half (46%) were p2 and p3 libraries who were currently developing or operating a makerspace, respectively. data from this survey indicated that none of the p2 or p3 crls fit a mold or pattern in terms of their size, educational models, or classifications. upon analyzing the data, i found that the differentiators between the three groups were less clearly defined than originally anticipated. in one example of blurred lines, at least two respondents in p1 indicated that they were more actively engaged with makerspaces than two respondents in p2. despite not having physical labs within their libraries, these p1 respondents were in the process of actively supporting or making plans for a makerspace within their crl community. one p1 respondent, for example, served on the planning board for a local community makerspace and had therefore “thoroughly investigated and used” the makerspace at a current trends and goals in the development of makerspaces | davis 109 https://doi.org/10.6017/ital.v37i2.9825 neighboring university. based on his knowledge, he decided to develop a complementary initiative (e.g., a book arts workshop) at his university library. although his library did not yet have a formal makerspace, he felt confident that the diffusion of 3d printers would come to his library in the near future. another p1 respondent was responsible for administering faculty teaching and innovation grants. among the recent grant recipients were two faculty collaborators who used the library’s funds to build a makerspace at a campus location that was separate from the library. although the makerspace was not directly developed by the respondent’s library, it was nevertheless a direct product of his library’s programmatic support. the respondent reported that for this reason, his library did not want to compete with its own faculty initiatives. in another example of blurred distinctions, one librarian in p2 was as deeply immersed in providing access and education on makerspaces as his colleagues in p3. although he was not clear on when or how his library would finance a future makerspace, his library already offered many of the same services and workshops as p3 libraries. as a “maker in the library,” he offered noncredit-bearing 3d printing seminars to students and offered trial 3d printing services in the library for graduates of the 3d printing seminar. in addition, he made appearances at relevant campus events. when the university museum ran a 3d printing day, for instance, he participated as an expert panelist and gave public demonstrations on library-owned 3d printers and a scanner kinect bar. in sum, despite the respondents’ categorization in p1 and p2, they sometimes shared more in common with the cohorts in p2 and p3, respectively. given their library’s programmatic involvement in creating and endorsing the maker movement, these respondents were more than just “interested” or “open to” the prospect of creating a makerspace. while only 16% of crls (p3 = 9) responded as actively operating a makerspace, another 30% (p2 = 17) were involved in developing a makerspace in the near future. 
moreover, the number of crls formally involved with the diffusion of maker technologies was not limited to just these two groups. although some makerspaces were not directly run by the library, they had come to fruition because of librarybased funding, grants, and professional support. and although some libraries did not have immediate plans for a makerspace, they were already promoting maker technologies and the maker ethos in other significant ways. conclusion this study is one of the first comprehensive and comparative studies on crl makerspace programs and their respective goals, policies, and outcomes. while the number of current crl makerspaces is relatively low, the data suggests that the population is increasing; a growing number of crls are involved in the makerspace movement. more than two dozen crls were planning to develop makerspaces in the near future, helping to diffuse maker technologies through crl programming, and/or supporting nonlibrary maker initiatives on campus and beyond. in addition, some crls were buying equipment, hiring dedicated staff, offering relevant workshops and demonstrations, and supporting community efforts to build labs beyond the library. although the author aimed to find structural commonalities between crls in groups p2 and p3, none were found. respondents in these groups came from institutions of all sizes , a wide variety information technology and libraries | june 2018 110 of endowment levels, and both public and private funding models, and they ranged in emphasis from the liberal arts to professional certifications and graduate-level research. although a majority of crl respondents were not currently making plans to create a makerspace, many respondents were enthusiastic about current trends, and some even promoted the maker movement in unexpected ways. acknowledging the steady diffusion of 3d printers, many anticipated using such technologies in the future to promote traditional library values and goals. respondents in p2 and p3 indicated that their primary rationale for developing a makerspace was to promote learning and literacy. other prominent reasons included promoting library outreach and the maker culture of learning. data from crls with makerspaces indicated that these benefits were often symbiotic and correlated to strong ideas about universal access to emergent tools and practices in learning. unexpected challenges for developing and operating makerspaces include staffing them with highly skilled, knowledgeable, and service-oriented employees. learning the necessary skills— including operating the printers, troubleshooting models, and maintaining a safe environment, to name a few—was time-consuming and labor intensive. the majority of funding for crls with or planning maker labs came from internal budgets, gifts and donors, and some grants. while some p1 crls indicated that their reason for not developing makerspaces was a lack of community interest, p2 and p3 crls were not necessarily motivated by user requests or needs, nor was lack of explicit need or interest a deterrent. on the contrary, a few reported a desire to promote the campus library as ahead of the curve by keeping in front of student and community needs. in a similar contradiction, some p1 respondents reported that their libraries did not want to compete with other labs on campus. respondents from p2 and p3, however, wanted to offer an alternative to the more siloed or structured model of departmentor lab-funded makerspaces. 
although makerspaces were sometimes forming in other parts of campus, some p2 and p3 crls felt there was a gap in accessibility and therefore aimed to offer more open and flexible spaces. a final salient theme among p2 and p3 respondents was their commitment to equity of access and issues of social justice. above all, they saw a unique fit for makerspaces in their crl philosophies to serve the greater good. among other advantages, crls were in a unique position to leverage the power of the makerspaces to take advantage of campus communities of “cognitive surplus” and millennial aspirations to share and create spontaneous communities of knowledge. given the amount of resources that are required to create and maintain a makerspace, this research will be useful for crls considering such a space in the future. the present data suggests that no one type of library currently has a monopoly on maker spaces; regardless of size or funding levels, the common thread among p2 and p3 crls was simply a commitment to providing access to emergent technologies and supporting new literacies. while annual budgets and grant applications were critical for some libraries, the majority of crls funded the bulk of their makerspaces through gifts and donations. future studies on the characteristics and challenges of p2 and p3 populations beyond those in new england will certainly amplify our understanding of these trends. current trends and goals in the development of makerspaces | davis 111 https://doi.org/10.6017/ital.v37i2.9825 appendix: survey questions informed consent current trends in the development of makerspaces and 3d printing labs at new england college and research libraries consent for the participation in a research study southern connecticut state university purpose you are invited to participate in a research project conducted by ann marie l. davis, a masters student in library and information studies at southern connecticut state university. the purpose of this project is to investigate the experiences and goals of college and research libraries (crls) that currently have or are making plans to have an open makerspace (or an equivalent room or space). the results from this study will be included in a special project report for the mls degree and the basis for an article to submit for peer-review. procedures if you decide to participate, you will volunteer to take a fifteen-minute online survey. risks and inconveniences there are no known risks associated with this research; other than taking a short amount of time, the survey should not burden you or infringe on your privacy in any way. potential benefits and incentive by participating in this research, you will be contributing to our understanding of current trends and practices with regards to community learning labs in crls. in addition, you will be providing useful knowledge that can support other libraries in making more informed decisions as they potentially develop their own makerspaces in the future. voluntary participation your participation in this research study is voluntary. you may choose not to participate and you may withdraw your consent to participate at any time. you will not be penalized in any way should you decide not to participate or withdraw from this study. protection of confidentiality the survey is anonymous and does not ask for sensitive or confidential information. contact information before you consent, please ask any questions on any aspect of this study that is unclear to you. 
you may contact me at my student email address at any time: xxx@owls.southernct.edu. if you have questions regarding your rights as a research participant, you may contact the southern connecticut state institutional review board at (203) xxx-xxxx. information technology and libraries | june 2018 112 consent by proceeding to the next page, you confirm that you understand the purpose of this research, the nature of this survey and the possible burdens and risks as well as benefits that you may experience. by proceeding, this indicates that you have read this consent form, understand it , and give your consent to participate and allow your responses to be used in this research. acrl survey on makerspaces and 3d printers q1. what is the size of your college or university? • 4,999 students or less • 5,000–9,999 students • 10,000–19,999 students • 20,000–29,999 students • 30,000 students or more q2. how would you categorize your institution? (please check all that apply) • private • public • doctorate-granting university (awards 20 or more doctorates) • master’s college or university (awards 50 or more master’s degrees, but fewer than 20 doctorates) liberal arts and sciences college • other q3. do any of the libraries at your institution have a makerspace or equivalent hands-on learning lab (including a 3-d printing station or lab)? • yes [if “yes,” respondents are directed to question 14] • no [if “no,” respondents are directed to question 4] q4. do any of the libraries at your institution have plans to develop a makerspace or equivalent learning lab in the near future? • yes [if “yes,” respondents are directed to question 8] • no [if “no,” respondents are directed to question 5] path one (crls with no makerspace, no plans for makerspace) q5. are there specific reasons why your institution has decided not to pursue developing a makerspace or equivalent lab in the near future? • no reasons. we have not given much thought to makerspaces for our library. • yes q6. thank you for your participation. would you like a copy of the results when the report is completed? if yes, please enter your email address in the space provided. current trends and goals in the development of makerspaces | davis 113 https://doi.org/10.6017/ital.v37i2.9825 • no • yes (please enter your email address below) q7. you have almost concluded this survey. before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. if no comments, please click “next” to end the survey. path two [crls with plans to build a makerspace] q8. what are the main goals that motivated your library’s decision to develop a makerspace or equivalent lab? (please check all that apply) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs • other q9. of these goals, please rank them in order of their level of priority for your library. (choose “n/a” for goals that you did not select in the previous question) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs • other q10. 
what is your library’s time frame for developing a makerspace or equivalent lab? q11. what are your library’s current plans for gathering and/or financing the resources needed for developing and maintaining the makerspace or equivalent lab? q12. thank you for your participation. would you like a copy of the results when the report is completed? • no • yes (please enter your email address below) q13. you have almost concluded this survey. before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. if no comments, please click “next” to end the survey. information technology and libraries | june 2018 114 path three [crls with a makerspace] q14. how long have you had your makerspace or equivalent learning lab? • less than 6 months • 6–12 months • 1–2 years • 2–3 years • more than 3 years q15. what were the main goals that motivated your library's decision to develop a makerspace or equivalent lab? (please check all that apply) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs other q16. of these goals, please rank them in order of their level of priority for your library. (choose “n/a” for goals that you did not select in the previous question) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs • other q17. how did your library gather and/or finance the resources needed for developing and maintaining the makerspace or equivalent learning lab? q18. do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use? • yes, we offer the following types of programs: • no, we simply leave the makerspace/lab open at the following times (please note times and/or if a reservation is required): • we do both. we offer the following types of programs and leave the makerspace/lab open at the following times (please note types of programs, times open, and if a reservation is required): current trends and goals in the development of makerspaces | davis 115 https://doi.org/10.6017/ital.v37i2.9825 q19. what type of community members tend to use your library's makerspace or equivalent lab most? (please check all that apply) • undergraduate researchers • graduate researchers • faculty • staff • general public • local artists, designers, or craftspeople • local entrepreneurs • other q20. of the cohorts chosen above, please rank them in order of who uses the makerspace or equivalent lab most often. (use “n/a” for cohorts that are not relevant to your space or lab) • undergraduate researchers • graduate researchers • faculty • staff • general public • local artists, designers, or craftspeople • local entrepreneurs • other q21. how many dedicated staff does your library currently employ for the makerspace or equivalent? • 0 • 1 • 2 • 3 • other q22. where is your makerspace or equivalent lab located? q23. what is the title or name of your makerspace or equivalent lab, and if known, what were the reasons behind this particular name? q24. 
what major equipment and services does your library makerspace or equivalent lab provide? q25. what unexpected considerations, challenges, or failures has your library faced in developing and maintaining the makerspace or equivalent lab? q26. how would you assess the benefits or “return on investment” of having a makerspace or equivalent lab? q27. thank you for your participation. would you like a copy of the final results when the report is completed? if yes, please enter your email address in the space provided. information technology and libraries | june 2018 116 • no • yes (please enter your email address below) q28. you have almost concluded this survey. before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. if no comments, please click “next” to end the survey. references and notes 1 laura britton, “a fabulous laboratory: the makerspace at fayetteville free library,” public libraries 51, no. 4 (july/august 2012): 30–33, http://publiclibrariesonline.org/2012/10/afabulous-labaratory-the-makerspace-at-fayetteville-free-library/; madelynn martiniere, “hack the world: how the maker movement is impacting innovation: from diy geige,” medium, october 27, 2014, https://medium.com/@mmartiniere/hack-the-world-how-the-makermovement-is-impacting-innovation-bbc0b46bd820#.3mnhow4jz. 2 david v. loertscher, “maker spaces and the learning commons,” teacher librarian 39, no. 6 (october 2012): 45–46, accessed december 9, 2016, library, information science & technology abstracts with full text, ebscohost; jon kalish, “libraries make room for high-tech ‘hackerspaces,’” national public radio, december 25, 2011, http://www.npr.org/2011/12/10/143401182/libraries-make-room-for-high-techhackerspaces; diane slatter and zaana howard, “a place to make, hack, and learn: makerspaces in australian public libraries,” australian library journal 62, no. 4: 272–84, https://doi.org/10.1080/00049670.2013.853335. 3 sharon crawford barniskis, “makerspaces and teaching artists,” teaching artist journal 12, no. 1: 6–14. 4 anne wong and helen partridge, “making as learning: makerspaces in universities,” australian academic & research libraries 47, no. 3 (september 2016): 143–59, https://doi.org/10.1080/00048623.2016.1228163. 5 erich purpur et al., “refocusing mobile makerspace outreach efforts internally as professional development,” library hi tech 34, no. 1 (2016): 130–42. 6 britton, “a fabulous laboratory,” 30. 7 tj mccue, “first public library to create a maker space,” forbes, november 15, 2011, http://www.forbes.com/sites/tjmccue/2011/11/15/first-public-library-to-create-a-makerspace/. 8 phillip torrone, “is it time to rebuild and retool public libraries and make ‘techshops’?,” make:, march 20, 2011, http://makezine.com/2011/03/10/is-it-time-to-rebuild-retool-publiclibraries-and-make-techshops/. 9 r. david lankes, “killing librarianship,” (keynote speech, new england library association annual conference, october 3, 2011, burlington, vermont), https://davidlankes.org/killinglibrarianship/. 
10 janet l. balas, "do makerspaces add value to libraries?," computers in libraries 32, no. 9 (november 2012): 33. 11 balas, "do makerspaces add value to libraries?," 33; adrian g. smith et al., "grassroots digital fabrication and makerspaces: reconfiguring, relocating and recalibrating innovation?" (working paper, university of sussex, spru working paper swps, falmer, brighton, september 2013), https://doi.org/10.2139/ssrn.2731835. 12 the number of and interval between emails corresponded roughly with dillman's "five-contact framework" as outlined in carolyn hank, mary wilkins jordan, and barbara m. wildemuth, "survey research," in applications of social research methods to questions in information and library science, edited by barbara wildemuth, 256–69 (westport, ct: libraries unlimited, 2009), 261. 13 in choosing these priorities, respondents were asked to select as many of the reasons that applied to their own crl.

president's message: focus on information ethics aimee fifarek information technologies and libraries | december 2016 just a few weeks ago we held yet another successful lita forum1, this time in fort worth, tx. tight travel budgets and time constraints mean that only a few hundred people get to attend forum each year, but that is one of the things that make it a great conference.
because of its size you have a realistic chance of meeting everyone there, whether it’s at game night, one of the many networking dinners, or just for during hallway chitchat after a session. and the sessions really do give you something to talk about. this year i couldn’t help but notice a theme. among all the talk about makerspace technologies, analytics, and specific software platforms, the one bubble that kept rising to the surface was information ethics. why are you doing what you are doing with the information you have, and should you really be doing it? have you stopped to think what impact collecting, posting, sharing that information is going to have on the world around you? in a post-election environment replete with talk of fake news and other forms of deliberate misinformation, lita forum presenters seem to have tapped in to the zeitgeist. tara robertson, in her closing keynote2, talked about the harm digitizing analog materials can do when what is depicted is sensitive to individuals and communities. waldo jaquith of us open data talked about how a government decision to limit options on a birth certificate to either “white” or “colored” effectively wiped the native population out of political existence in virginia. and sam kome from claremont colleges talked about how well-meaning librarians can facilitate privacy invasion merely by collecting operational statistics3. there were many other examples brought out by forum speakers but these in particular emphasized the real consequences the serious consequences the use of data – intentional or not – can have on people. i think it is time for librarians4 to get more vocal about information ethics and the role we play in educating the population about humane information use. our profession has always been forward thinking about information literacy and is traditionally known for helping our communities make judgements about the information they consume. but we have not done enough to declare our expertise in the information economy, to stand up and say “we’re librarians – this is what we do.” now, more than ever, people need the skills to think critically about the information they are consuming via all kinds of media, understand the consequences of allowing algorithms to shape their information universe, and make quality judgments about trading their personal information for goods and services. to quote from unesco: aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az. president’s message | fifarek https://doi.org/10.6017/ital.v35i4.9602 2 changes brought about by the rapid development of information and communication technologies (ict) not only open tremendous opportunities to humankind but also pose unprecedented ethical challenges. ensuring that information society is based upon principles of mutual respect and the observance of human rights is one of the major ethical challenges of the 21st century.5 i challenge all librarians to make a commitment to propagating information ethics, both personally and professionally. make an effort to get out of your social media echo chamber6 and engage with uncomfortable ideas. when you see biased information being shared consider it a “teachable moment” and highlight the spin or present more neutral information. 
and if your library is not actively making information literacy and information ethics part of its programming and instruction, then do what you can to change it. offer to be on a panel, create a curriculum, or host a program that includes key concepts relating to information “ownership, access, privacy, security, and community”7. the focus of the libraries transform campaign this year is all about our expertise: “because the best search engine in the library is the librarian”8 it’s our time to shine. references 1. http://forum.lita.org/home/ 2. http://forum.lita.org/speakers/tara-robertson/ 3. http://forum.lita.org/sessions/patron-activity-monitoring-and-privacy-protection/ 4. as always, when i use the term “librarian” my intention is to include any person who works in a library and is skilled in information and library science, not to limit the reference to those who hold a library degree. 5. http://en.unesco.org/themes/ethics-information 6. https://www.wnyc.org/story/buzzfeed-echo-chamber-online-news-politics/ 7. https://en.wikipedia.org/wiki/information_ethics 8. http://www.ilovelibraries.org/librariestransform/ laneconnex | ketchell et al. 31 laneconnex: an integrated biomedical digital library interface debra s. ketchell, ryan max steinberg, charles yates, and heidi a. heilemann this paper describes one approach to creating a search application that unlocks heterogeneous content stores and incorporates integrative functionality of web search engines. laneconnex is a search interface that identifies journals, books, databases, calculators, bioinformatics tools, help information, and search hits from more than three hundred full-text heterogeneous clinical and bioresearch sources. the user interface is a simple query box. results are ranked by relevance with options for filtering by content type or expanding to the next most likely set. the system is built using component-oriented programming design. the underlying architecture is built on apache cocoon, java servlets, xml/xslt, sql, and javascript. the system has proven reliable in production, reduced user time spent finding information on the site, and maximized the institutional investment in licensed resources. m ost biomedical libraries separate searching for resources held locally from external database searching, requiring clinicians and researchers to know which interface to use to find a specific type of information. google, amazon, and other web search engines have shaped user behavior and expectations.1 users expect a simple query box with results returned from a broad array of content ranked or categorized appropriately with direct links to content, whether it is an html page, a pdf document, a streaming video, or an image. biomedical libraries have transitioned to digital journals and reference sources, adopted openurl link resolvers, and created institutional repositories. however, students, clinicians, and researchers are hindered from maximizing this content because of proprietary and heterogeneous systems. a strategic challenge for biomedical libraries is to create a unified search for a broad spectrum of licensed, open-access, and institutional content. n background studies show that students and researchers will use the search path of least cognitive resistance.2 ease and speed are the most important factors for using a particular search engine. 
a university of california report found that academic users want one search tool to cover a wide information universe, multiple formats, full-text availability to move seamlessly to the item itself, intelligent assistance and spelling correction, results sorted in order of relevance, help navigating large retrievals by logical subsetting and customization, and seamless access anytime, anywhere.3 studies of clinicians in the patient-care environment have documented that effort is the most important factor in whether a patient-care question is pursued.4 for researchers, finding and using the best bioinformatics tool is an elusive problem.5 in 2005, the lane medical library and knowledge management center (lane) at the stanford university medical center provided access to an expansive array of licensed, institutional, and open-access digital content in support of research, patient care, and education. like most of its peers, lane users were required to use scores of different interfaces to search external databases and find digital resources. we created a local metasearch application for clinical reference content, but it did not integrate result sets from disparate resources. a review of federated-search software in the marketplace found that products were either slow or they limited retrieval when faced with a broad spectrum of biomedical content. we decided to build on our existing application architecture to create a fast and unified interface. a detailed analysis of lane website-usage logs was conducted before embarking on the creation of the new search application. key points of user failure in the existing search options were spelling errors that could easily be corrected to avoid zero results; lack of sufficient intuitive options to move forward from a zero-results search or change topics without backtracking; lack of use of existing genre or role searches; confusion about when to use the resource, openurl resolver, or pubmed search to find a known item; and results that were cognitively difficult to navigate. studies of the web search engine and the pubmed search log concurred with our usagelog analysis: a single term search is the most common, with three words maximum entered by typical users.6 a pubmed study found that 22 percent of user queries were for known items rather than for a general subject, confirming our own log analysis findings that the majority of searches were for a particular source item.7 search-term analysis revealed that many of our users were entering partial article citations (e.g., author, date) in any query debra s. ketchell (debra.ketchell@gmail.com) is the former associate dean for knowledge management and library director; ryan max steinberg (ryan.max.steinberg@stanford .edu) is the knowledge integration programmer/architect; charles yates (charles.yates@stanford.edu) is the systems software developer; and heidi a. heilemann (heidi.heilemann@stanford .edu) is the former director for research & instruction and current associate dean for knowledge management and library director at the lane medical library & knowledge management center, information resources & technology, stanford university school of medicine, stanford, california. 32 information technology and libraries | march 2009 box expecting that article databases would be searched concurrently with the resource database. our displayed results were sorted alphabetically, and each version of an item was displayed separately. 
for the user, this meant a cluttered list with redundant title information that increased their cognitive effort to find meaningful items. overall, users were confronted with too many choices upfront and too few options after retrieving results. focus groups of faculty and students were conducted in 2005. attendees wanted local information integrated into the proposed single search. local information included content such as how-to information, expertise, seminars, grand rounds, core lab resources, drug formulary, patient handouts, and clinical calculators. most of this content is restricted to the stanford user population. users consistently described their need for a simple search interface that was fast and customized to the stanford environment. in late 2005, we embarked on a project to design a search application that would address both existing points of failure in the current system and meet the expressed need for a comprehensive discovery-and-finding tool as described in focus groups. the result is an application called laneconnex.

design objectives

the overall goal of laneconnex is to create a simple, fast search across multiple licensed, open-access, and special-object local knowledge sources that depackages and reaggregates information on the basis of stanford institutional roles. the content of lane's digital collection includes forty-five hundred journal titles and forty-two thousand other digital resources, including video lectures, executable software, patient handouts, bioinformatics tools, and a significant store of digitized historical materials as a result of the google books program. media types include html pages, pdf documents, jpeg images, mp3 audio files, mpeg4 videos, and executable applications. more than three hundred reference titles have been licensed specifically for clinicians at the point of care (e.g., uptodate, emedicine, stat-ref, and micromedex clinical evidence). clinicians wanted their results to reflect subcomponents of a package (e.g., results from the micromedex patient handouts). other clinical content is institutionally managed (e.g., institutional formulary, lab test database, or patient handouts). more than 175 biomedical research tools have been licensed or selected from open-access content. the needs of biomedical researchers include molecular biology tools and software, biomedical literature databases, citation analysis, chemical and engineering databases, expertise-finding tools, laboratory tools and supplies, institutional-research resources, and upcoming seminars. the specific objectives of the search application are the following:
• the user interface should be fast, simple, and intuitive, with embedded suggestions for improving search results (e.g., did you mean? didn't find it? have you tried?).
• search results from disparate local and external systems should be integrated into a single display based on popular search-engine models familiar to the target population.
• the query-retrieval and results display should be separated and reusable to allow customization by role or domain and future expansion into other institutional tools.
• resource results should be ranked by relevance and filtered by genre.
• metasearch results should be hit counts and filtered by category for speed and breadth. results should be reusable for specific views by role.
• finding a known article or journal should be streamlined and directly link to the item or "get item" option.
• the most popular search options (pubmed, google, and lane journals) should be ubiquitous.
• alternative pathways should be dynamic and interactive at the point of need to avoid backtracking and dead ends.
• user behavior should be tracked by search term, resource used, and user location to help the library make informed decisions about licensing, metadata, and missing content.
• off-the-shelf software should be used when available or appropriate, with development focused on search integration.
• the application should be built upon existing metadata-creation systems and trusted web-development technologies.
based on these objectives, we designed an application that is an extension of existing systems and technologies. resources are acquired and metadata are provided using the voyager integrated library system (ils). the sfx openurl link resolver provides full-text article access and expands the title search beyond biomedicine to all online journals at stanford. ezproxy provides seamless off-campus access. webtrends provides usage tracking. movable type is used to create faq and help information. a locally developed metasearch application provides a cross search with hit results from more than three hundred external and internal full-text sources. the technologies used to build laneconnex and integrate all of these systems include extensible stylesheet language transformations (xslt), java, javascript, the apache cocoon project, and oracle.

systems description

architecture

laneconnex is built on a principle of separation of concerns. the lane content owner can directly change the inclusion of search results, how they are displayed, and additional path-finding information. application programmers use java, javascript, xslt, and structured query language (sql) to create components that generate and modify the search results. the merger of content design and search results occurs "just in time" in the user's browser. we use component-oriented programming design whereby services provided within the application are defined by simple contracts. in laneconnex, these components (called "transformers") consume xml information and, after transforming it in some way, pass it on to some other component. a particular contract can be fulfilled in different ways for different purposes. this component architecture allows for easy extension of the underlying apache cocoon application. if laneconnex needs to transform some xml data that is not possible with built-in cocoon transformers, it is a simple matter to create a software component that does what is needed and fulfills the transformer contract. apache cocoon is the underlying architecture for laneconnex, as illustrated in figure 1. this java servlet is an xml-publishing engine that is built upon a component framework and uses a pipeline-processing model. a declarative language uses pattern matching to associate sets of processing components with particular request urls. content can come from a variety of sources. we use content from the local file system, network file system, http, and a relational database. the xslt language is used extensively in the pipelines and gives fine control of individual parts of the documents being processed. the end of processing is usually an xhtml document but can be any common mime type.
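to make the "transformer contract" idea above concrete, the following is a minimal sketch in java, not the actual cocoon transformer interface: the XmlTransformer interface, the TransformerPipeline class, and their method names are hypothetical simplifications. cocoon's real transformers are sax-event based and are wired together declaratively in the sitemap rather than in java code; the sketch only illustrates the idea that every component fulfills the same consume-transform-pass-on contract and can therefore be chained or swapped freely.

```java
// Illustrative sketch only: a minimal "transformer contract" in the spirit of the
// component design described above. Names are hypothetical simplifications and do
// not reproduce Cocoon's actual (SAX-event-based) Transformer API.
import org.w3c.dom.Document;

/** A component that consumes an XML document, transforms it, and passes it on. */
interface XmlTransformer {
    Document transform(Document input) throws Exception;
}

/** Example: chain transformers so that one component's output feeds the next. */
final class TransformerPipeline implements XmlTransformer {
    private final java.util.List<XmlTransformer> stages;

    TransformerPipeline(java.util.List<XmlTransformer> stages) {
        this.stages = stages;
    }

    @Override
    public Document transform(Document input) throws Exception {
        Document current = input;
        for (XmlTransformer stage : stages) {
            current = stage.transform(current); // each stage fulfills the same contract
        }
        return current;
    }
}
```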
we use cocoon to separate areas of concern so things like content, look and feel, and processing can all be managed as separate entities by different groups of people with little effect on another area. this separation of concerns is manifested by template documents that contain most of the html content common to all pages and are then combined with content documents within a processing pipeline. the declarative nature of the sitemap language and xslt facilitate rapid development with no need to redeploy the entire application to make changes in its behavior. the laneconnex search is composed of several components integrated into a query-and-results interface: oracle resource metadata, full-text metasearch application, movable type blogging software, “did you mean?” spell checker, ezproxy remote access, and webtrends tracking. n full-text metasearch integration of results from lane’s metasearch application illustrates cocoon’s many strengths. when a user searches laneconnex, cocoon sends his or her query to the metasearch application, which then dispatches the request to multiple external, full-text search engines and content stores. some examples of these external resources are uptodate, access medicine, micromedex, pubmed, and md consult. the metasearch application interacts with these external resources through jakarta commons http clients. responses from external resources are turned into w3c document object model (dom) objects, and xpath expressions are used to resolve hit counts from the dom objects. as result counts are returned, they are added to an xml–based result list and returned to cocoon. the power of cocoon becomes evident as the xml– based metasearch result list is combined with a separate display template. this template-based approach affords content curators the ability to directly add, group, and describe metasearch resources using the language and look that is most meaningful to their specific user communities. for example, there are currently eight metasearch templates curated by an informationist in partnership with a target community. curating these templates requires little to no assistance from programmers. in lane’s 2005 interface, a user’s request was sent to the metasearch application, and the application waited five seconds before responding to give external resources a chance to return a result. hit counts in the user interface included a link to refresh and retrieve more results from external resources that had not yet responded. usability studies showed this to be a significant user barrier, since the refresh link was rarely clicked. the initial five second delay also gave users the impression that the site was slow. the laneconnex application makes heavy use of javascript to solve this problem. after a user makes her initial request, javascript is used to poll the metasearch application (through cocoon) on the user’s behalf, popping in result counts as external resources respond. this adds a level of interactivity previously unavailable and makes the metasearch piece of laneconnex much more successful than its previous version. resource metadata laneconnex replaces the catalog as the primary discovery interface. metadata describing locally owned and 34 information technology and libraries | march 2009 licensed resources (journals, databases, books, videos, images, calculators, and software applications) are stored in the library’s current system of record, an instance of the voyager ils. 
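as a concrete illustration of the metasearch step described above, the sketch below fetches one external resource's response and pulls a hit count out of it with an xpath expression. it is a minimal, assumption-laden example: the url template, the {query} placeholder, and the xpath string are hypothetical, and it uses the jdk's HttpURLConnection and javax.xml classes for self-containment, whereas the production application used jakarta commons http clients.

```java
// A minimal sketch of the hit-count extraction described above. The URL template and
// XPath expression are hypothetical placeholders; only JDK classes are used here.
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class HitCountFetcher {
    /** Sends the user's query to one external resource and returns its hit count. */
    public static int fetchHitCount(String searchUrlTemplate, String countXPath, String query)
            throws Exception {
        String url = searchUrlTemplate.replace("{query}", URLEncoder.encode(query, "UTF-8"));
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000); // do not let one slow resource stall the metasearch
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);               // response as a W3C DOM
            Number count = (Number) XPathFactory.newInstance().newXPath()
                    .evaluate(countXPath, dom, XPathConstants.NUMBER); // e.g. "count(//result)"
            return count.intValue();
        } finally {
            conn.disconnect();
        }
    }
}
```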
laneconnex makes no attempt to replace voyager ’s strengths as an application for the selection, acquisition, description, and management of access to library resources. it does, however, replace voyager ’s discovery interface. to this end, metadata for about eight thousand digital resources is extracted from voyager ’s oracle database, converted into marcxml, processed with xslt, and stored in a simple relational database (six tables and twenty-nine attributes) to support fast retrieval speed and tight control over search syntax. this extraction process occurs nightly, with incremental updates every five minutes. the oracle text search engine provides functionality anticipated by our internet-minded users. key features are speed and relevance-ranked results. a highly refined results ranking insures that the logical title appears in the first few results. a user ’s query is parsed for wildcard, boolean, proximity, and phrase operators, and then translated into an sql query. results are then transformed into a display version. related services laneconnex compares a user’s query terms against a dictionary. each query is sent to a cocoon spell-checking component that returns suggestions where appropriate. this component currently uses the simple object figure 1. laneconnex architecture. laneconnex | ketchell et al. 35 access protocol (soap)–based spelling service from google. google was chosen over the national center for biotechnology information (ncbi) spelling service because of the breadth of terms entered by users; however, cocoon’s component-oriented architecture would make it trivial to change spell checkers in the future. each query is also compared against stanford’s openurl link resolver (findit@stanford). client-side javascript makes a cocoon-mediated query of findit@stanford. using xslt, findit@stanford responses are turned into javascript object notation (json) objects and popped into the interface as appropriate. although the vast majority of laneconnex searches result in zero findit@stanford results, the convenience of searching all of lane’s systems in a single, unified interface far outweighs the effort of implementation. a commercial analytics tool called webtrends is used to collect web statistics for making data-centric decisions about interface changes. webtrends uses client-side javascript to track specific user click events. libraries need to track both on-site clicks (e.g., the user clicked on “clinical portal” from the home page) and off-site clicks (e.g., the user clicked on “yamada’s gastroenterology” after doing a search for “ibs”). to facilitate off-site click capture, webtrends requires every external link to include a snippet of javascript. requiring content creators to input this code by hand would be error prone and tedious. laneconnex automatically supplies this code for every class of link (search or static). this specialized webtrends method provides lane with data to inform both interface design and licensing decisions. n results laneconnex version 1.0 was released to the stanford biomedical community in july 2006. the current application can be experienced at http://lane.stanford.edu. the figure 2. laneconnex resource search results. resource results are ranked by relevance. single word titles are given a higher weight in the ranking algorithm to insure they are displayed in the first five results. uniform titles are used to co-locate versions (e.g., the three instances of science from different producers). 
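returning to the resource-metadata search above, the translation of a user query into an oracle text query can be sketched roughly as follows. this python fragment is hypothetical: the article does not publish the parser, ranking weights, or schema, so the table and column names and the operator handling are assumptions; only the general idea of building a contains() expression from user operators is taken from the text.

```python
# Hypothetical sketch of mapping a user query onto an Oracle Text search.
# Table and column names, the operator handling, and the bind-parameter style are
# assumptions; only the general approach is taken from the description above.
def to_oracle_text(query: str) -> str:
    """Translate simple user operators into an Oracle Text CONTAINS expression."""
    q = query.strip()
    if q.startswith('"') and q.endswith('"'):
        return q.strip('"')        # quoted phrase: keep the words in sequence
    return q.replace("*", "%")     # user wildcard -> Oracle Text wildcard


def build_sql(query: str) -> tuple:
    sql = ("SELECT title, url FROM resources "
           "WHERE CONTAINS(search_text, :expr, 1) > 0 "
           "ORDER BY SCORE(1) DESC")
    return sql, {"expr": to_oracle_text(query)}


print(build_sql('"pathway analysis"'))
print(build_sql("pediatr* AND cardiology"))
```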
journals titles are linked to their respective impact factor page in the isi web of knowledge. digital formats that require special players or restrictions are indicated. the metadata searched for ejournals, databases, ebooks, biotools, video, and medcalcs are lane’s digital resources extracted from the integrated library system into a searchable oracle database. the first “all” tab is the combined results of these genres and the lane site help and information. figure 3. laneconnex related services search enhancements. laneconnex includes a spell checker to avoid a common failure in user searches. ajax services allow the inclusion of search results from other sources for common zero results failures. for example, the stanford link resolver database is simultaneously searched to insure online journals outside the scope of biomedicine are presented as a linked result for the user. production version has proven reliable over two years. incremental user focus groups have been employed to improve the interface as issues arose. a series of vignettes will be used to illustrate how the current version of 36 information technology and libraries | march 2009 the “sunetid login” is required. n user query: “new yokrer.” a faculty member is looking for an article in the new yorker for a class reading assignment. he makes a typing error, which invokes the “did you mean?” function (see figure 3). he clicks on the correct spelling. no results are found in the resource search, but a simultaneous search of the link-resolver database finds an instance of this title licensed for the campus and displays a clickable link for the user. n user query: “pathway analysis.” a post–doc is looking for information on how to share an ingenuity pathway. figure 4 illustrates the integration of the locally created lane faqs. faqs comprise a broad spectrum of help and how-to information as described by our focus groups. help text is created in the movable type blog software, and made searchable through the laneconnex application. the movable type interface lowers the barrier to html content creation by any staff member. more complex answers include embedded images and videos to enable the user to see exactly how to do a particular procedure. cocoon allows for the syndication of subsets of this faq content back into static html pages where it can be displayed as both category-specific lists or as the text for scroll-over help for a link. having a single store of help information insures the content is updated once for all instances. n user query: “uterine cancer kapp.” a resident is looking for a known article. laneconnex simultaneously searches pubmed to increase the likelihood of user success (see figure 5). clicking on the pubmed tab retrieves the results in the native interface; however, the user sees the pubmed@stanford version, which includes embedded links to the article based on our openurl link resolver. the ability to retrieve results from bibliographic databases that includes article resolution insures that our biomedical community is always using the correct url to insure maximum full-text article access. user testing in 2007 found that adding the three most frequently used sources (pubmed, google, and lane catalog) into our one-box laneconnex search was a significant time saver. it addresses laneconnex meets the design objectives from the user’s perspective. n user query: “science.” a graduate student is looking for the journal science. the laneconnex results are listed in relevance order (see figure 2). 
singleword titles are given a higher weight in the ranking algorithm to insure they are displayed in the first five results. results from local metadata are displayed by uniform title. for example, lane has three instances of the journal science, and each version is linked to the appropriate external store. brief notes provide critical information for particular resources. for example, restricted local patient education documents and video seminars note that figure 4. example of integration of local content stores. help information is managed in moveable type and integrated into laneconnex search results. laneconnex | ketchell et al. 37 the expectation on the part of our users that they could search for an article or a journal title in a single search box without first selecting a database. n user query: “serotonin pulmonary hypertension.” a medical student is looking for the correlation of two topics. clicking on the “clinical” tab, the student sees the results of the clinical metasearch in figure 6. metasearch results are deep searches of sources within licensed packages (e.g., textbooks in md consult or a specific database in micromedex), local content (e.g., stanford’s lab-test database), and openaccess content (e.g., ncbi databases). pubmed results are tailored strategies tiered by evidence. for example, the evidence-summaries strategy retrieves results from twelve clinical-evidence resources (e.g., buj, clinical evidence, and cochrane systematic reviews) that link to the full-text licensed by stanford. an example of the bioresearch metasearch is shown in figure 7. content selected for this audience includes literature databases, funding sources, patents, structures, clinical trials, protocols, and stanford expertise integrated with gene, protein, and phenotype tools. user testing revealed that many users did not click on the “clinical” tab. the clinical metasearch was originally developed for the clinical portal page and focused on clinicians in practice; however, the results needed to be exposed more directly as part of the laneconnex search. figure 8 illustrates the “have you tried?” feature that displays a few relevant clinical-content sources without requiring the user to select the “clinical” tab. this feature is managed by the smartsearch component of the laneconnex system. smartsearch sends the user’s query terms to pubmed, extracts a subset of articles associated with those terms, extracts the mesh headings for those articles, and computes the frequency of headings in the articles to determine the most likely mesh terms associated with the user’s query terms. these mesh terms are mapped to mesh terms associated with each metasearch resource. preliminary evaluation indicates that the clinical content is now being discovered by more users. figure 5. example of integration of popular search engines into laneconnex results. three of the most popular searches based on usage analysis are included at the top level. pubmed and google are mapped to lane’s link resolver to retrieve the full article. creating or editing metasearch templates is a curator driven task. programming is only required to add new sources to the metasearch engine. a curator may choose from more than three hundred sources to create a discipline-based layout using general templates. names, categories, and other description information are all at the curator ’s discretion. 
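the smartsearch term mapping described above can be sketched in a few lines of python. the sample records and the heading-to-resource map are invented for illustration; the real component retrieves a subset of pubmed records for the user's terms, extracts their mesh headings, and maps the most frequent headings to curated metasearch resources.

```python
# Sketch of the SmartSearch idea: count MeSH headings in a sample of PubMed
# records for the user's query and map the most frequent headings to resources.
# The sample records and the heading-to-resource map are invented; the real
# component retrieves records from PubMed and uses curated mappings.
from collections import Counter

sample_records = [
    {"pmid": "1", "mesh": ["Rhabdoid Tumor", "Child", "Antineoplastic Agents"]},
    {"pmid": "2", "mesh": ["Rhabdoid Tumor", "Infant", "Kidney Neoplasms"]},
    {"pmid": "3", "mesh": ["Rhabdoid Tumor", "Child", "Kidney Neoplasms"]},
]

resource_map = {  # hypothetical mapping from MeSH headings to metasearch resources
    "Rhabdoid Tumor": ["oncology textbooks"],
    "Child": ["pediatric textbooks"],
    "Kidney Neoplasms": ["oncology textbooks"],
}


def have_you_tried(records, top_n=3):
    """Return resource suggestions for the most frequent MeSH headings."""
    counts = Counter(heading for record in records for heading in record["mesh"])
    suggestions = []
    for heading, _ in counts.most_common(top_n):
        suggestions.extend(resource_map.get(heading, []))
    return sorted(set(suggestions))


print(have_you_tried(sample_records))
```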
while developing new subspecialty templates, we discovered that clinicians were confused by the difference in layout of their specialty portal and their metasearch results (e.g., the cardiology portal used the generic clinical metasearch). to address this issue, we devised an approach that merges a portal and metasearch into a single entity as illustrated in figure 9. a combination of the component-oriented architecture of laneconnex and javascript makes the integration of metasearch results into a new template patterned after a portal easy to implement. this strategy will enable the creation of templates contextually appropriate to knowledge requests originating from electronic medical-record systems in the future. direct user feedback and usage statistics confirm that search is now the dominant mode of navigation. the amount of time each user spends on the website has dropped since the release of version 1.0. we speculate that the integrated search helps our users find relevant 38 information technology and libraries | march 2009 information more efficiently. focus groups with students are uniformly positive. graduate students like the ability to find digital articles using a single search box. medical students like the clinical metasearch as an easy way to look up new topics in texts and customized pubmed searches. bioengineering students like the ability to easily look up patient care–related topics. pediatrics residents and attendings have championed the development of their portal and metasearch focused on their patient population. medical educators have commented on their ability to focus on the best information sources. n discussion a review of websites in 2007 found that most biomedical libraries had separate search interfaces for their digital resources, library catalog, and external databases. biomedical libraries are implementing metasearch software to cross search proprietary databases. the university of california, davis is using the metalib software to federate searching multiple bibliographic databases.8 the university of south california and florida state university are using webfeat software to search clinical textbooks.9 the health sciences library system at the university of pittsburgh is using vivisimo to search clinical textbooks and bioresearch tools.10 academic libraries are introducing new “resource shopping” applications, such as the endeca project at north carolina state university, the summa project at the university of aarhus, and the vufind project at villanova university.11 these systems offer a single query box, faceted results, spell checking, recommendations based on user input, and asynchronous javascript and xml (ajax) for live status information. we believe our approach is a practical integration for our biomedical community that bridges finding a resource and finding a specific item through figure 6. integration of metasearch results into laneconnex. results from two general, role-based metasearches (bioresearch and clinical) are included in the laneconnex interface. the first image shows a clinician searching laneconnex for serotonin pulmonary hypertension. selecting the clinical tab presents the clinical content metasearch display (second image), and is placed deep inside the source by selecting a title (third image). laneconnex | ketchell et al. 39 a metasearch of multiple databases. the laneconnex application searches across digital resources and external data stores simultaneously and presents results in a unified display. 
the limitation to our approach is that the metasearch returns only hit counts rather than previews of the specific content. standardization of results from external systems, particularly receipt of xml results, remains a challenge. federated search engines do integrate at this level, but are usually slow or limit the number of results. true integration awaits health level seven (hl7) clinical decision support standards and national information standards organization (niso) metasearch initiative for query and retrieval of specific content.12 one of the primary objectives of laneconnex is speed and ease of use. ranking and categorization of results has been very successful in the eyes of the user community. the integration of metasearch results has been particularly successful with our pediatric specialty portal and search. however, general user understanding of how the clinical and biomedical tabs related to the genre tabs in laneconnex has been problematic. we reviewed web engines and found a similar challenge in presenting disparate format results (e.g., video or image search results) or lists of hits from different systems (e.g., ncbi’s entrez search results).13 we are continuing to develop our new specialty portal-and-search model and our smartsearch term-mapping component to further integrate results. n conclusion laneconnex is an effective and openended search infrastructure for integrating local resource metadata and full-text content used by clinicians and biomedical researchers. its effectiveness comes from the recognition that users prefer a single query box with relevance or categorically organized results that lead them to the most likely figure 7. example of a bioresearch metasearch. figure 8. the smartsearch component embeds a set of the metasearch results into the laneconnex interface as “have you tried?” clickable links. these links are the equivalent of selecting the title from a clinical metasearch result. the example search for atypical malignant rhabdoid tumor (a rare childhood cancer) invokes oncology and pediatric textbook results. these texts and pubmed provide quick access for a medical student or resident on the pediatric ward. figure 9. example of a clinical specialty portal with integrated metasearch. clinical portal pages are organized so metasearch hit counts can display next to content links if a user executes a search. this approach removes the dissonance clinicians felt existed between separate portal page and metasearch results in version 1.0. 40 information technology and libraries | march 2009 answer to a question or prospects in their exploration. the application is based on separation of concerns and is easily extensible. new resources are constantly emerging, and it is important that libraries take full advantage of existing and forthcoming content that is tailored to their user population regardless of the source. the next major step in the ongoing development of laneconnex is becoming an invisible backend application to bring content directly into the user’s workflow. n acknowledgements the authors would like to acknowledge the contributions of the entire laneconnex technical team, in particular pam murnane, olya gary, dick miller, rick zwies, and rikke ogawa for their design contributions, philip constantinou for his architecture contribution, and alain boussard for his systems development contributions. references 1. denise t. 
covey, “the need to improve remote access to online library resources: filling the gap between commercial vendor and academic user practice,” portal libraries and the academy 3 no.4 (2003): 577–99; nobert lossau, “search engine technology and digital libraries,” d-lib magazine 10 no. 6 (2004), www.dlib.org/dlib/june04/lossau/06lossau.html (accessed mar. 1, 2008); oclc, “college students’ perception of libraries and information resource,” www.oclc.org/reports/ perceptionscollege.htm (accessed mar 1, 2008); and jim henderson, “google scholar: a source for clinicians,” canadian medical association journal 12 no. 172 (2005). 2. covey, “the need to improve remote access to online library resources”; lossau, “search engine technology and digital libraries”; oclc, “college students’ perception of libraries and information resource.” 3. jane lee, “uc health sciences metasearch exploration. part 1: graduate student gocus group findings,” uc health sciences metasearch team, www.cdlib.org/inside/assess/ evaluation_activities/docs/2006/draft_gradreport_march2006. pdf (accessed mar. 1, 2008). 4. karen k. grandage, david c. slawson, and allen f. shaughnessy, “when less is more: a practical approach to searching for evidence-based answers,” journal of the medical library association 90 no. 3 (2002): 298–304. 5. nicola cannata, emanuela merelli, and russ b. altman, “time to organize the bioinformatics resourceome,” plos computational biology 1 no. 7 (2005): e76. 6. craig silverstein et al., “analysis of a very large web search engine query log,” www.cs.ucsb.edu/~almeroth/ classes/tech-soc/2005-winter/papers/analysis.pdf (accessed mar. 1, 2008); anne aula, “query formulation in web information search,” www.cs.uta.fi/~aula/questionnaire.pdf (accessed mar. 1, 2008); jorge r. herskovic, len y. tanaka, william hersh, and elmer v. bernstam, “a day in the life of pubmed: analysis of a typical day’s query log,” journal of the american medical informatics association 14 no. 2 (2007): 212–20. 7. herskovic, “a day in the life of pubmed.” 8. davis libraries university of california, “quicksearch,” http://mysearchspace.lib.ucdavis.edu/ (accessed mar. 1, 2008). 9. eileen eandi, “health sciences multi-ebook search,” norris medical library newsletter (spring 2006), norris medical library, university of southern california, www.usc.edu/hsc/ nml/lib-information/newsletters.html (accessed mar. 1, 2008); maguire medical library, florida state university, “webfeat clinical book search,” http://med.fsu.edu/library/tutorials/ webfeat2_viewlet_swf.html (accessed mar. 1, 2008). 10. jill e. foust, philip bergen, gretchen l. maxeiner, and peter n. pawlowski, “improving e-book access via a librarydeveloped full-text search tool,” journal of the medical library association 95 no. 1 (2007): 40–45. 11. north carolina state university libraries, “endeca at the ncsu libraries,” www.lib.ncsu.edu/endeca (accessed mar. 1, 2008); hans lund, hans lauridsen, and jens hofman hansen, “summa—integrated search,” www.statsbiblioteket.dk/ publ/summaenglish.pdf (accessed mar. 1, 2008); falvey memorial library, villanova university, “vufind,” www.vufind.org (accessed mar. 1, 2008). 12. see the health level seven (hl7) clinical decision support working committee activities, in particular the infobutton standard proposal at www.hl7.org/special/committees/dss/ index.cfm and the niso metasearch initiative documentation at www.niso.org/workrooms/mi (accessed mar 1, 2008). 13. 
national center for biotechnology information (ncbi) entrez cross-database search, www.ncbi.nlm.nih.gov/entrez (accessed mar. 1, 2008). acrl 5 alcts 15 lita cover 2, cover 3 jaunter cover 4 index to advertisers the lc/marc record as a national standard 159 the desire to promote exchange of bibliographic data has given rise to a rather cacophonous debate concerning marc as a "standard," and the definition of a marc compatible record. much of the confusion has arisen out of a failure to carefully separate the intellectual content of a bibliographic record, the specific analysis to which it is subjected in an lc/marc format, and its physical representation on magnetic tape. in addition, there has been a tendency to obscure the different requirements of users and creators of machine-readable bibliographic data. in general, the standards making process attempts to find a consensus among both groups based on existing practice. the process of standardization is rarely one which relies on enlightened legislation. rather, a more pragmatic approach is taken based on an evaluation of the costs to manufacturers weighed against costs to consumers. even this modest approach is not invested with lasting wisdom. ansi standards, for example, are subject to quinquennial review. standards, as already pointed out, have as their basis common acceptance of conventions. thus, it might prove useful to examine the conventions employed in an lc/marc record. the most important of these is the anglo-american cataloging rules as interpreted by lc. the use of these rules for descriptive cataloging and choice of entry is universal enough that they may safely be considered a standard. similar comments may be made concerning the subject headings used in the dictionary catalog of the library of congress. the physical format within which machine-readable bibliographic data may be transmitted is accepted as a codified national and international standard (ansi z39.2-1971 and iso 2709-1973 (e) ) . this standard, which is only seven pages in length, should be carefully read by anyone seriously concerned with the problems of bibliographic data interchange. ansi z39.2 is quite different from the published lc/ marc formats. it defines little more than the structure of a variable length record. simply stated, ansi z39.2 specifies only that a record shall contain a leader specifying its physical attributes, a directory for identifying elements within the record by numeric tag (the values of the tags are not defined), and optionally, additional designators which may be used to provide further information regarding fields and subfields. this structure is completely general. within this same structure one could transmit book 160 1 oumal of library automation vol. 7 i 3 september 197 4 orders, a bibliographic record, an abstract, or an authority record by adopting specific conventions regarding the interpretation of numeric tags. thus, we come to the crux of the problem, the meanings of the content designators. content designators (numeric tags, subfields, delimiters, etc.) are not synonymous with elements of bibliographic description; rather, they represent the level of explicitness we wish to achieve in encoding a record. it might safely be said that in the most common use of a marc record-card production-scarcely more than the paragraph distinctions on an lc card are really necessary. 
if we accept such an argument, then we can simply define compatibility with lc/marc by defining compatibility in terms of a particular class of applications, e.g., card, book, or crt catalog creation. a record may be said to be compatible with lcjmarc if a system which accepts a record as created by lc produces from the compatible 1·ecord products not discernibly different from those created from an lc/marc record. thus, what is called for is a family of standards all downwardly compatible with lc/marc, employing ansi z39.2 as a structural base. this represents the only rational approach. the alternative is to accept lc/ marc conventions as worthy of veneration as artistic expression. s. michael malinconico adding value to the university of oklahoma libraries history of science collection through digital enhancement maura valentino information technology and libraries | march 2014 25 “in getting my books, i have been always solicitous of an ample margin; this not so much through any love of the thing itself, however agreeable, as for the facility it affords me of penciling suggested thoughts, agreements and differences of opinion, or brief critical comments in general.” —edgar allan poe abstract much of the focus of digital collections has been and continues to be on rare and unique materials, including monographs. a monograph may be made even rarer and more valuable by virtue of hand written marginalia. using technology to enhance scans of unique books and make previously unreadable marginalia readable increases the value of a digital object to researchers. this article describes a case study of enhancing the marginalia in a rare book by copernicus. background the university of oklahoma libraries history of science collections holds many rare books and other objects pertaining to the history of science. one of the rarest holdings is a copy of nicolai copernici torinensis de revolvtionibvs orbium coelestium (on the revolutions of the heavenly spheres), libri vi, a book famous for copernicus’ revolutionary astronomical theory that rejected the ptolemaic earth-centered universe and promoted a heliocentric, sun-centered model. the history of science collections’ copy of this manuscript contains notes added to the margins. similar notes were made in eight different existing copies, and the astrophysicist owen gingerich determined that these notes were created by a group of astronomers in paris known as the offusius group.1 the notes are of significant historical importance as they offer information on the initial reception of copernicus’ theories by the catholic community. having been created almost five hundred years ago in 1543, the handwriting is faded and the ink has absorbed into the paper. maura valentino (maura.valentino@oregonstate.edu) is metadata librarian, oregon state university, corvalis, oregon. previously she was digital initiatives coordinator at the university of oklahoma. mailto:maura.valentino@oregonstate.edu adding value to collections through digital enhancement | valentino 26 written in cursive script, the letters have merged as the ink has dispersed, adding to the difficulties inherent in reading these valuable annotations. the book had previously been digitized, and while some of the margin notes were readable, many of the notes were barely visible. therefore much of the value of the book was being lost in digital form. to rectify this situation the decision was made to enhance the marginalia. 
it was further decided that once the margin notes were enhanced, two digital representations of each page that contained notes would be included in the digital collection. one copy would present the main text in the most legible fashion (figure 1) and the second copy would highlight the marginalia and ensure that these margin notes were as legible as possible, even if in doing so the readability of the main text was diminished (figure 2). figure 1. text readable. figure 2. marginalia enhanced. while creating a written transcript of the marginalia was considered and would have added some value to the digital object, this solution was rejected in favor of digital enhancement for the following reasons. many of the notes contained corrections with lines drawn to the area of text that was being changed, or bracketed numbers (figure 3). in addition, some of the notes are corrections of numbers or tables, so a transcript of the text would do little to demonstrate the writer’s intentions in creating the margin note (figure 4). figure 3. bracketed corrections. figure 4. numerical corrections. also, sometimes there was bleed-through from the reverse page, further disrupting the clarity of the marginalia (figures 5 and 6). therefore it was determined that making the notes more readable through digital enhancement would provide the collection’s users with the most useful resource. figure 5. highlighted—bleed-through reduced. figure 6. bleed-through behind marginalia. the book can be viewed in its entirety here: http://digital.libraries.ou.edu/cdm/landingpage/collection/copernicus literature review “modification of photographs to enhance or change their meaning is nothing new. however, the introduction of techniques for digitizing photographic images and the subsequent development of powerful image editing software has both broadened the possibilities for altering photographs and brought the means of doing so within the reach of anyone with imagination and patience.”2 —richard s. croft the primary goal of this project was to give researchers in the history of science the ability to clearly decipher the marginalia created by the astronomers of the offusius group as they annotated the book, using the margins as an editing space. the literature agrees that marginalia is an important piece of history worth preserving. hauptman states, “the thought that produces the necessity for a citation or remark leads directly into the marginal notation.”3 he also adds, “their close proximity to the text allows for immediate visual connection.”4 howard asserts, “for writers and scholars, the margins and endpapers became workshops in which to hammer out their own ideas, and offered spaces in which to file and store information.”5 she also adds that marginalia can “serve as a form of opposition.”6 this is true in this case, as some of the marginalia contradicts copernicus. nikolova-houston argues for the historical aspect: “each of the marginalia and colophons is a unique production by its author, and exists in only one copy.”7 she goes on to add, “manuscript marginalia and colophons possess historical value as primary historical sources. they are treated as historical evidence along with other written and oral traditions.”8 such ideas provide a strong justification for the implementation of marginalia enhancement in digital collections.
as mentioned above, it was determined that a transcription would not have had the same effect as digital enhancement of the margin notes. this approach is also supported by the literature. for example, ferrari argues for the digital publication of the marginalia that fernando pessoa, the portuguese writer, made while reading. one of the cornerstones of his argument is that digital representation of marginalia allows the reader not only to see the words but also the underlining and other symbols that are not easily put into a transcript. in this way, the user of the digital collection obtains a more complete view of the author of the marginalia’s intent.9 another goal of this project was the general promotion of the university of oklahoma’s history of science collections. johnson, in his new york times article, notes that marginalia lend books an historical context while enabling users to infer other meanings from their texts.10 he also quotes david spadafora, president of the newberry library in chicago, who proclaims that “the digital revolution is a good thing for the physical object.” as more people access historical artifacts in electronic form, he notes, “the more they’re going to want to encounter the real object.”11 in this way, enhancement of the marginalia in digital collections can lead to further exposure for the collection and to greater use of the physical objects themselves. using digital enhancement is not a new idea. morgan asserts, “the innovation of the world wide web is its exciting capacity for space that, while not limitless, is weightless and far less limited that that of the printed book.”12 le, anderson and agarwala also add, “local manipulation of color and tone is one of the most common operations in the digital imaging workflow.”13 the literature shows that other projects have used enhancement of the digital object to increase the usefulness of the original artifact. one of the projects pursued during the library of congress’s american memory initiative involved the digitization of the work of photographer solomon butcher. in this case, technicians were able to enhance an area of one photograph that was blurry in normal photographic processes and allow the viewer to see inside a building.14 the archivo general de indias also used digital enhancement to remove stains and bleed-through from ancient manuscripts and render previously unreadable manuscripts readable.15 in an article advocating for a digital version of william blakes’s poem the four zoas, morgan notes that some features of the manuscript can only be seen in the digital version rather than a transcription: “sections of the manuscript show intense revision, with passages rubbed out, moved earlier or later in the manuscript, and often, added in the margins.”16 information technology and libraries | march 2014 29 digital processing is not limited to the use of photo editing software. although giralt asserts that it is a common method, “the ample potential for image control and manipulation provided by digital technology has stirred a great interest in postproduction, and digital editing.”17 other projects have used various technologies to enhanced images to give added meaning to a digital image. once again, in her article advocating for the digitizing of william blake’s the four zoas, morgan asserts that various enhancement technologies would help readers obtain the greatest benefit from the manuscript. 
for example, providing “the added benefit of infra-red photography” would allow “readers to see many of the erased illustrations.”18 she even hopes coding will enhance the usefulness of a digital object: “our impulse to use xml in order to richly encode a text works against passivity. with coding we clarify a work down to its smallest units, and illuminate specific aspects of its structure, aspects that are often less obvious when the work is presented in the form of a printed book.”19 method locating the marginalia each page of the book had been previously scanned and was stored in tagged image file format (tiff). each digital page (tiff image) was carefully examined for marginalia. this was achieved by examining the image in adobe photoshop, using the zoom tool to enlarge the image as necessary. as many notes were barely visible, the entirety of each page had to be examined in detail to ensure that margin notes were not overlooked. enlargement of the image in photoshop greatly facilitated this process. enhancing the marginalia once all the pages with marginalia were identified, each page was loaded into adobe photoshop for digital processing and enhancement. the following procedure was used (note: the specific directions that follow reference adobe photoshop cs4 for windows but can be generally applied to most software programs intended for photo editing):
1. using the zoom tool, the image was enlarged to facilitate examination and interaction with the marginalia.
2. individual margin notes were selected using the rectangular marquee tool. the area selected included any lines that were drawn from the notes to the original text so it would be clear to what text the margin note referred.
3. as the handwritten margin notes were orange in tone, a blue filter was applied (as blue is the contrasting color to orange) by selecting adjustments from the image menu and then choosing black and white to display the black and white dialog box. in the black and white dialog box, blue filter was selected from the preset drop-down menu. this small adjustment greatly enhanced the readability of the margin notes.
4. with the area still selected, adjustments was again selected from the image menu. from that adjustments submenu, brightness and contrast was selected. adjustments were made to both these values using the sliders presented by the resulting dialog box to further enhance the legibility of the margin notes. for this particular project, the values selected were generally negative twenty for contrast and positive twenty for brightness.
file naming conventions each enhanced image was saved with the same filename as the digital image of the original manuscript page, but with an a (for annotated) added to the end of the filename. this naming scheme enabled a distinction between pages with and without enhanced marginalia. this series of steps was repeated for each page (see table 1).
table 1. filenames.
page name | explanation
book spine | pictures of the covers
book cover |
inside cover |
blank page with ruler | to measure page
folio 001 | page 1 as originally scanned
folio 001 verso | page 1, reverse side, as originally scanned
folio 001 verso a | page 1, reverse side, with highlighted marginalia
folio 002 | page 2 as originally scanned
folio 002a | page 2 with highlighted marginalia
folio 002 v | page 2, reverse side
folio 002 v a | page 2, reverse side, with highlighted marginalia
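for readers who would rather batch-script the enhancement than work interactively, the photoshop steps above can be approximated with python and the pillow imaging library. this is a rough analogue, not the published workflow: the crop box, file names, and enhancement factors are placeholders, and pillow's multiplicative brightness and contrast factors only loosely correspond to photoshop's +20/−20 settings.

```python
# A rough Pillow analogue of the Photoshop steps above, for batch scripting.
# The crop box, enhancement factors, and file names are placeholders; the
# published workflow was done interactively in Photoshop CS4.
from PIL import Image, ImageEnhance


def enhance_note(page_path, box, out_path):
    page = Image.open(page_path).convert("RGB")
    region = page.crop(box)                       # "rectangular marquee" selection

    # Blue-filter black-and-white conversion: keeping only the blue channel
    # renders the orange-toned ink dark against the lighter paper.
    _, _, blue = region.split()
    mono = blue.convert("RGB")

    # Approximate Photoshop's +20 brightness / -20 contrast with Pillow's
    # multiplicative factors (values chosen by eye, not an exact conversion).
    mono = ImageEnhance.Brightness(mono).enhance(1.2)
    mono = ImageEnhance.Contrast(mono).enhance(0.8)

    page.paste(mono, box)                         # drop the enhanced note back in place
    page.save(out_path)


# Hypothetical usage: enhance one margin note on folio 2 and save the "a" copy.
enhance_note("folio_002.tif", (40, 600, 380, 900), "folio_002a.tif")
```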
importing into the digital management system contentdm was the digital management system selected for this project. all original manuscript page images and enhanced marginalia page images were imported into contentdm following their creation. the next step was to bring all the pages into contentdm as one compound object. a microsoft excel spreadsheet was created with a line for each page, annotated or not. only three fields were used: title, rights, and filename. a description of the book was placed on its history of science digital collections webpage with a link to the compound object in contentdm, so further metadata was not necessary and can always be added later. the first row only contained the title of the book (no filename). there were tiffs available of the cover, the bookend, the inside cover, and the book with a ruler. these were the next rows. then we began with the pages and titled them as the pages were numbered. there were ten pages numbered with roman numerals and then the pages began with alphanumeric page numbers. each page that had handwritten notes had the original page (page 2, for example) and the page with the information technology and libraries | march 2014 31 notes highlighted (page 2 annotated). this would allow the viewer to view the pages in their original form or with the notes highlighted or both, depending on each user’s research interests. once the excel file was complete with each page and its filename entered in a row, the file was saved as a tab-delimited file. import into contentdm required that all the tiff files be in one folder. once the files were moved, the contentdm compound object wizard was used to import. this book was imported as a compound object with no hierarchy. as this book was published in 1593, it has no chapters. to specify page names, the choice to label pages using tab-delimited object for printing was used. the filenames did not contain page numbers, and the choice to label pages in a sequence was not an option, as two copies of each annotated page existed. each object imported into contentdm has a thumbnail image associated with it. contentdm will create this image, but the cover of this book is not attractive, so a jpeg file was created using an image from the book that is often associated with copernicus (see figure 3). conclusions this project resulted in a digital representation of the physical book that is much more useful to researchers than the original, unenhanced digital object. this history of science collection holds not only the first edition of books important to the history of science, but the subsequent editions so that researchers can see how the ideas of science have changed over time. this new digital edition of de revolutionibus allows researchers to see how another scientist made corrections in copernicus’ book as one step in the change in theory over time and insight into the reaction of the catholic church. the format that contentdm creates for the object and a clear naming scheme allow the user to view the pages with or without the marginalia, thus making this a useful object for many types of users (see figure 4). however, using photoshop to highlight areas of a page allowed the digital initiatives department to understand the power of this tool. in understanding the utility and power of photoshop, the digital initiatives department has determined it to be a useful tool in other projects. 
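as an aside on the import step described above, the tab-delimited file for the compound object wizard is easy to generate with a short script. the sketch below is python; the rights text and the page list are placeholders, and only the three fields used in this project (title, rights, filename) are written.

```python
# Sketch of generating the tab-delimited file used by the CONTENTdm compound
# object wizard. The rights statement and the page list are placeholders; only
# the three fields used in this project (title, rights, filename) are written.
import csv

RIGHTS = "Courtesy of the History of Science Collections"  # placeholder text

pages = [
    ("De revolutionibus orbium coelestium", ""),   # object-level row, no file
    ("Book spine", "book_spine.tif"),
    ("Folio 001", "folio_001.tif"),
    ("Folio 001 verso", "folio_001_verso.tif"),
    ("Folio 001 verso annotated", "folio_001_verso_a.tif"),
]

with open("import.txt", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["Title", "Rights", "Filename"])
    for title, filename in pages:
        writer.writerow([title, RIGHTS if filename else "", filename])
```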
a project to eliminate some images of people’s fingers that inadvertently were photographed along with pages in a book or manuscript has been added to the queue. in future, digitized books or manuscripts with useful notes will undergo these enhancement processes. adding value to collections through digital enhancement | valentino 32 references 1. owen gingerich, “the master of the 1550 radices: jofrancus offusius,” journal for the history of astronomy 11 (1993): 235–53, http://adsabs.harvard.edu/full/1993jha....24..235g. 2. richard s. croft, “fun and games with photoshop: using image editors to change photographic meaning” (in: visual literacy in the digital age: selected readings from the annual conference of the international visual literacy association (rochester, ny october 13-17, 1993)): 3-10. 3. robert hauptman, documentation: a history and critique of attribution, commentary, glosses, marginalia, notes, bibliographies, works-cited lists, and citation indexing and analysis (jefferson, nc: mcfarland, 2008). 4. ibid. 5. jennifer howard, “scholarship on the edge,” chronicle of higher education 52, no. 9 (2005). 6. ibid. 7. tatiana nikolova-houston,“marginalia and colophons in bulgarian manuscripts and early printed books,” journal of religious & theological information 8, no. 1/2, (2009), http://www.tandfonline.com/doi/abs/10.1080/10477840903459586#preview. 8. ibid. 9. patricio ferrari, “fernando pessoa as a writing-reader: some justifications for a complete digital edition of his marginalia,” portuguese studies 24, no. 2 (2008): 69–114, http://www.jstor.org/stable/41105307. 10. dirk johnson, “book lovers fear dim future for notes in the margins,” new york times, february 20, 2011, http://www.nytimes.com/2011/02/21/books/21margin.html?_r=3&emc=tnt&tntemail1=y & 11. ibid 12. paige morgan, “the minute particular in the immensity of the internet: what coleridge, hartley and blake can teach us about digital editing,” romanticism 15, no. 2 (2009), http://www.euppublishing.com/doi/abs/10.3366/e1354991x09000774. 13. y. li, e. adelson, and a. agarwala, “scribbleboost: adding classification to edge-aware interpolation of local image and video adjustments,” eurographics symposium on rendering27, no. 4 (2008), http://www.mit.edu/~yzli/eg08.pdf. 14. s. michael malinconico, “digital preservation technologies and hybrid libraries,” information services & use 22, no. 4 (2002): 159–74, http://iospress.metapress.com/content/gep1rx9rednylm2n. http://adsabs.harvard.edu/full/1993jha....24..235g http://www.tandfonline.com/doi/abs/10.1080/10477840903459586%23preview http://www.jstor.org/stable/41105307 http://www.nytimes.com/2011/02/21/books/21margin.html?_r=3&emc=tnt&tntemail1=y& http://www.nytimes.com/2011/02/21/books/21margin.html?_r=3&emc=tnt&tntemail1=y& http://www.euppublishing.com/doi/abs/10.3366/e1354991x09000774 information technology and libraries | march 2014 33 15. ibid. 16. morgan, “minute particular.” 17. gabriel f. giralt, “realism and realistic representation in the digital age,” journal of film & video 62, no. 3 (2010): 3, http://muse.jhu.edu/journals/journal_of_film_and_video/v062/62.3.giralt.html. 18. morgan, “minute particular.” 19. 
morgan, “minute particular.” http://muse.jhu.edu/journals/journal_of_film_and_video/v062/62.3.giralt.html reducing psychological resistance to digital repositories | quinn 67 and mit mandates, and other mandates such as the one instituted at stanford’s school of education, have come to pass, and the registry of open access repository material archiving policies (roarmap) lists more than 120 mandates around the world that now exist.3 while it is too early to tell whether these developments will be successful in getting faculty to deposit their work in digital repositories, they at least establish a precedent that other institutions may follow. how many institutions follow and how effective the mandates will be once enacted remains to be seen. will all colleges and universities, or even a majority, adopt mandates that require faculty to deposit their work in repositories? what of those that do not? even if most institutions are successful in instituting mandates, will they be sufficient to obtain faculty cooperation? for those institutions that do not adopt mandates, how are they going to persuade faculty to participate in self-archiving, or even in some variation—such as having surrogates (librarians, staff, or graduate assistants) archive the work of faculty? are mandates the only way to ensure faculty cooperation and compliance, or are mandates even necessarily the best way? to begin to adequately address the problem of user resistance to digital repositories, it might help to first gain some insight into the psychology of resistance. the existing literature on user behavior with regard to digital repositories devotes scant attention to the psychology of resistance. in an article entitled “institutional repositories: partnering with faculty to enhance scholarly communication,” johnson discusses the inertia of the traditional publishing paradigm. he notes that this inertia is most evident in academic faculty. this would suggest that the problem of eliciting user cooperation is primarily motivational and that the problem is more one of indifference than active resistance.4 heterick, in his article “faculty attitudes toward electronic resources,” suggests that one reason faculty may be resistant to digital repositories is because they do not fully trust them. in response to a survey he conducted, 48 percent of faculty felt that libraries should maintain paper archives.5 the implication is that digital repositories and archives may never completely replace hard copies in the minds of scholars. in “understanding faculty to improve content recruitment for institutional repositories,” foster and gibbons point out that faculty complain of having too much work already. they resent any additional work that contributing to a digital repository might entail. thus the authors echo johnson in suggesting that faculty resistance the potential value of digital repositories is dependent on the cooperation of scholars to deposit their work. although many researchers have been resistant to submitting their work, the literature on digital repositories contains very little research on the psychology of resistance. this article looks at the psychological literature on resistance and explores what its implications might be for reducing the resistance of scholars to submitting their work to digital repositories. psychologists have devised many potentially useful strategies for reducing resistance that might be used to address the problem; this article examines these strategies and how they might be applied. 
o bserving the development and growth of digital repositories in recent years has been a bit like riding an emotional roller coaster. even the definition of what constitutes a repository may not be the subject of complete agreement, but for the purposes of this study, a repository is defined as an online database of digital or digitized scholarly works constructed for the purpose of preserving and disseminating scholarly research. the initial enthusiasm expressed by librarians and advocates of open access toward the potential of repositories to make significant amounts of scholarly research available to anyone with internet access gradually gave way to a more somber appraisal of the prospects of getting faculty and researchers to deposit their work. in august 2007, bailey posted an entry to his digital koans blog titled “institutional repositories: doa?” in which he noted that building digital repository collections would be a long, arduous, and costly process.1 the success of repositories, in his view, will be a function not so much of technical considerations as of attitudinal ones. faculty remain unconvinced that repositories are important, and there is a critical need for outreach programs that point to repositories as an important step in solving the crisis in scholarly communication. salo elaborated on bailey’s post with “yes, irs are broken. let’s talk about it,” on her own blog, caveat lector. salo points out that institutional repositories have not fulfilled their early promise of attracting a large number of faculty who are willing to submit their work. she criticizes repositories for monopolizing the time of library faculty and staff, and she states her belief that repositories will not work without deposit mandates, but that mandates are impractical.2 subsequent events in the world of scholarly communication might suggest that mandates may be less impractical than salo originally thought. since her post, the national institutes of health mandate, the harvard brian quinn (brian.quinn@ttu.edu) is social sciences librarian, texas tech university libraries, lubbock. brian quinn reducing psychological resistance to digital repositories 68 information technology and libraries | june 2010 whether or not this was actually the case.11 this study also suggests that a combination of both cognitive and affective processes feed faculty resistance to digital repositories. it can be seen from the preceding review of the literature that several factors have been identified as being possible sources of user resistance to digital repositories. yet the authors offer little in the way of strategies for addressing this resistance other than to suggest workaround solutions such as having nonscholars (e.g., librarians, graduate students, or clerical staff) serve as proxy for faculty and deposit their work for them, or to suggest that institutions mandate that faculty deposit their work. similarly, although numerous arguments have been made in favor of digital repositories and open access, they do not directly address the resistance issue.12 in contrast, psychologists have studied user resistance extensively and accumulated a body of research that may suggest ways to reduce resistance rather than try to circumvent it. it may be helpful to examine some of these studies to see what insights they might offer to help address the problem of user resistance. 
it should be pointed out that resistance as a topic has been addressed in the business and organizational literature, but has generally been approached from the standpoint of management and organizational change.13 this study has chosen to focus primarily on the psychology of resistance because many repositories are situated in a university setting. unlike employees of a corporation, faculty members typically have a greater degree of autonomy and latitude in deciding whether to accommodate new work processes and procedures into their existing routines, and the locus of change will therefore be more at an individual level. ■■ the psychology of user resistance psychologists define resistance as a preexisting state or attitude in which the user is motivated to counter any attempts at persuasion. this motivation may occur on a cognitive, affective, or behavioral level. psychologists thus distinguish between a state of not being persuaded and one in which there is actual motivation to not comply. the source of the motivation is usually an affective state, such as anxiety or ambivalence, which itself may result from cognitive problems, such as misunderstanding, ignorance, or confusion.14 it is interesting to note that psychologists have long viewed inertia as one form of resistance, suggesting paradoxically that a person can be motivated to inaction.15 resistance may also manifest itself in more subtle forms that shade into indifference, suspicion of new work processes or technologies, and contentment with the status quo. may be attributed at least in part to motivation.6 in another article published a few months later, foster and gibbons suggest that the main reason faculty have been slow to deposit their work in digital repositories is a cognitive one: faculty have not understood how they would benefit by doing so. the authors also mention that users may feel anxiety when executing the sequence of technical steps needed to deposit their work, and that they may also worry about possible copyright infringement.7 the psychology of resistance may thus manifest itself in both cognitive and affective ways. harley and her colleagues talk about faculty not perceiving any reward for depositing their work in their article “the influence of academic values on scholarly publication and communication practices.” this perception results in reduced drive to participate. anxiety is another factor contributing to resistance: faculty fear that their work may be vulnerable to plagiarism in an openaccess environment.8 in “towards user responsive institutional repositories: a case study,” devakos suggests that one source of user resistance is cognitive in origin. scholars do not submit their work frequently enough to be able to navigate the interface from memory, so they must reinitiate the learning process each time they submit their work. the same is true for entering metadata for their work.9 their sense of control may also be threatened by any limitations that may be imposed on substituting later iterations of their work for earlier versions. davis and connolly point to several sources of confusion, uncertainty, and anxiety among faculty in their article “institutional repositories: evaluating the reasons for non-use of cornell university’s installation of dspace.” cognitive problems arise from having to learn new technology to deposit work and not knowing copyright details well enough to know whether publishers would permit the deposit of research prior to publication. 
faculty wonder whether this might jeopardize their chances of acceptance by important journals whose editors might view deposit as a form of prior publication that would disqualify them from consideration. there is also fear that the complex structure of a large repository may actually make a scholar’s work more difficult to find; faculty may not understand that repositories are not isolated institutional entities but are usually searchable by major search engines like google.10 kim also identifies anxiety about plagiarism and confusion about copyright as being sources of faculty resistance in the article “motivating and impeding factors affecting faculty contribution to institutional repositories.” kim found that plagiarism anxiety made some faculty only willing to deposit already-published work and that prepublication material was considered too risky. faculty with no self-archiving experience also felt that many publishers do not allow self-archiving, reducing psychological resistance to digital repositories | quinn 69 more open to information that challenges their beliefs and attitudes and are more open to suggestion.18 thus before beginning a discussion of why users should deposit their research in repositories, it might help to first affirm the users’ self-concept. this could be done, for example, by reminding them of how unbiased they are in their work or how important it is in their work to be open to new ideas and new approaches, or how successful they have been in their work as scholars. the affirmation should be subtle and not directly related to the repository situation, but it should remind them that they are openminded individuals who are not bound by tradition and that part of their success is attributable to their flexibility and adaptability. once the users have been affirmed, librarians can then lead into a discussion of the importance of submitting scholarly research to repositories. self-generated affirmations may be even more effective. for example, another way to affirm the self would be to ask users to recall instances in which they successfully took a new approach or otherwise broke new ground or were innovative in some way. this could serve as a segue into a discussion of the repository as one more opportunity to be innovative. once the self-concept has been boosted, the threatening quality of the message will be perceived as less disturbing and will be more likely to receive consideration. a related strategy that psychologists employ to reduce resistance involves casting the user in the role of “expert.” this is especially easy to do with scholars because they are experts in their fields. casting the user in the role of expert can deactivate resistance by putting that person in the persuasive role, which creates a form of role reversal.19 rather than the librarian being seen as the persuader, the scholar is placed in that role. by saying to the scholar, “you are the expert in the area of communicating your research to an audience, so you would know better why the digital repository is an alternative that deserves consideration once you understand how it works and how it may benefit you,” you are empowering the user. casting the user as an expert imparts a sense of control to the user. it helps to disable resistance by placing the user in a position of being predisposed to agree to the role he or she is being cast in, which also makes the user more prone to agree with the idea of using a digital repository. 
priming and imaging one important discovery that psychologists have made that has some bearing on user resistance is that even subtle manipulations can have a significant effect on one’s judgments and actions. in an interesting experiment, psychologists told a group of students that they were to read an online newspaper, ostensibly to evaluate its design and assess how easy it was to read. half of them read an editorial discussing a public opinion survey of youth ■■ negative and positive strategies for reducing resistance just as the definition of resistance can be paradoxical, so too may be some of the strategies that psychologists use to address it. perhaps the most basic example is to counter resistance by acknowledging it. when scholars are presented with a message that overtly states that digital repositories are beneficial and desirable, it may simultaneously generate a covert reaction in the form of resistance. rather than simply anticipating this and attempting to ignore it, digital repository advocates might be more persuasive if they acknowledge to scholars that there will likely be resistance, mention some possible reasons (e.g., plagiarism or copyright concerns), and immediately introduce some counterrationales to address those reasons.16 psychologists have found that being up front and forthcoming can reduce resistance, particularly with regard to the downside of digital repositories. they have learned that it can be advantageous to preemptively reveal negative information about something so that it can be downplayed or discounted. thus talking about the weaknesses or shortcomings of digital repositories as early as possible in an interaction may have the effect of making these problems seem less important and weakening user resistance. not only does revealing negative information impart a sense of honesty and credibility to the user, but psychologists have found that people feel closer to people who reveal personal information.17 a librarian could thus describe some of his or her own frustrations in using repositories as an effective way of establishing rapport with resistant users. the unexpected approach of bringing up the less desirable aspects of repositories—whether this refers to the technological steps that must be learned to submit one’s work or the fact that depositing one’s work in a repository is not a guarantee that it will be highly cited—can be disarming to the resistant user. this is particularly true of more resistant users who may have been expecting a strong hard-sell approach on the part of librarians. when suddenly faced with a more candid appeal the user may be thrown off balance psychologically, leaving him or her more vulnerable to information that is the opposite of what was anticipated and to possibly viewing that information in a more positive light. if one way to disarm a user is to begin by discussing the negatives, a seemingly opposite approach that psychologists take is to reinforce the user’s sense of self. psychologists believe that one source of resistance stems from when a user’s self-concept—which the user tries to protect from any source of undesired change—has been threatened in one way or another. a stable self-concept is necessary for the user to maintain a sense of order and predictability. reinforcing the self-concept of the user should therefore make the user less likely to resist depositing work in a digital repository. 
self-affirmed users are 70 information technology and libraries | june 2010 or even possibly collaborating on research. their imaginations could be further stimulated by asking them to think of what it would be like to have their work still actively preserved and available to their successors a century from now. using the imagining strategy could potentially be significantly more effective in attenuating resistance than presenting arguments based on dry facts. identification and liking conscious processes like imagining are not the only psychological means of reducing the resistance of users to digital repositories. unconscious processes can also be helpful. one example of such a process is what psychologists refer to as the “liking heuristic.” this refers to the tendency of users to employ a rule-of-thumb method to decide whether to comply with requests from persons. this tendency results from users constantly being inundated with requests. consequently, they need to simplify and streamline the decision-making process that they use to decide whether to cooperate with a request. the liking heuristic holds that users are more likely to help someone they might otherwise not help if they unconsciously identify with the person. at an unconscious level, the user may think that a person acts like them and dresses like them, and therefore the user identifies with that person and likes them enough to comply with their request. in one experiment that psychologists conducted to see if people are more likely to comply with requests from people that they identify with, female undergraduates were informed that they would be participating in a study of first impressions. the subjects were instructed that they and a person in another room would each learn a little about one another without meeting each other. each subject was then given a list of fifty adjectives and was asked to select the twenty that were most characteristic of themselves. the experimenter then told the participants that they would get to see each other’s lists. the experimenter took the subject’s list and then returned a short time later with what supposedly was the other participant’s list, but was actually a list that the experimenter had filled out to indicate that either the subject had much in common with the other participant’s personality (seventeen of twenty matches), some shared attributes (ten of twenty matches), or relatively few characteristics in common (three of twenty matches). the subject was then asked to examine the list and fill out a survey that probed their initial impressions of the other participant, including how much they liked them. at the end of the experiment, the two subjects were brought together and given credit for participating. the experimenter soon left the room and the confederate participant asked the other participant if she would read and critically evaluate an eight-page paper for an english class. the results of the experiment indicated that the more the participant thought she shared in consumer patterns that highlighted functional needs, and the other half read a similar editorial focusing on hedonistic needs. the students next viewed an ad for a new brand of shampoo that featured either a strong or a weak argument for the product. 
the results of the experiment indicated that students who read the functional editorial and were then subsequently exposed to the strong argument for the shampoo (a functional product) had a much more favorable impression of the brand than students who had received the mismatched prime.20 while it may seem that the editorial and the shampoo were unrelated, psychologists found that the subjects engaged in a process of elaborating the editorial, which then predisposed them to favor the shampoo. the presence of elaboration, which is a precursor to the development of attitudes, suggests that librarians could reduce users’ resistance to digital repositories by first involving them in some form of priming activity immediately prior to any attempt to persuade them. for example, asking faculty to read a brief case study of a scholar who has benefited from involvement in open-access activity might serve as an effective prime. another example might be to listen briefly to a speaker summarizing the individual, disciplinary, and societal benefits of sharing one’s research with colleagues. interventions like these should help mitigate any predisposition toward resistance on the part of users. imagining is a strategy related to priming that psychologists have found to be effective in reducing resistance. taking their cue from insurance salesmen—who are trained to get clients to actively imagine what it would be like to lose their home or be in an accident—a group of psychologists conducted an experiment in which they divided a sample of homeowners who were considering the purchase of cable tv into two groups. one group was presented with the benefits of cable in a straightforward, informative way that described various features. the other group was asked to imagine themselves enjoying the benefits and all the possible channels and shows that they might experience and how entertaining it might be. the psychologists then administered a questionnaire. the results indicated that those participants who were asked to imagine the benefits of cable were much more likely to want cable tv and to subscribe to it than were those who were only given information about cable tv.21 in other words, imagining resulted in more positive attitudes and beliefs. this study suggests that librarians attempting to reduce resistance among users of digital repositories may need to do more than merely inform or describe to them the advantages of depositing their work. they may need to ask users to imagine in vivid detail what it would be like to receive periodic reports indicating that their work had been downloaded dozens or even hundreds of times. librarians could ask them to imagine receiving e-mail or calls from colleagues indicating that they had accessed their work in the repository and were interested in learning more about it, reducing psychological resistance to digital repositories | quinn 71 students typically overestimate the amount of drinking that their peers engage in at parties. these inaccurate normative beliefs act as a negative influence, causing them to imbibe more because they believe that is what their peers are doing. 
by informing students that almost threequarters of their peers have less than three drinks at social gatherings, psychologists have had some success in reducing excessive drinking behavior by students.23 the power of normative messages is illustrated by a recent experiment conducted by a group of psychologists who created a series of five cards to encourage hotel guests to reuse their towels during their stay. the psychologists hypothesized that by appealing to social norms, they could increase compliance rates. to test their hypothesis, the researchers used a different conceptual appeal for each of the five cards. one card appealed to environmental concerns (“help save the environment”), another to environmental cooperation (“partner with us to save the environment”), a third card appealed to the advantage to the hotel (“help the hotel save energy”), a fourth card targeted future generations (“help save resources for future generations”), and a final card appealed to guests by making reference to a descriptive norm of the situation (“join your fellow citizens in helping to save the environment”). the results of the study indicated that the card that mentioned the benefit to the hotel was least effective in getting guests to reuse their towels, and the card that was most effective was the one that mentioned that descriptive norm.24 this research suggests that if users who are resistant to submitting their work to digital repositories were informed that a larger percentage of their peers were depositing work than they realized, resistance may be reduced. this might prove to be particularly true if they learned that prominent or influential scholars were engaged in populating repositories with their work. this would create a social-norms effect that would help legitimize repositories to other faculty and help them to perceive the submission process as normal and desirable. the idea that accomplished researchers are submitting materials and reaping the benefits might prove very attractive to less experienced and less well-regarded faculty. psychologists have a considerable body of evidence in the area of social modeling that suggests that people will imitate the behavior of others in social situations because that behavior provides an implicit guideline of what to do in a similar situation. a related finding is that the more influential people are, the more likely it is for others to emulate their actions. this is even more probable for highstatus individuals who are skilled and attractive and who are capable of communicating what needs to be done to potential followers.25 social modeling addresses both the cognitive dimension of how resistant users should behave and also the affective dimension by offering models that serve as a source of motivation to resistant users to change common with the confederate, the more she liked her. the more she liked the confederate and experienced a perception of consensus, the more likely she was to comply with her request to critique the paper.22 thus, when trying to overcome the resistance of users to depositing their work in a digital repository, it might make sense to consider who it is that is making the request. universities sometimes host scholarly communication symposia that are not only aimed at getting faculty interested in open-access issues, but to urge them to submit their work to the institution’s repositories. 
frequently, speakers at these symposia consist of academic administrators, members of scholarly communication or open-access advocacy organizations, or individuals in the library field. the research conducted by psychologists, however, suggests that appeals to scholars and researchers would be more effective if they were made by other scholars and those who are actively engaged in research. faculty are much more likely to identify with and cooperate with requests from their own tribe, as it were, and efforts need to be concentrated on getting faculty who are involved in and understand the value of repositories to articulate this to their colleagues. researchers who can personally testify to the benefits of depositing their work are most likely to be effective at convincing other researchers of the value of doing likewise and will be more effective at reducing resistance. librarians need to recognize who their potentially most effective spokespersons and advocates are, which the psychological research seems to suggest is faculty talking to other faculty. perceived consensus and social modeling the processes of faculty identification with peers and perceived consensus mentioned above can be further enhanced by informing researchers that other scholars are submitting their work, rather than merely telling researchers why they should submit their work. information about the practices of others may help change beliefs because of the need to identify with other in-group members. this is particularly true of faculty, who are prone to making continuous comparisons with their peers at other institutions and who are highly competitive by nature. once they are informed of the career advantages of depositing their work (in terms of professional visibility, collaboration opportunities, etc.), and they are informed that other researchers have these advantages, this then becomes an impetus for them to submit their work to keep up with their peers and stay competitive. a perception of consensus is thus fostered—a feeling that if one’s peers are already depositing their work, this is a practice that one can more easily agree to. psychologists have leveraged the power of identification by using social-norms research to inform people about the reality of what constitutes normative behavior as opposed to people’s perceptions of it. for example, college 72 information technology and libraries | june 2010 highly resistant users that may be unwilling to submit their work to a repository. rather than trying to prepare a strong argument based on reason and logic, psychologists believe that using a narrative approach may be more effective. this means conveying the facts about open access and digital repositories in the form of a story. stories are less rhetorical and tend not to be viewed by listeners as attempts at persuasion. the intent of the communicator and the counterresistant message are not as overt, and the intent of the message might not be obvious until it has already had a chance to influence the listener. a well-crafted narrative may be able to get under the radar of the listener before the listener has a chance to react defensively and revert to a mode of resistance. in a narrative, beliefs are rarely stated overtly but are implied, and implied beliefs are more difficult to refute than overtly stated beliefs. 
listening to a story and wondering how it will turn out tends to use up much of the cognitive attentional capacity that might otherwise be devoted to counterarguing, which is another reason why using a narrative approach may be particularly effective with users who are strongly resistant. the longer and more subtle nature of narratives may also make them less a target of resistance than more direct arguments.28 using a narrative approach, the case for submitting work to a repository might be presented not as a collection of dry facts or statistics, but rather as a story. the protagonists are the researchers, and their struggle is to obtain recognition for their work and to advance scholarship by providing maximum access to the greatest audience of scholars and to obtain as much access as possible to the work of their peers so that they can build on it. the protagonists are thwarted in their attempts to achieve their ends by avaricious publishers who obtain the work of researchers for free and then sell it back to them in the form of journal and database subscriptions and books for exorbitant prices. these prices far exceed the rate of inflation or the budgets of universities to pay for them. the publishers engage in a series of mergers and acquisitions that swallow up small publishing firms and result in the scholarly publishing enterprise being controlled by a few giant firms that offer unreasonable terms to users and make unreasonable demands when negotiating with them. presented in this dramatic way, the significance of scholar participation in digital repositories becomes magnified to an extent that it becomes more difficult to resist what may almost seem like an epic struggle between good and evil. and while this may be a greatly oversimplified example, it nonetheless provides a sense of the potential power of using a narrative approach as a technique to reduce resistance. introducing a time element into the attempt to persuade users to deposit their work in digital repositories can play an important role in reducing resistance. given that faculty are highly competitive, introducing the idea not only that other faculty are submitting their work but that they are already benefiting as a result makes the their behavior in the desired direction. redefinition, consistency, and depersonalization another strategy that psychologists use to reduce resistance among users is to change the definition of the situation. resistant users see the process of submitting their research to the repository as an imposition at best. in their view, the last thing that they need is another obligation or responsibility to burden their already busy lives. psychologists have learned that reframing a situation can reduce resistance by encouraging the user to look at the same phenomenon in a different way. in the current situation, resistant users should be informed that depositing their work in a digital repository is not a burden but a way to raise their professional profile as researchers, to expose their work to a wider audience, and to heighten their visibility among not only their peers but a much larger potential audience that would be able to encounter their work on the web. seen in this way, the additional work of submission is less of a distraction and more of a career investment. moreover, this approach leverages a related psychological concept that can be useful in helping to dissolve resistance. 
psychologists understand that inconsistency has a negative effect on self-esteem, so persuading users to believe that submitting their work to a digital repository is consistent with their past behavior can be motivating.26 the point needs to be emphasized with researchers that the act of submitting their work to a digital repository is not something strange and radical, but is consistent with prior actions intended to publicize and promote their work. a digital repository can be seen as analogous to a preprint, book, journal, or other tangible and familiar vehicles that faculty have used countless times to send their work out into the world. while the medium might have changed, the intention and the goal are the same. reframing the act of depositing as “old wine in new bottles” may help to undermine resistance. in approaching highly resistant individuals, psychologists have discovered that it is essential to depersonalize any appeal to change their behavior. instead of saying, “you should reduce your caloric intake,” it is better to say, “it is important for people to reduce their caloric intake.” this helps to deflect and reduce the directive, judgmental, and prescriptive quality of the request, thus making it less likely to provoke resistance.27 suggestion can be much less threatening than prescription among users who may be suspicious and mistrusting. reverting to a third-person level of appeal may allow the message to get through without it being immediately rejected by the user. narrative, timing, and anticipation psychologists recommend another strategy to help defuse reducing psychological resistance to digital repositories | quinn 73 technological platforms, and so on. this could be followed by a reminder to users that it is their choice—it is entirely up to them. this reminder that users have the freedom of choice may help to further counter any resistance generated as a result of instructions or inducements to anticipate regret. indeed, psychologists have found that reinstating a choice that was previously threatened can result in greater compliance than if the threat had never been introduced.32 offering users the freedom to choose between alternatives tends to make them more likely to comply. this is because having a choice enables users to both accept and resist the request rather than simply focus all their resistance on a single alternative. when presented with options, the user is able to satisfy the urge to resist by rejecting one option but is simultaneously motivated to accept another option; the user is aware that there are benefits to complying and wants to take advantage of them but also wants to save face and not give in. by being offered several alternatives that nonetheless all commit to a similar outcome, the user is able to resist and accept at the same time.33 for example, one alternative option to self-archiving might be to present the faculty member with the option of an authorpays publishing model. the choice of alternatives allows the faculty member to be selective and discerning so that a sense of satisfaction is derived from the ability to resist by rejecting one alternative. at the same time, the librarian is able to gain compliance because one of the other alternatives that commits the faculty member to depositing research is accepted. options, comparisons, increments, and guarantees in addition to offering options, another way to erode user resistance to digital repositories is to use a comparative strategy. 
one technique is to first make a large request, such as “we would like you to submit all the articles that you have published in the last decade to the repository,” and then follow this with a more modest request, such as “we would appreciate it if you would please deposit all the articles you have published in the last year.” the original request becomes an “anchor” or point of reference in the mind of the user against which the subsequent request is then evaluated. setting a high anchor lessens user resistance by changing the user’s point of comparison of the second request from nothing (not depositing any work in the repository) to a higher value (submitting a decade of work). in this way, a high reference anchor is established for the second request, which makes it seem more reasonable in the newly created context of the higher value.34 the user is thus more likely to comply with the second request when it is framed in this way. using this comparative approach may also work because it creates a feeling of reciprocity in the user. when proposition much more salient. it not only suggests that submitting work is a process that results in a desirable outcome, but that the earlier one’s work is submitted, the more recognition will accrue and the more rapidly one’s career will advance.29 faculty may feel compelled to submit their work in an effort to remain competitive with their colleagues. one resource that may be particularly helpful for working with skeptical faculty who want substantiation about the effect of self-archiving on scholarly impact is a bibliography created by the open citation project titled, “the effect of open access and downloads (hits) on citation impact: a bibliography of studies.”30 it provides substantial documentation of the effect that open access has on scholarly visibility. an additional stimulus might be introduced in conjunction with the time element in the form of a download report. showing faculty how downloads accumulate over time is analogous to arguments that investment counselors use showing how interest on investments accrues and compounds over time. this investment analogy creates a condition in which hesitating to submit their work results in faculty potentially losing recognition and compromising their career advancement. an interesting related finding by psychologists suggests that an effective way to reduce user resistance is to have users think about the future consequences of complying or not complying. in particular, if users are asked to anticipate the amount of future regret they might experience for making a poor choice, this can significantly reduce the amount of resistance to complying with a request. normally, users tend not to ruminate about the possibility of future disappointment in making a decision. if users are made to anticipate future regret, however, they will act in the present to try to minimize it. studies conducted by psychologists show that when users are asked to anticipate the amount of future regret that they might experience for choosing to comply with a request and having it turn out adversely versus choosing to not comply and having it turn out adversely, they consistently indicate that they would feel more regret if they did not comply and experienced negative consequences as a result.31 in an effort to minimize this anticipated regret, they will then be more prone to comply. 
based on this research, one strategy to reduce user resistance to digital repositories would be to get users to think about the future, specifically about future regret resulting from not cooperating with the request to submit their work. if they feel that they might experience more regret in not cooperating than in cooperating, they might then be more inclined to cooperate. getting users to think about the future could be done by asking users to imagine various scenarios involving the negative outcomes of not complying, such as lost opportunities for recognition, a lack of citation by peers, lost invitations to collaborate, an inability to migrate one’s work to future 74 information technology and libraries | june 2010 submit their work. mandates rely on authority rather than persuasion to accomplish this and, as such, may represent a less-than-optimal solution to reducing user resistance. mandates represent a failure to arrive at a meeting of the minds of advocates of open access, such as librarians, and the rest of the intellectual community. understanding the psychology of resistance is an important prerequisite to any effort to reduce it. psychologists have assembled a significant body of research on resistance and how to address it. some of the strategies that the research suggests may be effective, such as discussing resistance itself with users and talking about the negative effects of repositories, may seem counterintuitive and have probably not been widely used by librarians. yet when other more conventional techniques have been tried with little or no success, it may make sense to experiment with some of these approaches. particularly in the academy, where reason is supposed to prevail over authority, incorporating resistance psychology into a program aimed at soliciting faculty research seems an appropriate step before resorting to mandates. most strategies that librarians have used in trying to persuade faculty to submit their work have been conventional. they are primarily of a cognitive nature and are variations on informing and educating faculty about how repositories work and why they are important. researchers have an important affective dimension that needs to be addressed by these appeals, and the psychological research on resistance suggests that a strictly rational approach may not be sufficient. by incorporating some of the seemingly paradoxical and counterintuitive techniques discussed earlier, librarians may be able to penetrate the resistance of researchers and reach them at a deeper, less rational level. ideally, a mixture of rational and less-conventional approaches might be combined to maximize effectiveness. such a program may not eliminate resistance but could go a long way toward reducing it. future studies that test the effectiveness of such programs will hopefully be conducted to provide us with a better sense of how they work in real-world settings. references 1. charles w. bailey jr., “institutional repositories: doa?,” online posting, digital koans, aug. 22, 2007, http://digital -scholarship.org/digitalkoans/2007/08/21/institutional -repositories-doa/ (accessed apr. 21, 2010). 2. dorothea salo, “yes, irs are broken. let’s talk about it,” online posting, caveat lector, sept. 5, 2007, http://cavlec. yarinareth.net/2007/09/05/yes-irs-are-broken-lets-talk-about -it/ (accessed apr. 21, 2010). 3. eprints services, roarmap (registry of open access repository material archiving policies) http://www.eprints .org/openaccess/policysignup/ (accessed july 28, 2009). 
4. richard k. johnson, “institutional repositories: partnering the requester scales down the request from the large one to a smaller one, it creates a sense of obligation on the part of the user to also make a concession by agreeing to the more modest request. the cultural expectation of reciprocity places the user in a situation in which they will comply with the lesser request to avoid feelings of guilt.35 for the most resistant users, breaking the request down into the smallest possible increment may prove helpful. by making the request seem more manageable, the user is encouraged to comply. psychologists conducted an experiment to test whether minimizing a request would result in greater cooperation. they went door-to-door, soliciting contributions to the american cancer society, and received donations from 29 percent of households. they then made additional solicitations, this time asking, “would you contribute? even a penny will help!” using this approach, donations increased to 50 percent. even though the solicitors only asked for a penny, the amounts of the donations were equal to that of the original request. by asking for “even a penny,” the solicitors made the request appear to be more modest and less of a target of resistance.36 librarians might approach faculty by saying “if you could even submit one paper we would be grateful,” with the idea that once faculty make an initial submission they will be more inclined to submit more papers in the future. one final strategy that psychological research suggests may be effective in reducing resistance to digital repositories is to make sure that users understand that the decision to deposit their work is not irrevocable. with any new product, users have fears about what might happen if they try it and they are not satisfied with it. not knowing the consequences of making a decision that they may later regret fuels reluctance to become involved with it. faculty need to be reassured that they can opt out of participating at any time and that the repository sponsors will guarantee this. this guarantee needs to be repeated and emphasized as much as possible in the solicitation process so that faculty are frequently reminded that they are entering into a decision that they can reverse if they so decide. having this reassurance should make researchers much less resistant to submitting their work, and the few faculty who may decide that they want to opt out are worth the reduction in resistance.37 the digital repository is a new phenomenon that faculty are unfamiliar with, and it is therefore important to create an atmosphere of trust. the guarantee will help win that trust. ■■ conclusion the scholarly literature on digital repositories has given little attention to the psychology of resistance. yet the ultimate success of digital repositories depends on overcoming the resistance of scholars and researchers to reducing psychological resistance to digital repositories | quinn 75 20. curtis p. haugtvedt et al., “consumer psychology and attitude change,” in knowles and linn, resistance and persuasion, 283–96. 21. larry w. gregory, robert b. cialdini, and kathleen m. carpenter, “self-relevant scenarios as mediators of likelihood estimates and compliance: does imagining make it so?” journal of personality & social psychology 43, no. 1 (1982): 89–99. 22. jerry m. burger, “fleeting attraction and compliance with requests,” in the science of social influence: advances and future progress, ed. anthony r. 
pratkanis (new york: psychology pr., 2007): 155–66. 23. john d. clapp and anita lyn mcdonald, “the relationship of perceptions of alcohol promotion and peer drinking norms to alcohol problems reported by college students,” journal of college student development 41, no. 1 (2000): 19–26. 24. noah j. goldstein and robert b. cialdini, “using social norms as a lever of social influence,” in the science of social influence: advances and future progress, ed. anthony r. pratkanis (new york: psychology pr., 2007): 167–90. 25. dale h. schunk, “social-self interaction and achievement behavior,” educational psychologist 34, no. 4 (1999): 219–27. 26. rosanna e. guadagno et al., “when saying yes leads to saying no: preference for consistency and the reverse foot-inthe-door effect,” personality & social psychology bulletin 27, no. 7 (2001): 859–67. 27. mary jiang bresnahan et al., “personal and cultural differences in responding to criticism in three countries,” asian journal of social psychology 5, no. 2 (2002): 93–105. 28. melanie c. green and timothy c. brock, “in the mind’s eye: transportation-imagery model of narrative persuasion,” in narrative impact: social and cultural foundations, ed. melanie c. green, jeffrey j. strange, and timothy c. brock (mahwah, n.j.: lawrence erlbaum, 2004): 315–41. 29. oswald huber, “time pressure in risky decision making: effect on risk defusing,” psychology science 49, no. 4 (2007): 415–26. 30. the open citation project, “the effect of open access and downloads (‘hits’) on citation impact: a bibliography of studies,” july 17, 2009, http://opcit.eprints.org/oacitation -biblio.html (accessed july 29, 2009). 31. matthew t. crawford et al., “reactance, compliance, and anticipated regret,” journal of experimental social psychology 38, no. 1 (2002): 56–63. 32. nicolas gueguen and alexandre pascual, “evocation of freedom and compliance: the ‘but you are free of . . .’ technique,” current research in social psychology 5, no. 18 (2000): 264–70. 33. james p. dillard, “the current status of research on sequential request compliance techniques,” personality & social psychology bulletin 17, no. 3 (1991): 283–88. 34. thomas mussweiler, “the malleability of anchoring effects,” experimental psychology 49, no. 1 (2002): 67–72. 35. robert b. cialdini and noah j. goldstein, “social influence: compliance and conformity,” annual review of psychology 55 (2004): 591–21. 36. james m. wyant and stephen l. smith, “getting more by asking for less: the effects of request size on donations of charity,” journal of applied social psychology 17, no. 4 (1987): 392–400. 37. lydia j. price, “the joint effects of brands and warranties in signaling new product quality,” journal of economic psychology 23, no. 2 (2002): 165–90. with faculty to enhance scholarly communication,” d-lib magazine 8, no. 11 (2002), http://www.dlib.org/dlib/november02/ johnson/11johnson.html (accessed apr. 2, 2008). 5. bruce heterick, “faculty attitudes toward electronic resources,” educause review 37, no. 4 (2002): 10–11. 6. nancy fried foster and susan gibbons, “understanding faculty to improve content recruitment for institutional repositories,” d-lib magazine 11, no. 1 (2005), http://www.dlib.org/ dlib/january05/foster/01foster.html (accessed july 29, 2009). 7. suzanne bell, nancy fried foster, and susan gibbons, “reference librarians and the success of institutional repositories,” reference services review 33, no. 3 (2005): 283–90. 8. 
diane harley et al., “the influence of academic values on scholarly publication and communication practices,” center for studies in higher education, research & occasional paper series: cshe.13.06, sept. 1, 2006, http://repositories.cdlib.org/ cshe/cshe-13-06/ (accessed apr. 17, 2008). 9. rea devakos, “towards user responsive institutional repositories: a case study,” library high tech 24, no. 2 (2006): 173–82. 10. philip m. davis and matthew j. l. connolly, “institutional repositories: evaluating the reasons for non-use of cornell university’s installation of dspace,” d-lib magazine 13, no. 3/4 (2007), http://www.dlib.org/dlib/march07/davis/03davis .html (accessed july 29, 2009). 11. jihyun kim, “motivating and impeding factors affecting faculty contribution to institutional repositories,” journal of digital information 8, no. 2 (2007), http://journals.tdl.org/jodi/ article/view/193/177 (accessed july 29, 2009). 12. peter suber, “open access overview” online posting, open access news: news from the open access environment, june 21, 2004, http://www.earlham.edu/~peters/fos/overview .htm (accessed 29 july 2009). 13. see, for example, jeffrey d. ford and laurie w. ford, “decoding resistance to change,” harvard business review 87, no. 4 (2009): 99–103.; john p. kotter and leonard a. schlesinger, “choosing strategies for change,” harvard business review 86, no. 7/8 (2008): 130–39; and paul r. lawrence, “how to deal with resistance to change,” harvard business review 47, no. 1 (1969): 4–176. 14. julia zuwerink jacks and maureen e. o’brien, “decreasing resistance by affirming the self,” in resistance and persuasion, ed. eric s. knowles and jay a. linn (mahwah, n.j.: lawrence erlbaum, 2004): 235–57. 15. benjamin margolis, “notes on narcissistic resistance,” modern psychoanalysis 9, no. 2 (1984): 149–56. 16. ralph grabhorn et al., “the therapeutic relationship as reflected in linguistic interaction: work on resistance,” psychotherapy research 15, no. 4 (2005): 470–82. 17. arthur aron et al., “the experimental generation of interpersonal closeness: a procedure and some preliminary findings,” personality & social psychology bulletin 23, no. 4 (1997): 363–77. 18. geoffrey l. cohen, joshua aronson, and claude m. steele, “when beliefs yield to evidence: reducing biased evaluation by affirming the self,” personality & social psychology bulletin 26, no. 9 (2000): 1151–64. 19. anthony r. pratkanis, “altercasting as an influence tactic,” in attitudes, behavior and social context: the role of norms and group membership, ed. deborah j. terry and michael a.hogg (mahwah, n.j.: lawrence erlbaum, 2000): 201–26. 34 information technology and libraries | march 2010 tagging: an organization scheme for the internet marijke a. visser how should the information on the internet be organized? this question and the possible solutions spark debates among people concerned with how we identify, classify, and retrieve internet content. this paper discusses the benefits and the controversies of using a tagging system to organize internet resources. tagging refers to a classification system where individual internet users apply labels, or tags, to digital resources. tagging increased in popularity with the advent of web 2.0 applications that encourage interaction among users. as more information is available digitally, the challenge to find an organizational system scalable to the internet will continue to require forward thinking. 
trained to ensure access to a range of informational resources, librarians need to be concerned with access to internet content. librarians can play a pivotal role by advocating for a system that supports the user at the moment of need. tagging may just be the necessary system. w ho will organize the information available on the internet? how will it be organized? does it need an organizational scheme at all? in 1998, thomas and griffin asked a similar question, “who will create the metadata for the internet?” in their article with the same name.1 ten years later, this question has grown beyond simply supplying metadata to assuring that at the moment of need, someone can retrieve the information necessary to answer their query. given new classification tools available on the internet, the time is right to reassess traditional models, such as controlled vocabularies and taxonomies, and contrast them with folksonomies to understand which approach is best suited for the future. this paper gives particular attention to delicious, a social networking tool for generating folksonomies. the amount of information available to anyone with an internet connection has increased in part because of the internet’s participatory nature. users add content in a variety of formats and through a variety of applications to personalize their web experience, thus making internet content transitory in nature and challenging to lock into place. the continual influx of new information is causing a rapid cultural shift, more rapid than many people are able to keep up with or anticipate. conversations on a range of topics that take place using web technologies happen in real time. unless you are a participant in these conversations and debates using web-based communication tools, changes are passing you by. internet users in general have barely grasped the concept of web 2.0 and already the advanced “internet cognoscenti” write about web 3.0.2 regarding the organization and availability of internet content, librarians need to be ahead of the crowd as the voice who will assure content will be readily accessible to those that seek it. internet users actively participating in and shaping the online communities are, perhaps unintentionally, influencing how those who access information via the internet expect to be able to receive and use digital resources. librarians understand that the way information is organized is critical to its accessibility. they also understand the communities in which they operate. today, librarians need to be able to work seamlessly among the online communities, the resources they create, and the end user. as internet use evolves, librarians as information stakeholders should stay abreast of web 2.0 developments. by positioning themselves to lead the future of information organization, librarians will be able to select the best emerging web-based tools and applications, become familiar with their strengths, and leverage their usefulness to guide users in organizing internet content. shirky argues that the internet has allowed new communities to form. primarily online, these communities of internet users are capable of dramatically changing society both onand offline. shirky contends that because of the internet, “group action just got easier.”3 according to shirky, we are now at the critical point where internet use, while dependent on technology, is actually no longer about the technology at all. the web today (web 2.0) is about participation. 
“this [the internet] is a medium that is going to change society.”4 lessig points out that content creators are “writing in the socially, culturally relevant sense for the 21st century and to be able to engage in this writing is a measure of your literacy in the 21st century.”5 it is significant that creating content is no longer reserved for the internet cognoscenti. internet users with a variety of technological skills are participating in web 2.0 communities. information architects, web designers, librarians, business representatives, and any stakeholder dependent on accessing resources on the internet have a vested interest in how internet information is organized. not only does the architecture of participation inherent in the internet encourage completely new creative endeavors, it serves as a platform for individual voices as demonstrated in marijke a. visser (marijkea@gmail.com) is a library and information science graduate student at indiana university, indianapolis, and will be graduating may 2010. she is currently working for ala’s office for information and technology policy as an information technology policy analyst, where her area of focus includes telecommunications policy and how it affects access to information. tagging: an organization scheme for the internet | visser 35 personal and organizationally sponsored blogs: lessig 2.0, boing boing, open access news, and others. these internet conversations contribute diverse viewpoints on a stage where, theoretically, anyone can access them. web 2.0 technologies challenge our understanding of what constitutes information and push policy makers to negotiate equitable internet-use policies for the public, the content creators, corporate interests, and the service providers. to maintain an open internet that serves the needs of all the players, those involved must embrace the opportunity for cultural growth the social web represents. for users who access, create, and distribute digital content, information is anything but static; nor is using it the solitary endeavor of reading a book. its digital format makes it especially easy for people to manipulate it and shape it to create new works. people are sharing these new works via social technologies for others to then remix into yet more distinct creative work. communication is fundamentally altered by the ability to share content on the internet. today’s internet requires a reevaluation of how we define and organize information. the manner in which digital information is classified directly affects each user’s ability to access needed information to fully participate in twenty-first-century culture. new paradigms for talking about and classifying information that reflect the participatory internet are essential. n background the controversy over organizing web-based information can be summed up comparing two perspectives represented by shirky and peterson. both authors address how information on the web can be most effectively organized. in her introduction, peterson states, “items that are different or strange can become a barrier to networking.”6 shirky maintains, “as the web has shown us, you can extract a surprising amount of value from big messy data sets.”7 briefly, in this instance ontology refers to the idea of defining where digital information can and should be located (virtually). folksonomy describes an organizational system where individuals determine the placement and categorization of digital information. both terms are discussed in detail below. 
although any organizational system necessitates talking about the relationship(s) among the materials being organized, the relationships can be classified in multiple ways. to organize a given set of entities, it is necessary to establish in what general domain they belong and in what ways they are related. applying an ontological, or hierarchical, classification system to digital information raises several points to consider. first, there are no physical space restrictions on the internet, so relationships among digital resources do not need to be strictly identified. second, after recognizing that internet resources do not need the same classification standards as print material, librarians can begin to isolate the strengths of current nondigital systems that could be adapted to a system for the internet. third, librarians must be ready to eliminate current systems entirely if they fail to serve the needs of internet users. traditional systems for organizing information were developed prior to the information explosion on the internet. the internet’s unique platform for creating, storing, and disseminating information challenges pre– digital-age models. designing an organizational system for the internet that supports creative innovation and succeeds in providing access to the innovative work is paramount to moving the twenty-first-century culture forward. n assessing alternative models controversy encourages scrutiny of alternative models. in understanding the options for organizing digital information, it is important to understand traditional classification models. smith discusses controlled vocabularies, taxonomies, and facets as three traditional methods for applying metadata to a resource. according to smith, a controlled vocabulary is an unambiguous system for managing the meanings of words. it links synonyms, allowing a search to retrieve information on the basis of the relationship between synonyms.8 taxonomies are hierarchical, controlled vocabularies that establish parent–child relationships between terms. a faceted classification system categorizes information using the distinct properties of that information.9 in such a system, information can exist in more than one place at a time. a faceted classification system is a precursor to the bottom-up system represented by folksonomic tagging. folksonomy, a term coined in 2004 by thomas vander wal, refers to a “user-created categorical structure development with an emergent thesaurus.”10 vander wal further separates the definition into two types: a narrow and a broad folksonomy.11 in a broad folksonomy, many people tag the same object with numerous tags or a combination of their own and others’ tags. in a narrow folksonomy, one or few people tag an object with primarily singular terms. internet searching represents a unique challenge to people wanting to organize its available information. search engines like yahoo! and google approach the chaotic mass of information using two different techniques. yahoo! created a directory similar to the file folder system with a set of predetermined categories that were intended to be universally useful. in so doing, the yahoo! developers made assumptions about how the general public would categorize and access information. the categories 36 information technology and libraries | march 2010 and subsequent subcategories were not necessarily logically linked in the eyes of the general public. the yahoo! 
directory expanded as internet content grew, but the digital folder system, like a taxonomy, required an expert to maintain. shirky notes the yahoo! model could not scale to the internet. there are too many possible links to be able to successfully stay within the confines of a hierarchical classification system. additionally, on the internet, the links are sufficient for access because if two items are linked at least once, the user has an entry point to retrieve either one or both items.12 a hierarchical system does not assure a successful internet search and it requires a user to comprehend the links determined by the managing expert. in the google approach, developers acknowledged that the user with the query best understood the unique reasoning behind her search. the user therefore could best evaluate the information retrieved. according to shirky, the google model let go of the hierarchical file system because developers recognized effective searching cannot predetermine what the user wants. unlike yahoo!, google makes the links between the query and the resources after the user types in the search terms.13 trusting in the link system led google to understand and profit from letting the user filter the search results. to select the best organizational model for the internet it is critical to understand its emergent nature. a model that does not address the effects of web 2.0 on internet use and fails to capture participant-created content and tagging will not be successful. one approach to organizing digital resources has been for users to bookmark websites of personal interest. these bookmarks have been stored on the user’s computer, but newer models now combine the participatory web with saving, or tagging, websites. social bookmarking typifies the emergent web and the attraction of online networking. innovative and controversial, the folksonomy model brings to light numerous criteria necessary for a robust organizational system. a social bookmarking network, delicious is a tool for generating folksonomies. it combines a large amount of self-interest with the potential for an equal, if not greater, amount of social value. delicious users add metadata to resources on the internet by applying terms, or tags, to urls. users save these tagged websites to a personal library hosted on the delicious website. the default settings on delicious share a user’s library publicly, thus allowing other people—not limited to registered delicious account holders—to view any library. that the delicious developers understood how internet users would react to this type of interactive application is reflected in the popularity of delicious. delicious arrived on the scene in 2003, and in 2007 developers introduced a number of features to encourage further user collaboration. with a new look (going from the original del.icio.us to its current moniker, delicious) as well as more ways for users to retrieve and share resources by 2007, delicious had 3 million registered users and 100 million unique urls.14 the reputation of delicious has generated interest among people concerned with organizing the information available via the internet. how does the folksonomy or delicious model of open-ended tagging affect searching, information retrieving, and resource sharing? delicious, whose platform is heavily influenced by its users, operates with no hierarchical control over the vocabulary used as tags. this underscores the organization controversy. 
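the open-ended tagging model described above can be made concrete with a small sketch. the following python fragment is only an illustration of a broad folksonomy, not the delicious data model; the class name, user names, and urls are invented for the example. it shows how many users attach their own free-form tags to one resource, how overlapping tags gain weight, and how even a single-use tag remains a working access point.

```python
from collections import defaultdict

class BroadFolksonomy:
    """toy model of a broad folksonomy: many users apply their own
    free-form tags to the same resource, and no vocabulary is imposed."""

    def __init__(self):
        # tag -> url -> set of users who applied that tag to that url
        self._index = defaultdict(lambda: defaultdict(set))

    def tag(self, user, url, *tags):
        """record that a user labeled a url with one or more tags."""
        for t in tags:
            self._index[t][url].add(user)

    def urls_for(self, tag):
        """every tagged url is an access point, ranked by how many
        users applied the tag; single-use tags still resolve."""
        urls = self._index.get(tag, {})
        return sorted(urls, key=lambda u: len(urls[u]), reverse=True)

# three users tag the same article with overlapping personal vocabularies
f = BroadFolksonomy()
f.tag("alice", "http://example.org/article", "folksonomy", "tagging")
f.tag("bob",   "http://example.org/article", "folksonomy", "classification")
f.tag("carol", "http://example.org/article", "toread")

print(f.urls_for("folksonomy"))  # shared tag, applied by two users
print(f.urls_for("toread"))      # personal tag, still a working entry point
```

the point of the sketch is the absence of any gatekeeping step: every tag is accepted exactly as applied, and usefulness emerges only from how many users converge on the same term.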
bottom-up tagging gives each person tagging an equal voice in the categorization scheme that develops through the user generated tags. at the same time, it creates a chaotic information-retrieval system when compared to traditional controlled vocabularies, taxonomies, and other methods of applying metadata.15 a folksonomy follows no hierarchical scheme. every tag generated supplies personal meaning to the associated url and is equally weighted. there will be overlap in some of the tags users select, and that will be the point of access for different users. for the unique tags, each delicious user can choose to adopt or reject them for their personal tagging system. either way, the additional tags add possible future access points for the rest of the user community. the social usefulness of the tags grows organically in relationship to their adoption by the group. can the internet support an organizational system controlled by user-generated tags? by the very nature of the participatory web, whose applications often get better with user input, the answer is yes. delicious and other social tagging systems are proving that their folksonomic approach is robust enough to satisfy the organizational needs of their users. defined by vander wal, a broad folksonomy is a classification system scalable to the internet.16 the problem with projecting already-existing search and classification strategies to the internet is that the internet is constantly evolving, and classic models are quickly overcome. even in the nonprint world of the internet, taxonomies and controlled vocabulary entail a commitment both from the entity wanting to organize the system and the users who will be accessing it. developing a taxonomy involves an expert, which requires an outlay of capital and, as in the case with yahoo!, a taxonomy is not necessarily what users are looking for. to be used effectively, taxonomies demand a certain amount of user finesse and complacency. the user must understand the general hierarchy and by default must suspend their own sense of category and subcategory if they do not mesh with the given system. the search model used by google, where the user does the filtering, has been a significantly more successful search engine. google recognizes natural language, making it user friendly; however, it remains merely a search engine. it is successful at making links, but it leaves the user stranded without a means to organize search results beyond simple page rank. traditional hierarchical systems and search strategies like those of yahoo! and google neglect to take into account the tremendous popularity of the participatory web. successful web applications today support user interaction; to disregard this is naive and short-sighted. in contrast to a simple page-rank results list or a hierarchical system, delicious results provide the user with rich, multilayer results. figure 1 shows four of the first ten results of a delicious search for the term "folksonomy." the articles by the four authors in the left column were tagged according to the diagram. two of the articles are peer-reviewed, and two are cited repeatedly by scholars researching tagging and the internet. in this example, three unique terms are used to tag those articles, and the other terms provide additional entry points for retrieval.
figure 1. search results for "folksonomy" using delicious.
further information available using delicious shows that the guy article was tagged by 1,323 users, the mathes article by 2,787 users, the shirky article by 4,383 users, and the peterson article by 579 users.17 from the basic delicious search, the user can combine terms to narrow the query as well as search what other users have tagged with those terms. similar to the card catalog, where a library patron would often unintentionally find a book title by browsing cards before or after the actual title she originally wanted, a delicious user can browse other users' libraries, often finding additional pertinent resources. a user will return a greater number of relevant and automatically filtered results than with an advanced google search. as an ancillary feature, once a delicious user finds an attractive tag stream—a series of tags by a particular user—they can opt to follow the user who created the tag stream, thereby increasing their personal resources. hence delicious is effective personally and socially. it emulates what internet users expect to be able to do with digital content: find interesting resources, personalize them, in this case with tags, and put them back out for others to use if they so choose. proponents of folksonomy recognize there are benefits to traditional taxonomies and controlled vocabulary systems. shirky delineates two features of an organizational system and their characteristics, providing an example of when a hierarchical system can be successful (see table 1).18
table 1. domains and their participants
domain to be organized | participants in the domain
small corpus | expert catalogers
formal categories | authoritative source of judgment
restricted entities | coordinated users
clear edges | expert users
these characteristics apply to situations using databases, journal articles, and dissertations as spelled out by peterson, for example.19 specific organizations with identifiable common terminology—for example, medical libraries—can also benefit from a traditional classification system. these domains are the antithesis of the domain represented by the web. the success of controlled vocabularies, taxonomies, and their resulting systems depends on broad user adoption. that, in combination with the cost of creating and implementing a controlled system, raises questions as to their utility and long-term viability for use on the web. though meant for longevity, a taxonomy fulfills a need at one fixed moment in time. a folksonomy is never static. taxonomies developed by experts have not yet been able to be extended adequately for the breadth and depth of internet resources. neither have traditional viewpoints been scaled to accept the challenges encountered in trying to organize the internet. folksonomy, like taxonomy, seeks to provide the information critical to the user at the moment of need. folksonomy, however, relies on users to create the links that will retrieve the desired results. doctorow puts forward three critiques of a hierarchical metadata system, emphasizing the inadequacies of applying traditional classification schemes to the digital stage:
1. there is not a "correct" way to categorize an idea.
2. competing interests cannot come to a consensus on a hierarchical vocabulary.
3. there is more than one way to describe something.
doctorow elaborates: “requiring everyone to use the same vocabulary to describe their material denudes the cognitive landscape, enforces homogeneity in ideas.”20 the internet raises the level of participation to include innumerable voices. the astonishing thing is that it thrives on this participation. guy and tonkin address the “folksonomic flaw” by saying user-generated tags are by definition imprecise. they can be ambiguous, overly personal, misspelled, and a contrived compound word. guy and tonkin suggest the need to improve tagging by educating the users or by improving the systems to encourage more accurate tagging.21 this, however, does not acknowledge that successful web 2.0 applications depend on the emergent wisdom of the user community. the systems permit organic evolution and continual improvement by user participation. a folksonomy evolves much the way a species does. unique or single-use tags have minimal social import and do not gain recognition. tags used by more than a few people reinforce their value and emerge as the more robust species. n conclusion the benefits of the internet are accessible to a wide range of users. the rewards of participation are immediate, social, and exponential in scope. user-generated content and associated organization models support the internet’s unique ability to bring together unlikely social relationships that would not necessarily happen in another milieu. to paraphrase shirky and lessig, people are participating in a moment of social and technological evolution that is altering traditional ways of thinking about information, thereby creating a break from traditional systems. folksonomic classification is part of that break. its utility grows organically as users add tagged content to the system. it is adaptive, and its strengths can be leveraged according to the needs of the group. while there are “folksonomic flaws” inherent in a bottomup classification system, there is tremendous value in weighting individual voices equally. following the logic of web 2.0 technology, folksonomy will improve according to the input of the users. it is an organizational system that reflects the basic tenets of the emergent internet. it may be the only practical solution in a world of participatory content creation. shirky describes the internet by saying, “there is no shelf in the digital world.”22 classic organizational schemes like the dewey decimal system were created to organize resources prior to the advent of the internet. a hierarchical system was necessary because there was a physical limitation on where a resource could be located; a book can only exist in one place at one time. in the digital world, the shelf is simply not there. material can exist in many different places at once and can be retrieved through many avenues. a broad folksonomy supports a vibrant search strategy. it combines individual user input with that of the group. this relationship creates data sets inherently meaningful to the community of users seeking information on any given topic at any given moment. this is why a folksonomic approach to organizing information on the internet is successful. users are rewarded for their participation, and the system improves because of it. folksonomy mirrors and supports the evolution of the internet. librarians, trained to be impartial and ethically bound to assure access to information, are the logical mediators among content creators, the architecture of the web, corporate interests, and policy makers. 
critical conversations are no longer happening only in traditional publications of the print world. they are happening with communication platforms like youtube, twitter, digg, and delicious. information organization is one issue on which librarians can be progressive. dedicated to making information available, librarians are in a unique position to take on challenges raised by the internet. as the profession experiments with the introduction of web 3.0, librarians need to position themselves between what is known and what has yet to evolve. librarians have always leveraged the interests and needs of their users to tailor their services to the individual entry point of every person who enters the library. because more and more resources are accessed via the internet, librarians will have to maintain a presence throughout the web if they are to continue to speak for the informational needs of their users. part of that presence necessitates an ability to adapt current models to the internet. more importantly, it requires recognition of when to forgo conventional service methods in favor of more innovative approaches. working in concert with the early adopters, corporate interests, and general internet users, librarians can promote a successful system for organizing internet resources. for the internet, folksonomic tagging is one solution that will assure users can retrieve information necessary to answer their queries. references and notes 1. charles f. thomas and linda s. griffin, “who will create the metadata for the internet?” first monday 3, no. 12 (dec. 1998). 2. web 2.0 is a fairly recent term, although now ubiquitous among people working in and around internet technologies. attributed to a conference held in 2004 between medialive tagging: an organization scheme for the internet | visser 39 international and o’reilly media, web 2.0 refers to the web as being a platform for harnessing the collective power of internet users interested in creating and sharing ideas and information without mediation from corporate, government, or other hierarchical policy influencers or regulators. web 3.0 is a much more fluid concept as of this writing. there are individuals who use it to refer to a semantic web where information is analyzed or processed by software designed specifically for computers to carry out the currently human-mediated activity of assigning meaning to information on a webpage. there are librarians involved with exploring virtual-world librarianship who refer to the 3d environment as web 3.0. the important point here is that what internet users now know as web 2.0 is in the process of being altered by individuals continually experimenting with and improving upon existing web applications. web 3.0 is the undefined future of the participatory internet. 3. clay shirky, “here comes everybody: the power of organizing without organizations” (presentation videocast, berkman center for internet & society, harvard university, cambridge, mass., 2008), http://cyber.law.harvard.edu/inter active/events/2008/02/shirky (accessed oct. 1, 2008). 4. ibid. 5. lawerence lessig, “early creative commons history, my version,” videocast, aug. 11, 2008, lessig 2.0, http://lessig.org/ blog/2008/08/early_creative_commons_history.html (accessed aug. 13, 2008). 6. elaine peterson, “beneath the metadata: some philosophical problems with folksonomy,” d-lib magazine 12, no. 11 (2006), http://www.dlib.org/dlib/november06/peterson/11peterson .html (accessed sept. 8, 2008). 7. 
clay shirky, "ontology is overrated: categories, links, and tags," online posting, spring 2005, clay shirky's writings about the internet, http://www.shirky.com/writings/ontology_overrated.html#mind_reading (accessed sept. 8, 2008).
8. gene smith, tagging: people-powered metadata for the social web (berkeley, calif.: new riders, 2008): 68.
9. ibid., 76.
10. thomas vander wal, "folksonomy," online posting, feb. 7, 2007, vanderwal.net, http://www.vanderwal.net/folksonomy.html (accessed aug. 26, 2008).
11. thomas vander wal, "explaining and showing broad and narrow folksonomies," online posting, feb. 21, 2005, personal infocloud, http://www.personalinfocloud.com/2005/02/explaining_and_.html (accessed aug. 29, 2008).
12. shirky, "ontology is overrated."
13. ibid.
14. michael arrington, "exclusive: screen shots and feature overview of delicious 2.0 preview," online posting, june 16, 2005, techcrunch, http://www.techcrunch.com/2007/09/06/exclusive-screen-shots-and-feature-overview-of-delicious-20-preview/ (accessed jan. 6, 2010).
15. smith, tagging, 67–93.
16. vander wal, "explaining and showing broad and narrow folksonomies."
17. adam mathes, "folksonomies—cooperative classification and communication through shared metadata" (graduate paper, university of illinois urbana–champaign, dec. 2004); peterson, "beneath the metadata"; shirky, "ontology is overrated"; thomas and griffin, "who will create the metadata for the internet?"
18. shirky, "ontology is overrated."
19. peterson, "beneath the metadata."
20. cory doctorow, "metacrap: putting the torch to seven straw-men of the meta-utopia," online posting, aug. 26, 2001, the well, http://www.well.com/~doctorow/metacrap.htm (accessed sept. 15, 2008).
21. marieke guy and emma tonkin, "folksonomies: tidying up tags?" d-lib magazine 12, no. 1 (2006), http://www.dlib.org/dlib/january06/guy/01guy.html (accessed sept. 8, 2008).
22. shirky, "ontology is overrated."
hackathons and libraries: the evolving landscape 2014–2020
meris mandernach longmeier
information technology and libraries | december 2021, https://doi.org/10.6017/ital.v40i4.13389
meris mandernach longmeier (longmeier.10@osu.edu) is head of research services, the ohio state university libraries. © 2021.
abstract libraries foster a thriving campus culture and function as “third space,” not directly tied to a discipline.1 libraries support both formal and informal learning, have multipurpose spaces, and serve as a connection point for their communities. for these reasons, they are an ideal location for events, such as hackathons, that align with library priorities of outreach, data and information literacy, and engagement focused on social good. hackathon planners could find likely partners in either academic or public libraries as their physical spaces accommodate public outreach events and many are already providing similar services, such as makerspaces. libraries can act solely as a host for events or they can embed in the planning process by building community partnerships, developing themes for the event, or harnessing the expertise already present in the library staff. this article, focusing on years from 2014 to 2020, will highlight the history and evolution of hackathons in libraries as outreach events and as a focus for using library materials, data, workflows, and content. introduction as a means of introduction to hackathons for those unfamiliar with these events, the following definition was developed after reviewing the literature. hackathons are time-bound events where participants gather to build technology projects, learn from each other and experts, and create innovative solutions that are often judged for prizes. while hacking can have negative connotations when it comes to security vulnerabilities, typically for hackathon events hacking refers to modifying original lines of code or devices with the intent of creating a workable prototype or product. events may have a specific theme (use of a particular dataset or project based on a designated platform) or may be open-ended with challenges focused on innovation or social good. while hackathons have been a staple in software and hardware design for decades, the first hackathons with a library focus were sponsored by vendors, focused on topics such as accessibility and adaptive technology for their content and platforms.2 other industry hackathons focused on re-envisioning the role of the book in 2013 and 2014.3 as hackathons became more popular at colleges and universities, library participation evolved from content provider to event host. these partnerships were beneficial to libraries interested in shifting the perception of libraries from books to newer areas of expertise around data and information literacy. however, many libraries realized that by partnering in planning the events greater possibilities existed to educate participants about library content and staff expertise. some examples include working with public library communities to highlight text as data, having academic subject librarians work with departmental faculty to embed events within curriculum and assignments, and for both academic and public libraries to promote library-produced and publicly available datasets.4 information technology and libraries december 2021 hackathons and libraries |longmeier 2 there are many roles that libraries can take in these events. 
libraries can act as event hosts where they provide the space at a cost or for free.5 in other cases, library staff become collaborators and in addition to space may assist with planning logistics, judging, building partnerships, and have some staff present at the events.6 in public libraries this often includes building relationships with the city or specific segments of the community based on the theme of the event. on college campuses, it may be a partnership with a specific disciplines or campus it or an outside sponsor. in this way, the libraries are building and sustaining the event due to aligned priorities with the other partners. another option would be for the library to be the primary sponsor, where the library may provide prizes, the theme for the hackathon, as well as many of the items listed above.7 however, instead of specific categories, it should be viewed as a continuum of partnership and the amount of involvement with the event should align with the library’s priorities of what it hopes to accomplish through the event. how involved in event planning specific libraries want to be may depend on the depth of the existing partnerships as well as how many resources the library wants to commit to the event. libraries have always existed as curators and distributors of knowledge. some libraries are using hackathons to advance both their image and their practices. libraries are evolving into new roles and have grown to support more creative endeavors, such as the maker movement. this shift of libraries from book-provider to social facilitator and information co-creator aligns with hackathon events. the physical spaces themselves are ideal to support public outreach events and libraries are already providing makerspaces or similar services that would overlap with a hackathon audience.8 additionally, the spaces afforded by libraries allow flexibility and creativity to flourish, ideas to be exchanged, and different disciplines to mingle and co-produce. library staff focused on software development may have projects that would benefit from outside perspectives as well. in recent years libraries have become stewards of digital collections that can be used and reused in innovative ways. many libraries have chosen wikipedia edit-a-thons as a means of engaging with the public and enhancing access to materials.9 similarly, the collections-as-data movement is blossoming and allowing galleries, libraries, archives, and museum (glam) institutions to rethink the possible ways of interacting with collections. many public libraries are partnering with local or regional governments to build awareness of data sources and build bridges with the community around how they would like to interact with the data.10 additionally, as data science continues to grow in importance in both public and academic libraries, data fluency, data cleaning, and data visualization could be themes for a hackathon or data-thon.11 for those unfamiliar with these events, table 1 provides some generalized definitions created by the author of the different types of events and their intended purpose. for some organizations, there are ways to support these events that consume fewer resources or require less technical knowledge, such as an edit-a-thons or code jams. information technology and libraries december 2021 hackathons and libraries |longmeier 3 table 1. 
defining common hackathon and hackathon-like events, purpose, and typical size of events
type of event | definition | purpose | size of event
hackathon | a team-based, sprint-like event focused on hardware or software that brings together programmers, graphic designers, interface designers, project managers, or domain experts; can be open-ended idea generation or built around a specific provided theme | build a working prototype, typically software | up to 1,000 participants, usually determined by space available
idea fest | business pitch competition where individuals or teams pitch a solution or new company (startup) idea to a panel of judges | deliver an elevator pitch for an idea, possibly to secure funding | <100
coding contest or code jam | an individual or team competition to work through algorithmic puzzles or on specific code provided | learning to code or solving challenges through coding; may produce a pitch at the end rather than a product | 20–50
edit-a-thon | an event where users improve content in a specific online community; can focus on a theme (art, country, city) or type of material (map) | improving information in online communities such as wikipedia, openstreetmap, or localwiki | 20–100
datathon | a data-science-focused event where participants are given a dataset and a limited amount of time to test, build, and explore solutions | usually a visualization or software development around a particular dataset | 50–100
makeathon | hardware-focused hackathon | build a working prototype of hardware | up to 300 participants
methods
to find articles in the library and information science literature related to hackathons and libraries, the author searched the association for computing machinery (acm) digital library, scopus, library literature and information science, and library and information science and technology abstracts (lista) databases. in scopus and the acm digital library, the most successful searches included the following: (hackathon* or makeathon*) and library; in library literature and information science and lista, the most successful searches included: hackathon or makeathon or "coding contest." the author also searched google scholar in an attempt to locate other studies or reports, some of which came from institutional repositories. while this search strategy was not meant to be exhaustive, it uncovered many articles about hackathons and libraries; others were found by chaining citations in the articles' reference lists. based on search locales, international articles were found, but only those available in english were included, which means that articles from asia, africa, and the global south may have been inadvertently overlooked. only two of the articles found in the search results were not held in library locations, did not use library/archival materials, or were not an outreach event where library staff were integral in planning (these were discarded).
findings
the author grouped the literature into two categories: library as place and library as source. in the realm of library as place, the literature consisted of reports on hackathons where the library was the host location for the event, those where the hackathon was an outreach event, and those where the hackathon was an extension of the library's teaching and education mission. most of these articles were case studies and often shared tips for other libraries to consider when hosting a hackathon in library spaces.
the second category, those that use library as source, focused on highlighting library spaces or services, workflows, or collections as the theme of the events. additionally, there were a few articles in the second category that discussed how to prepare or clean library data or library sources before the event to ensure that participants were able to use the materials during the time-bound event. in some cases where the source materials were from the libraries, the event also occurred in the library; thus, some articles fit into both categories and are highlighted in both sections. results: library as place the following summaries of hackathons and libraries as places for events will be grouped into two subgenres: library spaces and outreach events. libraries, both public and academic, are ideal locations for hosting large, technology-driven events given the usual amenities of ample parking, ubiquitous wi-fi, adequate outlets, and at times already having 24-hour spaces built into their infrastructure. more and more libraries are offering generous food and drink policies, a benefit as sustenance is a mainstay at these multiday events. additionally, libraries already host a number of outreach events and serve as a community information hub. using libraries as event hosts for hackathons a number of articles detail the use of library spaces to host hackathon events.12 the university of michigan library, a local hackerspace (all hands active), and the ann arbor district library teamed up to host a hackathon focused on oculus rift.13 this event grew out of a larger partnership with the community and sought to mix teams to include participants from all three areas. the 2018 article by demeter et al. highlights lessons learned from florida state university library and many of the planning steps involved when hosting large outreach events in library spaces.14 while the library initially hosted a 36-hour event, hackfsu, as a favor to the provost in the first year, they continue to host the event, providing library staff as mentors and logistical support. after the first year they started charging the student organization for use of the space and direct staffing costs for the hours beyond normal operating hours. while focused primarily on providing a central campus space, the library also sees it as a way to highlight the teaching and learning role of the library. similarly, nandi and mandernach detail the steps involved in planning information technology and libraries december 2021 hackathons and libraries |longmeier 5 hackathon events and some benefits of choosing the library as a location for the event.15 at ohio state, hackathon events in 2014 and 2015 were held in the library due to twenty-four-hour spaces, interest by the libraries in supporting innovative endeavors on campus, and a participant size (100–200 attendees) that could be accommodated in the space. other events chose academic libraries as locations for hackathons due to their central location on campus.16 an initial summary of library hackathons was captured by r. c. davis who detailed that libraries may be motivated to host such events as they align with library principles of “community, innovation, and outreach.”17 she points out that libraries are ideal locations because of small modular workspaces paired with a large space for final presentations. additionally, adequate and sufficiently strong wi-fi or hardwired connections, a multitude of power outlets, and 24-hour spaces are appealing for these kinds of events. 
event planners should know that the necessities include free food and multidisciplinary involvement. davis details ways to plan smaller events, such as code days or edit-a-thons, if staffing does not allow for a large hackathon event. in all cases, the libraries serve a purpose to either campus or community as the location and sometimes also provide staff for the events. hackathons as library outreach hackathon events are a great way to reach out to the community and provide a fresh look into libraries as purveyors of information focused on more than books. at the 2014 computers in libraries conference, chief library officer mary lee kennedy delivered a keynote sharing stories of the new york public libraries experiences hosting wikipedia editathons and other hackathons at various branches since 2014.18 the goals for these outreach events were to highlight strategic priorities around making knowledge accessible, re-examine the library purpose, and spark connections. early library hackathon events focused on outreach included topics of accessibility or designing library mobile apps.19 more recent events have focused on outreach but with an eye toward sharing content as part of the coding contest.20 even library associations have hosted preconference hacking events to highlight what libraries are doing to foster innovation.21 the future libraries product forge, a four-day event, was hosted in collaboration with the scottish library and information council and delivered by product forge, a company focused on running hackathons that tackle challenging social issues. the 2016 event focused specifically on public libraries in scotland and seven teams, comprised mainly of students from a local university, worked with public library staff and users as well as regional experts in technology, design, and business.22 the goals of the event were to raise awareness of digital innovation with library services, generate enthusiasm for approaches to digital service design, and codesign new services around digital ventures. participants created novel products including digital signage, a game for young readers, a tool for collecting user stories about library services, and an app to reserve specific library spaces. another common focus for library hackathon outreach events is the theme of data and data literacy. in july 2016, the los angeles public library hosted the civic information lab’s immigration hackathon.23 this outreach event gathered 100 participants to address local issues around immigration. the library, motivated by establishing itself as a “welcoming, trusting environment,” wanted to be a “prominent destination of immigrant and life-enrichment information and programs and services.”24 newcastle libraries ran two-day-long events focused on promoting data they released under an open license as part of the commons are forever project.25 they used both events to educate users about tools such as github, a gif-making session with historical photographs, and data visualization tools. similarly, toronto public library hosted a series of open data hackathons to highlight the role of the libraries in civic issue information technology and libraries december 2021 hackathons and libraries |longmeier 6 discourse, data literacy, and data education.26 their events combined the hackathon with other panel presentations and resources focused on mentorship and connection-building in the technology sector. 
the library also used the event to promote their open data policy, build awareness around the data provided by the library for the community, and highlight their role in facilitating conversations around civic issues through data literacy and data education. edmonton public library hosted its first hackathon in 2014 for international open data day. one of the main drivers was to build the relationship with their local government.27 they built their event around the tenets laid out in the open data hackathon how-to guide and by a blog post about the city of vancouver’s 2013 international open data day hackathon.28 they took a structured approach to documenting expectations of both partners around areas such as resources, staffing, and costs, which served as a roadmap for the hackathon and the partnership. the library provided the event space, coffee and pizza, an emcee, tech help and wi-fi, door prizes and “best idea” prize, and promotional material. the city recruited participants and provided an orientation, promotional banners, and a keynote. the event led to a deeper partnership with the city and additional hacking events. in these ways, the hackathon served a greater purpose of community building and awareness around data, the role the library plays in interpreting data, and how the libraries serve as a resource hub to the community. events supporting library teaching mission at academic institutions, the events often focus on outreach to their own campus community. in 2015, adelphi university hosted their first hackathon and the libraries funded the event themselves rather than seeking outside funding.29 the article details the considerable lessons learned through the process as well as a step-by-step guide to planning a smaller event. similarly, york university science and engineering library hosted hackfests in the library and embedded an event as part of an introductory computer science course.30 shujah highlighted some of the benefits to the library hosting a hackathon included: establishing libraries as part of the research landscape, providing a constructive space for innovation and innate collaborative environment, highlighting the commitment to openness and democratizing knowledge, and acknowledging the library’s role in boosting critical thinking and information literacy concepts. shin, vela, and evans highlight a community hackathon at washington state university college of medicine where a group of librarians from multiple institutions staffed a research station throughout the event.31 while the station was underutilized by participants, as only seven questions were asked during the event, the libraries deemed their participation a success as it worked as an outreach and promotion mechanism for both library services and expertise. at some public libraries, the focus of the hackathon is on education and teaching basic coding skills. whether called a coding contest, hackathon, or tech sandbox, there are opportunities for programming with a focus on learning and skill-building and fun.32 santa clara county library district used a peer-to-peer approach for mentoring and hosted a hackathon in 2015 for middle and high-school students.33 the library staff facilitated the event planning and recruited judges from the community, but the bulk of the event was coordinated by the students. 
considerations when hosting events in library spaces a couple of substantive reports provide overarching recommendations and considerations for hosting hackathons in library spaces, including planning checklists, tips on getting funding, building partnerships with local community officials, and thinking through the event systematically. recently, the digital public library of america (dpla) created a hackathon information technology and libraries december 2021 hackathons and libraries |longmeier 7 planning guide that details a number of logistical issues to address during the planning phases, both preand post-event.34 this report highlights specific considerations for galleries, libraries, archives and museums that are looking to host a hackathon. after hosting a successful hackathon, librarians at new york university created a libguide called hack your library which is a planning guide for other libraries considering hosting a similar event.35 the engage respond innovate final report: the value of hackathons in public libraries was put together following an event the carnegie uk trust sponsored.36 this guide highlights some of the challenges present with hackathons, including: intellectual property of the creations, prizes, participant diversity, and complications that arise from either approach of using specific themes or open-ended challenges. it also highlights some of the main reasons a library should consider hackathons and other coding events, including ways to promote new roles of libraries within communities, promote specific collections, capitalize on community expertise, gain insight about users, help users build new skills and improve digital literacy, and develop tools that increase access to materials. finally, the report points out that hosting an event will not be the only solution for a library’s innovation problem. yet if the library is clear on why it wants to hold a hackathon, being planful about expectations and outcomes the library is trying to achieve will increase the chances for success. results: library as source the other category of articles about hackathons and libraries focuses on the library as the source for the challenge or theme of the hackathon. the following summaries highlight articles include those where the libraries provided the challenges around library spaces or services, library datasets, workflows or collections as the theme for the hackathon. this section also details steps involved in cleaning data for use/re-use in time-bound events. using hackathons to improve library services and spaces a few articles discuss libraries that proposed hackathon themes around improving library services. a 2016 article describes how adelphi university libraries hosted a hackathon and provided the theme of developing library mobile apps and web software applications.37 the winning student team created an app for library group study meetups. similarly, the librarians from university of illinois tried three approaches for library app development: a student competition, a project in a computer science course, and a coding camp. with the adventure code camp, students co-designed with librarians over the course of two days.38 they advertised to specific departments and courses and ten students were selected with six ultimately participating in the two-day coding camp. students were sent a package of library data, available apis, and brief tutorials on coding languages that may be useful. mentors and coaches were available throughout the coding camp. 
the authors provided tips for others trying to replicate their approach as well as insights from the students about interest in developing apps that include library data but that don’t solely focus on library services. the following year the librarians hosted a coding contest focused specifically on app development related to library services and spaces.39 the library sponsored the event and served as both a traditional client and partner in the design process. ultimately six teams with a total of 26 individuals participated and each app was “required to address student needs for discovery of and access to information about library services, collections, and/or facilities” but not duplicate existing library mobile apps. they based their approach on massachusetts institute of technology’s entrepreneurship competition. through this process, co-ownership was preferred and many teams set up a licensing agreement as part of the competition to handle intellectual property for the software. students had two weeks to complete the apps and were judged by both library and campus it administration. this article details what information technology and libraries december 2021 hackathons and libraries |longmeier 8 they learned through the process given the amount of attrition from selection of teams to final product presentations. the new york university school of engineering worked with the libraries and used a hackathon theme of noise issues to coincide with the renovation of the library.40 the libraries created a libguide to provide structured information about the event itself (https://guides.nyu.edu/hackdibner). they used the event to market the new maker space and held workshops there leading up to the event. in the inaugural year they held the event over the course of two semesters and saw a lot of attrition due to the event length. in the second year, following focus groups with participants, they designed a library hackathon with four goals: 1) appeal to a large base of the student population, 2) create a triangle of engagement between the student and the library, the library and the faculty, and the faculty and the students, 3) provide an adaptable model to other libraries, and 4) highlight the development of student information literacy skills.41 the second year’s approach required more work by the participants due to pitching an initial concept, providing a written proposal, and giving a final presentation. library staff and guest speakers offered workshops to help students hone their skills. the planners evaluated the event through surveys and student focus groups. overall the students applied what they learned about information literacy and were highly engaged with the codesign approach to library service improvements. similarly, mcgowan highlights two hackathons at purdue that focused on inclusive healthcare and how the libraries applied design thinking processes as part of the events.42 the librarian wanted to encourage health sciences students to examine health data challenges. to examine this issue, she applied the blended librarians adapted addie model (blaam) as a guide to developing a service to prepare students to participate in a hackathon. a number of pre-event training sessions were held in the libraries and covered topics such as research data management, openrefine and data cleaning, gephi for data visualization, and javascript. 
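the workshop topics above point to the practical data-preparation work that sits behind these events. as a purely illustrative sketch (not drawn from the purdue events or any cited library, and using an invented file name and invented column names), a library offering a catalog export as a hackathon dataset might tidy it along these lines with python and pandas:

```python
import pandas as pd

# hypothetical csv export of catalog records offered as a hackathon dataset
records = pd.read_csv("catalog_export.csv", dtype=str)

# normalize column names and trim stray whitespace in every text field
records.columns = [c.strip().lower().replace(" ", "_") for c in records.columns]
records = records.apply(lambda col: col.str.strip())

# drop exact duplicates and rows missing the fields teams will rely on
records = records.drop_duplicates()
records = records.dropna(subset=["title", "publication_year"])

# keep publication years numeric and plausible so visualizations are not skewed by typos
records["publication_year"] = pd.to_numeric(records["publication_year"], errors="coerce")
records = records[records["publication_year"].between(1450, 2021)]

records.to_csv("catalog_export_clean.csv", index=False)
```

the particular steps matter less than the outcome: teams receive a dataset with consistent fields and documented assumptions, so the time-bound event can be spent building rather than cleaning.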
while this initial approach was in tandem with the hackathon events, students reported that they needed assistance in finding and cleaning datasets for use. in this case, developing library services to prepare for hackathon events ended up out of alignment with both the library’s mission and the participants’ expectations. using library materials for hackathon themes several events have focused on library as source where the library’s materials or processes serve as the theme of the hackathon, particularly around digital humanities (dh) topics.43 in september 2016, over 100 participants worked with materials from the special collections of hamburg state and university library, a space that serves both the university and the public.44 it followed the process established by coding da vinci (https://codingdavinci.de/en), an event that occurred in 2014 and 2015. the event at hamburg state and university library had a kick-off day for sharing available datasets, brainstorming projects using library materials, and team building opportunities. the event had a second day of programming and then teams had six weeks to complete their projects. some exemplary products included a sticker printer that would print old photographs, a quiz app based on engraving plates, and using a social media platform to bring the engravings to the public. the event was successful and resulted in opening additional data from the institution. several examples focus on highlighting digital humanities approaches as part of the events. in 2016, four teams from across european institutions participated over five days in kibbutz lotan in the arava region of israel to develop linguistic tools for tibetan buddhist studies with the goal of information technology and libraries december 2021 hackathons and libraries |longmeier 9 revealing their collections to the public.45 the planning team recruited international scholars to participate in prestructured teams (teams consisted of computer scientists as well as a tibetan scholar) in israel. although it was less of a traditional hackathon, this event being more akin to an event/coding contest around a specific task, it highlighted tools and methods for understanding literary texts. the format of the event for encouraging interdisciplinary efforts in the computational humanities was deemed successful and it was repeated the next year on manuscripts and computer-vision approaches. recently the university of waterloo detailed a series of datathons using archives unleashed to engage the community in an open-source digital humanities project.46 the goal of the events was to engage dh practitioners with the web archive analysis tools and attempt to build a web archiving analysis community. in 2016, the american museum of natural history in new york hosted their third annual hackathon event, hack the stacks, with more than 100 participants.47 the event focused on creating innovative solutions for libraries or archives and to “animate, organize, and enable greater access to the increasing body of digitized content.” ten tasks were available for participants to work on and ranged from a unified search interface, reassembling fragments of scientific notebooks, and creating timelines of archival photos of the museum itself. in addition to planning the tasks, the library staff ensured that the databases and applications could handle the additional traffic. a multitude of platforms were provided (omeka, dspace, the catalog, apis, archivespace, etc) for hackers to use. 
all prototypes that were developed were open source and deposited on github at “hack the stacks.”48 some cultural institutions have used hackathons as a means of outreach and publicity and then have showcased the outputs at the museums. vhacks, a hackathon at the vatican, was held in 2018 and gathered 24 teams from 30 countries for a 36-hour event.49 the three themes for the event focused on social inclusion, interfaith dialogue, and migrants and refugees. a winner was announced for each thematic area and sponsors enticed participants to continue working on projects by having a venture capitalist pitch a few weeks after the event. another program, museomix, concentrates on a three-day rapid prototyping event where outputs are highlighted in the museum or cultural institution.50 this event has happened annually in november since 2011 and the goal is to create interdisciplinary networks and encourage innovation and community partnership. improving library workflows and processes other hackathons have focused on library staff working on library processes themselves. bergland, davis, and traill detail a two-day event, catdoc hack doc, hosted by the university of minnesota data management and access department focused on increasing documentation by library staff.51 this article details logistics of preparing for the event as well and a summary of the work completed. they based their approach on the islandora collaboration group’s template on how to run a hack/doc.52 they were pleased with the workflow overall, refined some of the steps, and held it again for library staff the following year. similarly, dunsire highlights using a hackathon format to encourage adoption of a cataloging approach of research description and access (rda) through a “jane-athon.”53 events occurred at library conferences or in conjunction with other hackathon events, such as the thing-athon at harvard, with the intention of promoting the use of rda, to help users understand the utility of rda, and to spark discussions. this approach proved useful in uncovering some limitations with rda as well as valuable feedback that could be incorporated into its ongoing development. information technology and libraries december 2021 hackathons and libraries |longmeier 10 considerations when using libraries as source if libraries are interested in hosting a hackathon where the library plays a more central role, there are several options of ready-to-use library and museum data that could allow the host to also serve as the content provider. the digital public library of america released a hackathon guide, glam hack-in-a-box: a short guide for helping you organize a glam hackathon with several sources at the end for finding data related to libraries.54 the university of glasgow began a project called the global history hackathons that seeks to improve access and excitement around global history research.55 additionally, candela et al. 
detail the new array of sources for sharing glam data for reuse in multiple ways, including using data in hackathon projects.56 planners could look to the collections-as-data conversations for other data sources that could be adapted for hackathon projects.57 when thinking about hackathons and cultural institutions, sustainability of projects and choice of platforms is an important consideration for planners.58 ultimately, the top priority when providing a dataset is to ensure that it is clean and enough details about the dataset are available for participants to make use of it in their designs given the time constraints of most events. discussion hackathons often have a dual purpose of educating the participants and serving as an advertisement for the sponsor for either a platform or content. participants will develop a working prototype or improving their coding abilities; sponsors, including libraries, can benefit from rapid prototyping and idea generation using either their platforms or content. while usable apps or new ideas are a welcome outcome, even if the applications are not used, the events still feed into the larger goal of marketing libraries and their data, building relationships with local communities, or drawing attention to social good. there are benefits to libraries in either hosting or collaborating on the events. in both areas, those of library as space and library as source, hackathons help realign user expectations of libraries. if libraries choose to become involved with hackathons or other coding or data contests, the library should be deliberate in its goals and intended outcomes as those will help shape both the event and its planning. libraries are naturally aligned with teaching and learning, are already offering co-curricular programming, and typically serve as physical and communication hubs for campus. libraries already prioritize outreach and engagement with constituents both on campuses and in the community. therefore, when programs align with library priorities of data literacy, data fluency, and information evaluation, it is a natural fit to propose involvement in hosting hackathons. many libraries are able to customize their spaces, services, and vendor interfaces, which is a benefit when thinking about having libraries as a theme for an event. other benefits exist for the hackathon event planners when partnering with a library. hackathon planners should consider reaching out to libraries as they already serve as a cross-disciplinary event spaces, host many other outreach events, and are often connected to other campus and community stakeholders and communication outlets. since students from all disciplines and colleges already use the library spaces on college campuses, they are an ideal location for fostering collaborations from different colleges and majors. public libraries function as community gathering spots as well. as libraries consider hosting events, several articles provide overarching tips for planning and hosting hackathons and other time-bound events.59 table 2 provides an overview of articles and the areas of coverage for planning topics. information technology and libraries december 2021 hackathons and libraries |longmeier 11 table 2. 
selected articles for tips on planning hackathon events based on common article theme areas article author location details sample agenda + timelines power and computing mentors/ judging further readings carruthers (2014) x x x x nelson & kashyap (2014) x x x x x jansen-dings, dijk, van westen (2017) x x x bogdanov & isaacmendard (2016) x x x nandi & mandernach (2016) x x x grant (2017) x x x x x as library data becomes more open and reusable, hackathons will be a way to highlight data availability, promote its use and reuse, and reach out to the community. the issues present when considering library collections as potential hackathon themes are that libraries will need to ensure the data are cleaned and contain sufficient metadata so that the data are ready to use. additionally, if there are programming language restrictions for ongoing maintenance by the library after the event, those should be specified when advertising the event. ultimately, the libraries will likely not control the intellectual property (ip) of the tool or visualization developed, but several libraries have specified the ultimate ip as part of the event details either as open source or co-owned.60 often the goal of the event is the promotion of specific materials or building awareness of a collection rather than any biproduct created during the event. however, it is important for the library to be clear about their intent when advertising to participants. the collections-as-data movement will continue to evolve and there will be a multitude of library resources that could be mined for use at hackathons or other similar events. while libraries provide an ideal location and have access to data that can be used for an event, they can also leverage their wealth of experts. library staff can serve as judges, mentors, and connectors to the wider campus or community. events could highlight specific expertise when hackathons focus on particular approaches (data visualization), processes (metadata management or documentation), or codesign of services (physical spaces). table 3 provides examples of hackathon events from a variety of library contexts. hackathons are a great way for libraries to serve as a connector to others on campus or in their communities. if libraries are not interested or able to host an event themselves, library staff can act as mentors or event judges. at smaller schools, library staff can partner with other campus units to plan a hackathon; similarly, smaller public libraries could work with community organizations to host events. at a smaller scale if staffing is a concern or full hackathons are unrealistic, a coding contest or datathon, both of which typically have a shorter duration, might be an option. edit-a-thons are even easier to host as they require only an introduction to the editing process, ample computer space (or laptop hook-ups), and a small food budget. some edit-a-thon events happen in a single afternoon. information technology and libraries december 2021 hackathons and libraries |longmeier 12 table 3. 
selected hackathon event summaries from various library contexts based on themes and products of the event
article author | type of library | size of event | time for event | purpose of event | role of the library | output
carruthers (2014) | public + city | 29 participants | 1 day | highlight open data from the libraries | event space, coffee + pizza, emcee, some prizes, assessment | building partnerships with the city, getting dataset requests
ward, hahn, mestre (2015) | academic | 6 teams; 25 participants | 2 weeks | develop apps using library data | event sponsor, mentor | app development for library using library data
mititelu & grosu (2016) | academic | 100 participants | 48 hours | bring together tech students | event space | app development for sponsors
nandi & mandernach (2016) | academic | 200 participants | 36 hours | bring together tech students | event space, planning logistics, judges | various apps, not library related
baione (2017) | private museum | 100 participants | 2 days | animate, organize, and enable greater access to digitized content from the library | create challenges, event space, judges | open source apps for glam institutions
theise (2017) | academic + public | 100 participants | 2 days + 6-week sprint | cultural hackathon to highlight library data and resources | event space, challenges, datasets for hacking | highlighted data available for use, created apps focused on library materials
almogi et al. (2019) | academic | 23 participants | 5 days | develop linguistic tools for buddhist studies | provided cleaned datasets for manipulation | linguistic tools for buddhist studies
one area for iteration around these events relates to timing. while most hackathons last 24–36 hours, some are run over the course of a one- or two-month period where coding happens remotely, with a few scheduled check-ins with mentors before judging and presentations. this notion of a remote event may have more appeal for collections-as-data-themed events, as experts are more likely to be available for keynotes or mentoring. if the process instead of the product is the focus of the event, then providing a flexible structure may be more appealing to participants. if a library has more limited resources or capacity, stretching the event out over a longer period would allow for sustained interactions. however, libraries should be aware that the longer the event period, the greater the attrition of participants. an area for future research is assessment of library participation in events. a couple of articles highlighted the value the libraries found in the events, but it is unclear whether the participants also gained value from the libraries.61 typically, post-event surveys have focused on the participant experience or the overall event space, rather than on whether the event affected participants' view of the libraries, which would be another area of interest for future research.62
conclusion
in the realm of hackathons and libraries, hackathon themes were originally a way that vendors could highlight new content or improve interfaces. libraries followed this trend and used events to reach out to constituents, make connections with their communities, and highlight evolving library services. with the growth of flexible spaces, ample technology support, and more relaxed food policies, libraries have become ideal event locations.
as the collections-as-data movement evolves, there will be more opportunities to develop services related to these data and other library data that would lend themselves easily as themes for hackathons, edit-a-thons, or datathons. libraries thinking about hosting events will need to weigh the amount of time and resources they want to invest against the intended goals of hosting an event. planning is essential whether the library is the event host, a collaborator, or a sponsor of a hackathon. for those libraries that are unable to host a full hackathon, smaller events, such as a datathon or edit-a-thon, are possibilities to provide support without the same time and resource commitment. given the growing popularity of hackathons and other coding contests, they may be a catch-all for solving several library issues simultaneously: updating the library's image as more than book-centric, supporting the collections-as-data movement, and engaging community partners in new ways.

acknowledgements
thank you to jody condit fagan for providing valuable suggestions on a draft of this paper and to the two anonymous reviewers whose feedback improved the quality of this manuscript.

endnotes
1 james k. elmborg, "libraries as the spaces between us: recognizing and valuing the third space," reference and user services quarterly 50, no. 4 (2011): 338–50.
2 "a brief open source timeline: roots of the movement," online searcher 39, no. 5 (2015): 44–45; patrick timony, "accessibility and the maker movement: a case study of the adaptive technology program at district of columbia public library," in accessibility for persons with disabilities and the inclusive future of libraries, advances in librarianship, vol. 40 (emerald group publishing limited, 2015), 51–58; kurt schiller, "elsevier challenges library community," information today 28, no. 7 (july 2011): 10; eric lease morgan, "worldcat hackathon," infomotions mini-musings (blog), last modified november 9, 2008, http://infomotions.com/blog/2008/11/worldcat-hackathon/; margaret heller, "creating quick solutions and having fun: the joy of hackathons," acrl techconnect (blog), last modified july 23, 2012, http://acrl.ala.org/techconnect/post/creating-quick-solutions-andhaving-fun-the-joy-of-hackathons.
3 clemens neudecker, "working together to improve text digitization techniques: 2nd succeed hackathon at the university of alicante," impact centre of competence in digitisation blog, last updated april 22, 2014, https://www.digitisation.eu/succeed-2nd-hackathon/; porter anderson, "futurebook hack," bookseller no. 5628 (june 20, 2014): 20–21; sarah shaffi, "inaugural hack crowns its diamond project," bookseller no. 5628 (june 20, 2014): 18–19.
4 rose sliger krause, james rosenzweig, and paul victor jr., "out of the vault: developing a wikipedia edit-a-thon to enhance public programming for university archives and special collections," journal of western archives 8, no. 1 (2017): 3; stanislav bogdanov and rachel isaac-menard, "hack the library: organizing aldelphi [sic] university libraries' first hackathon," college and research libraries news 77, no. 4 (2016): 180–83; matt enis, "civic data partnerships," library journal 145, no. 1 (2020): 26–28; alex carruthers, "open data day hackathon 2014 at edmonton public library," partnership: the canadian journal of library & information practice & research 9, no. 2 (2014): 1–13, https://doi.org/10.21083/partnership.v9i2.3121; sarah shujah, "organizing and embedding a library hackfest into a 1st year course," information outlook 18, no. 5 (2014): 32–48; lindsay anderberg, matthew frenkel, and mikolaj wilk, "project shhh! a library design contest for engineering students," in american society for engineering education 2018 annual conference proceedings (2018): paper id 21058, https://cms.jee.org/30900.
5 michelle demeter et al., "send in the crowds: planning and benefiting from large-scale academic library events," marketing libraries journal 2, no. 1 (2018): 86–95, https://bearworks.missouristate.edu/cgi/viewcontent.cgi?article=1089&context=articles-lib.
6 jamie lausch vander broek and emily puckett rodgers, "better together: responsive community programming at the um library," journal of library administration 55, no. 2 (2015): 131–41; arnab nandi and meris mandernach, "hackathons as an informal learning platform," in sigcse 2016 – proceedings of the 47th acm technical symposium on computing science education (february 2016): 346–51, https://doi.org/10.1145/2839509.2844590; lindsay anderberg, matthew frenkel, and mikolaj wilk, "hack your library: engage students in information literacy through a technology-themed competition," in american society for engineering education 2019 annual conference proceedings (2019): paper id 26221, https://peer.asee.org/32883; anna grant, hackathons: a practical guide, insights from the future libraries project forge hackathon (carnegieuk trust, 2017), https://www.carnegieuktrust.org.uk/publications/hackathons-practical-guide/; carruthers, "open data day hackathon 2014 at edmonton public library"; chad nelson and nabil kashyap, glam hack-in-a-box: a short guide for helping you organize a glam hackathon (digital public library of america, summer 2014), http://dpla.wpengine.com/wpcontent/uploads/2018/01/dpla_hackathonguide_forcommunityreps_9-4-14-1.pdf.
7 david ward, james hahn, and lori mestre, "adventure code camp: library mobile design in the backcountry," information technology and libraries 33, no. 3 (2014): 45–52; david ward, james hahn, and lori mestre, "designing mobile technology to enhance library space use: findings from an undergraduate student competition," journal of learning spaces 4, no. 1 (2015): 30–40.
8 ann marie l. davis, "current trends and goals in the development of makerspaces at new england college and research libraries," information technology and libraries 37, no. 2 (2018): 94–117, https://doi.org/10.6017/ital.v37i2.9825; mark bieraugel and stern neill, "ascending bloom's pyramid: fostering student creativity and innovation in academic library spaces," college & research libraries 78, no. 1 (2017): 35–52; elyssa kroski, the makerspace librarian's sourcebook (chicago: ala editions, 2017); angela pashia, "empty bowls in the library: makerspaces meet service," college & research libraries news 76, no. 2 (2015): 79–82; h. michele moorefield-lang, "makers in the library: case studies of 3d printers and maker spaces in library settings," library hi tech 32, no. 4 (2014): 583–93; adetoun a. oyelude, "virtual reality (vr) and augmented reality (ar) in libraries and museums," library hi tech news 35, no. 5 (2018): 1–4.
9 krause, rosenzweig, and victor jr., "out of the vault"; ed yong, "edit-a-thon gets women scientists into wikipedia," nature news (october 22, 2012), https://doi.org/10.1038/nature.2012.11636; angela l. pratesi et al., "rod library art+feminism wikipedia edit-a-thon," community engagement celebration day (2018): 10, https://scholarworks.uni.edu/communityday/2018/all/10; maitrayee ghosh, "hack the library! a first timer's look at the 29th computers in libraries conference in washington, dc," library hi tech news 31, no. 5 (2014): 1–4, https://doi.org/10.1108/lhtn-05-20140031.
10 carruthers, "open data day hackathon 2014 at edmonton public library"; bob warburton, "civic center," library journal 141, no. 15 (2016): 38.
11 matt burton et al., shifting to data savvy: the future of data science in libraries (project report, university of pittsburgh, pittsburgh, pa, 2018): 1–24, https://d-scholarship.pitt.edu/33891/.
12 vander broek and rodgers, "better together"; nandi and mandernach, "hackathons as an informal learning platform"; robin camille davis, "hackathons for libraries and librarians," behavioral & social sciences librarian 35, no. 2 (2016): 87–91; bogdanov and isaac-menard, "hack the library"; ward, hahn, and mestre, "adventure code camp"; ward, hahn, and mestre, "designing mobile technology to enhance library space use"; demeter et al., "send in the crowds"; carruthers, "open data day hackathon 2014 at edmonton public library."
13 vander broek and rodgers, "better together."
14 demeter et al., "send in the crowds."
15 nandi and mandernach, "hackathons as an informal learning platform."
16 eduard mititelu and vlad-alexandru grosu, "hackathon event at the university politehnica of bucharest," international journal of information security & cybercrime 6, no. 1 (2017): 97–98; orna almogi et al., "a hackathon for classical tibetan," journal of data mining and digital humanities, episciences.org, special issue on computer-aided processing of intertextuality in ancient languages, hal-01371751v3 (2019): 1–10, https://jdmdh.episciences.org/5058/pdf.
17 davis, "hackathons for libraries and librarians."
18 ghosh, "hack the library!"
19 timony, "accessibility and the maker movement"; ward, hahn, and mestre, "adventure code camp."
20 gérald estadieu and carlos sena caires, "hacking: toward a creative methodology for cultural institutions" (presented at the viii lisbon summer school for the study of culture "cuber+cipher+culture", september 2017); andrea valdez, "the vatican hosts a hackathon," wired magazine, last updated march 7, 2018, https://www.wired.com/story/vaticanhackathon-2018/; leonardo moura de araujo, "hacking cultural heritage: the hackathon as a method for heritage interpretation" (phd diss., university of bremen, 2018): 181–231, 235–38.
21 thomas finley, "innovation lab: a conference highlight," texas library journal 94, no. 2 (summer 2018): 61–62.
22 grant, hackathons: a practical guide.
23 warburton, "civic center."
24 warburton, "civic center."
25 aude charillon and luke burton, "engaging citizens with data that belongs to them," cilip update magazine (november 2016).
26 enis, "civic data partnerships."
27 carruthers, "open data day hackathon 2014 at edmonton public library."
28 kevin mcarthur, herb lainchbury, and donna horn, "open data hackathon how to guide v. 1.0," october 2012, https://docs.google.com/document/d/1fbuisdtiibaz9u2tr7sgv6gddlov_ahbafjqhxsknb0/edit?pli=1; david eaves, "open data day 2013 in vancouver," eaves.ca (blog), march 11, 2013, https://eaves.ca/2013/03/11/open-data-day-2013-in-vancouver/.
29 bogdanov and isaac-menard, "hack the library."
30 shujah, "organizing and embedding a library hackfest into a 1st year course."
31 nancy shin, kathryn vela, and kelly evans, "the research role of the librarian at a community health hackathon—a technical report," journal of medical systems 44 (2020): 36.
32 geri diorio, "programming by the book," voices of youth advocates 35, no. 4 (2012): 326–327.
33 lauren barack and matt enis, "where teens teach," school library journal (april 2016): 30.
34 nelson and kashyap, glam hack-in-a-box.
35 lindsay anderberg, matthew frenkel, and mikolaj wilk, "hack your library: a library competition toolkit," june 6, 2019, https://wp.nyu.edu/hackyourlibrary/; anderberg, frenkel, and wilk, "hack your library: engage students in information literacy through a technologythemed competition."
36 anna grant, engage. respond. innovate. the value of hackathons in public libraries (carnegieuk trust, 2020), https://www.carnegieuktrust.org.uk/publications/engage-respond-innovatethe-value-of-hackathons-in-public-libraries/.
37 bogdanov and isaac-menard, "hack the library."
38 ward, hahn, and mestre, "adventure code camp."
39 ward, hahn, and mestre, "designing mobile technology to enhance library space use."
40 anderberg, frenkel, and wilk, "project shhh!"
41 anderberg, frenkel, and wilk, "hack your library: engage students in information literacy through a technology-themed competition."
42 bethany mcgowan, "the role of the university library in creating inclusive healthcare hackathons: a case study with design-thinking processes," international federation of library associations and institutions 45, no. 3 (2019): 246–53, https://doi.org/10.1177/0340035219854214.
43 marco büchler et al., "digital humanities hackathon on text re-use 'don't leave your data problems at home!'" electronic text reuse acquisition project, event held july 27–31, 2015, http://www.etrap.eu/tutorials/2015-goettingen/; helsinki centre for digital humanities, "helsinki digital humanities hackathon 2017 #dhh17," event held may 15–19, 2017, https://www.helsinki.fi/en/helsinki-centre-for-digital-humanities/dhh-hackathon/helsinkidigital-humanities-hackathon-2017-dhh17.
44 antje theise, "open cultural data hackathon coding da vinci–bring the digital commons to life," in ifla wlic 2017 wroclaw poland, session 231—rare books and special collections (2017), http://library.ifla.org/id/eprint/1785.
45 almogi et al., "a hackathon for classical tibetan."
46 samantha fritz et al., "fostering community engagement through datathon events: the archives unleashed experience," digital humanities quarterly 15, no. 1 (2021): 1–13, http://digitalhumanities.org/dhq/vol/15/1/000536/000536.html.
47 tom baione, "hackathon & 21st-century challenges," library journal 142, no. 2 (2017): 14–17.
48 american museum of natural history, "hack the stacks," https://www.amnh.org/learnteach/adults/hackathon/hack-the-stacks, https://github.com/amnh/hackthestacks/wiki, https://github.com/hackthestacks.
49 andrea valdez, "inside the vatican's first-ever hackathon: this is the holy see of the 21st century," wired magazine, march 12, 2018, https://www.wired.com/story/inside-vhacksfirst-ever-vatican-hackathon/.
50 museomix, "concept," accessed march 29, 2021, https://www.museomix.org/en/concept/.
51 kristi bergland, kalan knudson davis, and stacie traill, "catdoc hackdoc: tools and processes for managing documentation lifecycle, workflows, and accessibility," cataloging and classification quarterly 57, no. 7–8 (2019): 463–95.
52 islandora collaboration group, "templates: how to run a hack/doc," last modified december 5, 2017, https://github.com/islandora-collaborationgroup/icg_information/tree/master/templates_how_to_run_a_hack_doc.
53 gordon dunsire, "toward an internationalization of rda management and development," italian journal of library and information science 7, no. 2 (may 2016): 308–31, http://dx.doi.org/10.4403/jlis.it-11708.
54 nelson and kashyap, glam hack-in-a-box.
55 hannah-louise clark, "global history hackathons information," accessed april 19, 2021, https://www.gla.ac.uk/schools/socialpolitical/research/economicsocialhistory/projects/global%20historyhackathons/history%20hackathons/.
56 gustavo candela et al., "reusing digital collections from glam institutions," journal of information science (august 2020): 1–10, https://doi.org/10.1177/0165551520950246.
57 thomas padilla, "on a collections as data imperative," uc santa barbara, 2017, https://escholarship.org/uc/item/9881c8sv; rachel wittmann et al., "from digital library to open datasets," information technology and libraries 38, no. 4 (2019): 49–61, https://doi.org/10.6017/ital.v38i4.11101; sandra tuppen, stephen rose, and loukia drosopoulou, "library catalogue records as a research resource: introducing 'a big data history of music,'" fontes artis musicae 63, no. 2 (2016): 67–88.
58 moura de araujo, "hacking cultural heritage."
59 grant, hackathons: a practical guide; grant, engage. respond. innovate.; joshua tauberer, "hackathon guide," accessed march 26, 2021, https://hackathon.guide/; alexander nolte et al., "how to organize a hackathon—a planning kit," arxiv preprint arxiv:2008.08025 (2020), https://arxiv.org/abs/2008.08025v2; ivonne jansen-dings, dick van dijk, and robin van westen, hacking culture: a how-to guide for hackathons in the cultural sector, waag society (2017): 1–41, https://waag.org/sites/waag/files/media/publicaties/es-hacking-culturesingle-pages-print.pdf.
60 ward, hahn, and mestre, "designing mobile technology to enhance library space use."
61 mcgowan, "the role of the university library in creating inclusive healthcare hackathons."
62 nandi and mandernach, "hackathons as an informal learning platform"; carruthers, "open data day hackathon 2014 at edmonton public library."

automatic extraction of figures from scientific publications in high-energy physics
piotr adam praczyk, javier nogueras-iso, and salvatore mele
information technology and libraries | december 2013

piotr adam praczyk (piotr.praczyk@gmail.com) is a phd student at universidad de zaragoza, spain, and research grant holder at the scientific information service of cern, geneva, switzerland. javier nogueras-iso (jnog@unizar.es) is associate professor, computer science and systems engineering department, universidad de zaragoza, spain. salvatore mele (salvatore.mele@cern.ch) is leader of the open access section at the scientific information service of cern, geneva, switzerland.

abstract
plots and figures play an important role in the process of understanding a scientific publication, providing overviews of large amounts of data or ideas that are difficult to intuitively present using only the text. state-of-the-art digital libraries, which serve as gateways to knowledge encoded in scholarly writings, do not yet take full advantage of the graphical content of documents.
enabling machines to automatically unlock the meaning of scientific illustrations would allow immense improvements in the way scientists work and the way knowledge is processed. in this paper, we present a novel solution for the initial problem of processing graphical content, obtaining figures from scholarly publications stored in pdf. our method relies on vector properties of documents and, as such, does not introduce additional errors, unlike methods based on raster image processing. emphasis has been placed on correctly processing documents in high-energy physics. the described approach distinguishes different classes of objects appearing in pdf documents and uses spatial clustering techniques to group objects into larger logical entities. many heuristics allow the rejection of incorrect figure candidates and the extraction of different types of metadata.

introduction
notwithstanding the technological advances of large-scale digital libraries and novel technologies to package, store, and exchange scientific information, scientists' communication pattern has changed little in the past few decades, if not the past few centuries. the key information of scientific articles is still packaged in the form of text and, for several scientific disciplines, in the form of figures. new semantic text-mining technologies are unlocking the information in scientific discourse, and there exist some remarkable examples of attempts to extract figures from scientific publications,1 but current attempts do not provide a sufficient level of generality to deal with figures from high energy physics (hep) and cannot be applied in a digital library like inspire, which is our main point of interest. scholarly publications in hep tend to contain highly specific types of figures (understood as any type of graphical content illustrating the text and referenced from it). in particular, they contain a high volume of plots, which are line-art images illustrating a dependency of a certain quantity on a parameter. the graphical content of scholarly publications allows much more efficient access to the most important results presented in a publication.2,3 the human brain perceives graphical content much faster than reading an equivalent block of text. presenting figures with the publication summary when displaying search results would allow more accurate assessment of the article content and in turn lead to a better use of researchers' time. enabling users to search for figures describing similar quantities or phenomena could become a very powerful tool for finding publications describing similar results. combined with additional metadata, it could provide knowledge about the evolution of certain measurements or ideas over time. these and many more applications created an incentive to research possible ways to integrate figures in inspire. inspire is a digital library for hep,4 the application field of this work.
it provides a large-scale digital library service (1 million records, fifty thousand users), which is starting to explore new mechanisms of using figures in articles of the field to index, retrieve, and present information.5,6 as a first step, direct access to graphical content before accessing the text of a publication can be provided. second, a description of graphics ("blue-band plot," "the yellow shape region") could be used in addition to metadata or full-text queries to retrieve a piece of information. finally, articles could be aggregated into clusters containing the same or similar plots, as a possible alternative automated answer to a standing issue in information management. the indispensable step to realize this vision is an automated, resilient, and high-efficiency extraction of figures from scientific publications. in this paper, we present an approach that we have developed to address this challenge. the focus has been put on developing a general method allowing the extraction of data from documents stored in portable document format (pdf). the results of the algorithm consist of metadata and raster images of a figure, but also vector graphics, which allows easier further processing. the pdf format has been chosen as the input of the algorithm because it is a de facto standard in scientific communication. in the case of hep, mathematics, and other exact sciences, the majority of publications are prepared using the latex document formatting system and later compiled into a pdf file. the electronic versions of publications from outstanding scientific journals are also provided in pdf. the internal structure of pdf files does not always reveal the location of graphics. in some cases, images are included as external entities and are easily distinguishable from the rest of a document's content, but other times they are mixed with the rest of the content. therefore, not to miss any figures, the low-level structure of a pdf had to be analyzed. the work described in this paper focuses on the area of hep. however, with minor variations, the described methods could be applicable to a different area of knowledge.

related work
over years of development of digital libraries and document processing, researchers came up with several methods of automatically extracting and processing graphics appearing in pdf documents. based on properties of the processed content, these methods can be divided into two groups. the attempts of the first category deal with pdf documents in general, not making any assumptions about the content of encoded graphics or document type. the methods from the second group are more specific to figures from scientific publications. our approach belongs to the second group. general-purpose tools include command line programs like pdf-images (http://sourceforge.net/projects/pdfimages/) or web-based applications like pdf to word (http://www.pdftoword.com/). these solutions are useful for general documents, but all suffer from the same difficulties when processing scientific publications: graphics that are recognized by such tools have to be marked as graphics inside pdf documents. this is the case with raster graphics and some other internally stored objects. in the case of scholarly documents, most graphics are constructed internally using pdf primitives and thus cannot be correctly processed by tools from the first group. moreover, general tools do not have the necessary knowledge to produce metadata describing the extracted content.
with respect to specific tools for scientific publications, it must be noted first that important scientific publishers like springer or elsevier have created services to allow access to figures present in scientific publications: the improvement of the sciverse science direct site (http://www.sciencedirect.com) for searching images in the case of elsevier7 and the springerimages service (http://www.springerimages.com/) in the case of springer.8 these services allow searches triggered from a text box, where the user can introduce a description of the required content. it is also possible to browse images by categories such as types of graphics (image, table, line art, video, etc.). the search engines are limited to searches based on figure captions. in this sense, there is little difference between the image search and text search implemented in a typical digital library. most existing works aiming at the retrieval and analysis of figures use the rasterized graphical representation of source documents as their basis. browuer et al. and kataria et al. describe a method of detecting plots by means of wavelet analysis.9,10 they focus on the extraction of data points from identified figures. in particular, they address the challenge of correctly identifying overlapping points of data in plots. this problem would not manifest itself often in the case of vector graphics, which is the scenario proposed in our extraction method. vector graphics preserve much more information about the document's content than simple values of pixel colours. in particular, vector graphics describe overlapping objects separately. raster methods are also much more prone to additional errors being introduced during the recognition/extraction phase. the methods described in this paper could be used with kataria's method for documents resulting from a digitization process.11 liu et al. present a page box-cutting algorithm for the extraction of tables from pdf documents.12 their approach is not directly applicable, but their ideas of geometrical clustering of pdf primitives are similar to the ones proposed in our work. however, our experiments with their implementation and hep publications have shown that the heuristics used in their work cannot be directly applied to hep, showing the need for an adapted approach, even in the case of tables. a different category of work, not directly related to graphics extraction but useful when designing algorithms, has been devoted to the analysis of graph use in scientific publications. the results presented by cleveland describe a more general case than hep publications.13 even if the data presented in the work came from scientific publications before 1984, included observations—for example, typical sizes of graphs—were useful with respect to general properties of figures and were taken into account when adjusting parameters of the presented algorithm. finally, there exist attempts to extract layout information from pdf documents. the knowledge of page layout is useful to distinguish independent parts of the content.
the approach of layout and content extraction presented by chao and fan is the closest to the one we propose in this paper.14 the difference lies in the fact that we are focusing on the extraction of plots and figures from scientific documents, which usually follow stricter conventions. therefore we can make more assumptions about their content and extract more precise data. for instance, our method emphasizes the role of detected captions and permits them to modify the way in which graphics are treated. we also extract portions of information that are difficult to extract using more general methods, such as captions of figures.

method
pdf files have a complex internal structure allowing them to embed various external objects and to include various types of metadata. however, the central part of every pdf file consists of a visual description of the subsequent pages. the imaging model of pdf uses a language based on a subset of the postscript language. postscript is a complete programming language containing instructions (also called operators) allowing the rendering of text and images on a virtual canvas. the canvas can correspond to a computer screen or to another, possibly virtual, device used to visualize the file. the subset of postscript used to describe the content of pdfs has been stripped of all flow-control operations (like loops and conditional executions), which makes it much simpler to interpret than the original postscript. additionally, the state of the renderer is not preserved between subsequent pages, making their interpretation independent. to avoid many technical details, which are irrelevant in this context, we will consider a pdf document as a sequence of operators (also called the content stream). every operator can trigger a modification of the graphical state of the pdf interpreter, which might be drawing a graphical primitive, rendering an external attached object, or modifying a position of the graphical pointer15 or a transformation matrix.16 the outcome of an atomic operation encoded in the content stream depends not only on the parameters of the operation, but also on the way previous operators modified the state of the interpreter. such a design makes a pdf file easy to render but not necessarily easy to analyze. figure 1 provides an overview of the proposed extraction method. at the very first stage, the document is pre-processed and operators are extracted (see "pre-processing of operators" below). later, graphical17 and textual18 operators are clustered using different criteria (see "inclusion of text parts" and "detection and matching of captions" below), and the first round of heuristics rejects regions that cannot be considered figures. in the next phase, the clusters of graphical operators are merged with text operators representing fragments of text to be included inside a figure (see "inclusion of text parts" below). the second round of heuristics detects clusters that are unlikely to be figures. text areas detected by means of clustering text operations are searched for possible figure captions (see "detection and matching of captions" below). captions are matched with corresponding figure candidates, and geometrical properties of captions are used to refine the detected graphics. the last step generates data in a format convenient for further processing (see "generation of the output" below).

figure 1. overview of the figure extraction method.
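to make the flow of figure 1 easier to follow, the sketch below strings the described stages together in python. it is only an illustration: every helper is a deliberately trivial stand-in, and all names are hypothetical placeholders rather than part of the authors' java implementation.

# a toy, runnable outline of the stages in figure 1; helpers are stand-ins.

def bbox_of(ops):
    # smallest rectangle containing the boundaries of all operators in a cluster
    xs0, ys0, xs1, ys1 = zip(*(op["bbox"] for op in ops))
    return min(xs0), min(ys0), max(xs1), max(ys1)

def cluster(ops):
    # stand-in for algorithm 1: here, everything ends up in a single cluster
    return [ops] if ops else []

def plausible_figure(ops):
    # stand-in for the first round of heuristics (relative size, aspect ratio, ...)
    return bool(ops)

def merge_text(graphic_clusters, text_ops):
    # include text fragments (axis labels, legend entries) lying inside a cluster
    merged = []
    for g in graphic_clusters:
        x0, y0, x1, y1 = bbox_of(g)
        inside = [t for t in text_ops
                  if x0 <= t["bbox"][0] and t["bbox"][2] <= x1
                  and y0 <= t["bbox"][1] and t["bbox"][3] <= y1]
        merged.append(g + inside)
    return merged

def detect_captions(text_ops):
    # stand-in for the caption test (see the regular-expression sketch further below)
    return [t for t in text_ops
            if t.get("payload", "").lower().startswith(("figure", "fig.", "plot"))]

def extract_figures(operators):
    graphics = [op for op in operators if op["kind"] == "graphical"]
    texts = [op for op in operators if op["kind"] == "textual"]
    candidates = [c for c in cluster(graphics) if plausible_figure(c)]
    candidates = merge_text(candidates, texts)
    captions = detect_captions(texts)
    # caption matching and output generation would follow here
    return [{"figure": bbox_of(c),
             "caption": captions[0]["payload"] if captions else None}
            for c in candidates]

ops = [{"kind": "graphical", "bbox": (10, 10, 100, 80)},
       {"kind": "textual", "bbox": (20, 40, 60, 50), "payload": "signal region"},
       {"kind": "textual", "bbox": (10, 85, 100, 95), "payload": "figure 1. a sample plot"}]
print(extract_figures(ops))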
additionally, it must be noted that another important pre-processing step of the method consists of the layout detection. an algorithm for segmenting pages into layout elements called page divisions is presented later in the paper. this considerably improves the accuracy of the extraction method because elements from different page divisions can no longer be considered to belong to the same cluster (and subsequently figure). this allows the method to be applied separately to different columns of a document page.

pre-processing of operators
the proposed algorithm considers only certain properties of a pdf operator rather than trying to completely understand its effect. considered properties consist of the operator's type, the region of the page where the operator produces output and, in the case of textual operations, the string representation of the result. for simplicity, we suppress the notion of coordinate system transformation, inherent in pdf rendering, and describe all operators in a single coordinate system of a virtual 2-dimensional canvas where operations take effect. transformation operators19 are assigned an empty operation region as they do not modify the result directly but affect subsequent operations. in our implementation, an existing pdf rendering library has been used to determine boundaries of operators. rather than trying to understand all possible types of operators, we check the area of the canvas that has been affected by an operation. if the area is empty, we consider the operation to be a transformation. if there exists a non-empty area that has been changed, we check if the operator belongs to a maintained list of textual operators. this list is created based on the pdf specification. if so, the operator's argument list is scanned searching for a string and the operation is considered to be textual. an operation that is neither a transformation nor a textual operation is considered to be graphical. it might happen that text is generated using a graphical operator; however, such a situation is unusual. in the case of operators triggering the rendering of other operators, which is the case when rendering text using type-3 fonts, we consider only the top-level operation. in most cases, separate operations are not equivalent to logical entities considered by a human reader (such as a paragraph, a figure, or a heading). graphical operators are usually responsible for displaying lines or curve segments, while humans think in terms of illustrations, data lines, etc. similarly, in the case of text, operators do not have to represent complete or separate words or paragraphs. they usually render parts of words and sometimes parts of more than one word. the only assumption we make about the relation between operators and logical entities is that a single operator does not trigger rendering of elements from different detected entities (figures, captions). this is usually true because logical entities tend to be separated by a modification of the context—there is a distance between text paragraphs or an empty space between curves.
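as a concrete illustration of the classification rule just described, the sketch below assigns one of the three categories to an operator from its name, the canvas area it affected, and its arguments. the list of text-showing operator names is a small illustrative subset of the pdf specification, and the function names are hypothetical, not taken from the described implementation.

# illustrative classification of a single content-stream operator.

TEXT_SHOWING_OPERATORS = {"Tj", "TJ", "'", '"'}   # pdf text-showing operators

def area_is_empty(box):
    x0, y0, x1, y1 = box
    return x1 <= x0 or y1 <= y0

def classify_operator(name, affected_area, arguments):
    """return 'transformation', 'textual', or 'graphical'."""
    if affected_area is None or area_is_empty(affected_area):
        # nothing was drawn: the operator only changed the interpreter state
        return "transformation"
    if name in TEXT_SHOWING_OPERATORS and any(isinstance(a, str) for a in arguments):
        return "textual"
    return "graphical"

print(classify_operator("cm", None, [1, 0, 0, 1, 5, 5]))          # transformation
print(classify_operator("Tj", (10, 10, 60, 22), ["some text"]))   # textual
print(classify_operator("re", (0, 0, 50, 50), []))                # graphical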
clustering of graphical operators
the clustering algorithm
the representation of a document as a stream of rectangles allows the calculation of more abstract elements of the document. in our model, every logical entity of the document is equivalent to a set of operators. the set of all operators of the document is divided into disjoint subsets in the process called clustering. operators are assigned to the same cluster based on the position of their boundaries. the criteria for the clustering are based on a simple but important observation: operations forming a logical entity have boundaries lying close to each other. groups of operations forming different entities are separated by empty spaces. the clustering of textual operations yields text paragraphs and smaller objects like section headings. however, in the case of graphical operations, we can obtain consistent parts of images, but usually not complete figures yet. outcomes of the clustering are utilized during the process of figures detection. algorithm 1 shows the pseudo-code of the clustering algorithm. the input of the algorithm consists of a set of pre-processed operators annotated with their affected area. the output is a division of the input set into disjoint clusters. every cluster is assigned a boundary equal to the smallest rectangle containing boundaries of all included operations. in the first stage of the algorithm (lines 6–20), we organize all input operations in a forest-of-trees data structure. every tree describes a separate cluster of operations. the second stage (lines 21–29) converts the results (clusters) into a more suitable format.

algorithm 1. the clustering algorithm.
 1: input: operationset input_operations {set of operators of the same type}
 2: output: map {spatial clusters of operators}
 3: intervaltree tx ← intervaltree()
 4: intervaltree ty ← intervaltree()
 5: map parent ← map()
 6: for all operation op ∈ input_operations do
 7:   rectangle boundary ← extendbymargins(op.boundary)
 8:   repeat
 9:     operationset int_opsx ← tx.getintersectingops(boundary)
10:     operationset int_opsy ← ty.getintersectingops(boundary)
11:     operationset int_ops ← int_opsx ∩ int_opsy
12:     for all operation int_op ∈ int_ops do
13:       rectangle bd ← tx[int_op] × ty[int_op]
14:       boundary ← smallestenclosing(bd, boundary)
15:       parent[int_op] ← op
16:       tx.remove(int_op); ty.remove(int_op)
17:     end for
18:   until int_ops = ∅
19:   tx.add(boundary, op); ty.add(boundary, op)
20: end for
21: map results ← map()
22: for all operation op ∈ input_operations do
23:   operation root_ob ← getroot(parent, op)
24:   rectangle rec ← tx[root_ob] × ty[root_ob]
25:   if not results.has_key(rec) then
26:     results[rec] ← list()
27:   end if
28:   results[rec].add(op)
29: end for
30: return results

the clustering of operations is based on the relation of their rectangles being close to each other. definition 1 formalizes the notion of being close, making it useful for the algorithm.

definition 1: two rectangles are considered to be located close to each other if they are intersecting after expanding their boundaries in every direction by a margin.

the value by which rectangles should be extended is a parameter of the algorithm and might be different in various situations. to detect if rectangles are close to each other, we needed a data structure allowing the storage of a set of rectangles. this data structure was required to allow retrieving all stored rectangles that intersect a given one. we have constructed the necessary structure using an important observation about the operation result areas.
in our model all bounding rectangles have their edges parallel to the edges of the reference canvas on which the output of the operators is rendered. this allowed us to reduce our problem from the case of 2-dimensional rectangles to the case of 1-dimensional intervals. we can assume that the edges of the rectangular canvas define the coordinate system. it is easy to prove that two rectangles with edges parallel to the axes of the coordinate system intersect only if both their projections in the directions of the axes intersect. the projection of a rectangle onto an axis is always an interval. the observation made above has allowed us to build the required 2-dimensional data structure by remembering two 1-dimensional data structures that recall a number of intervals and, for a given interval, return the set of intersecting ones. such a 1-dimensional data structure has been provided by interval trees.20 every interval inside the tree has an arbitrary object assigned to it, which in this case is a representation of the pdf operator. this object can be treated as an identifier of the interval. the data structure also implements a dictionary interface, mapping objects to actual intervals. at the beginning, the algorithm initializes two empty interval trees representing projections on the x and y axes, respectively. those trees store values about projections of the biggest so-far calculated areas rather than about particular operators. each cluster is represented by the most recently discovered operation belonging to it. during the algorithm execution, each operator from the input set is considered only once. the order of processing is not important. the processing of a single operator proceeds as follows (the interior of the outermost "for all" loop of the algorithm):
1. the boundary of the operation is extended by the width of margins. the spatial data structure described earlier is utilized to retrieve boundaries of all already detected clusters (lines 9–10).
2. the forest of trees representing clusters is updated. the currently processed operation is added without a parent. roots of all trees representing intersecting clusters (retrieved in the previous step) are attached as children of the new operation.
3. the boundary of the processed operation is extended to become the smallest rectangle containing all boundaries of intersecting clusters and the original boundary. finally, all intersecting clusters are removed from the spatial data structure.
4. lines 9–17 of the algorithm are repeated as long as there exist areas intersecting the current boundary. in some special cases, more than one iteration may be necessary.
5. finally, the calculated boundary is inserted into the spatial data structure as a boundary of a new cluster. the currently processed operation is designated to represent the cluster and so is remembered as a representation of the cluster.
after processing all available operations, the post-processing phase begins. all the trees are transformed into lists. the resulting data structure is a dictionary having boundaries of detected clusters as keys and lists of belonging operations as values. this is achieved in lines 21–29. during the process of retrieving the cluster to which a given operation belongs, we use a technique called path compression, known from the union-find data structure.21
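the two observations above, definition 1 and the reduction of box intersection to two interval intersections, can be written down compactly. the sketch below is a brute-force stand-in: where the described implementation keeps two interval trees for efficient queries, it simply scans a list, and the margin value is an arbitrary placeholder.

# definition 1 plus the projection argument, in brute-force form.

def expand(box, margin):
    x0, y0, x1, y1 = box
    return x0 - margin, y0 - margin, x1 + margin, y1 + margin

def intervals_intersect(a0, a1, b0, b1):
    return a0 <= b1 and b0 <= a1

def close(box_a, box_b, margin):
    """axis-aligned boxes intersect iff both their x and y projections intersect,
    so closeness reduces to two 1-dimensional interval tests after expansion."""
    ax0, ay0, ax1, ay1 = expand(box_a, margin)
    bx0, by0, bx1, by1 = expand(box_b, margin)
    return (intervals_intersect(ax0, ax1, bx0, bx1) and
            intervals_intersect(ay0, ay1, by0, by1))

def intersecting_clusters(boundary, cluster_boxes, margin=5):
    """linear-scan stand-in for the getintersectingops queries of algorithm 1."""
    return [i for i, b in enumerate(cluster_boxes) if close(boundary, b, margin)]

print(close((0, 0, 10, 10), (12, 0, 20, 10), margin=3))   # True: the gap of 2 closes up
print(close((0, 0, 10, 10), (30, 0, 40, 10), margin=3))   # False: the gap of 20 remains
print(intersecting_clusters((0, 0, 10, 10), [(12, 0, 20, 10), (30, 0, 40, 10)]))  # [0]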
filtering of clusters
graphical areas detected by a simple clustering usually do not directly correspond to figures. the main reason for this is that figures may contain not only graphics, but also portions of text. moreover, not all graphics present in the document must be part of a figure. for instance, common graphical elements not belonging to a figure include logos of institutions and text separators like lines and boxes; various parts of mathematical formulas usually include graphical operations; and in the case of slides from presentations, the graphical layout should not be considered part of a figure. the above shows that the clustering algorithm described earlier is not sufficient for the purpose of figures detection and that it yields a result set wider than expected. in order to take into account the aforementioned characteristics, pre-calculated graphical areas are subject to further refinement. this part of the processing is highly domain-dependent as it is based on properties of scientific publications in a particular domain, in this case publications of hep. in the course of the refinement process, previously computed clusters can be completely discarded, extended with new elements, or some of their parts might be removed. in this subsection we discuss the heuristics applied for rejecting and splitting clusters of graphical operators. there are two main reasons for rejecting a cluster. the first of them is a size that is too small compared to the page size. the second is the figure candidate having its aspect ratio outside a desired interval of values. the first heuristic is designed to remove small graphical elements appearing, for example, inside mathematical formulas, but also small logos and other decorations. the second one discards text separators and different parts of mathematical equations, such as a line separating the numerator from the denominator inside a fraction. the thresholds used for filtering are provided as configurable properties of the algorithm and their values are assigned experimentally in a way that maximises the accuracy of figures detection.
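a sketch of these two rejection rules follows; the numeric thresholds are arbitrary placeholders rather than the experimentally tuned values mentioned above.

# rejecting clusters that are too small relative to the page or too elongated.

MIN_PAGE_FRACTION = 0.02            # placeholder: reject clusters below 2% of the page area
ASPECT_RATIO_RANGE = (0.1, 10.0)    # placeholder: reject very flat or very tall clusters

def keep_cluster(cluster_box, page_box):
    cw, ch = cluster_box[2] - cluster_box[0], cluster_box[3] - cluster_box[1]
    pw, ph = page_box[2] - page_box[0], page_box[3] - page_box[1]
    if cw * ch < MIN_PAGE_FRACTION * pw * ph:
        return False                          # too small: formula fragments, logos, ...
    ratio = cw / ch if ch else float("inf")
    lo, hi = ASPECT_RATIO_RANGE
    return lo <= ratio <= hi                  # discards rules, fraction bars, separators

page = (0, 0, 595, 842)                       # an a4 page in points
print(keep_cluster((50, 400, 350, 650), page))   # plausible figure -> True
print(keep_cluster((50, 400, 350, 402), page))   # thin horizontal rule -> False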
additionally, the analysis of the order of operations forming the content stream of a pdf document may help to split clusters that were incorrectly joined by algorithm 1. parts of the stream corresponding to logical parts of the document usually form a consistent subsequence. this observation allows the construction of a method of splitting elements incorrectly clustered together. we can assign content streams not only to entire pdf documents or pages, but also to every cluster of operations. the clustering algorithm presented in algorithm 1 returns a set of areas with a list of operations assigned to each of them. the content stream of a cluster consists of all operations from such a set, ordered in the same manner as in the original content stream of the pdf document. the usage of the original content stream allows us to define a distance in the content stream as follows:

definition 2: if o1 and o2 are two operations appearing in the content stream of the pdf document, by the distance between these operations we understand the number of textual and graphical operations appearing after the first of them and before the second of them.

to detect situations when a figure candidate contains unnecessary parts, the content stream of a figure candidate is read from the first to the last operation. for every two subsequent operations, the distance between them in the sense of the original content stream is calculated. if the value is larger than a given threshold, the content stream is split into two parts, which become separate figure candidates. for both candidates, a new boundary is calculated. this heuristic is especially important in the case of less formal publications such as slides from presentations at conferences. presentation slides tend to have a certain number of graphics appearing on every page and not carrying any meaning. simple geometrical clustering would connect elements of page style with the rest of the document content. measuring the distance in the content stream and defining a threshold on the distance facilitates the distinction between the layout and the rest of the page. this technique might also be useful to automatically extract the template used for a presentation, although this transcends the scope of this publication.
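the splitting rule built on definition 2 can be sketched as follows, with each operator represented by its index in the document content stream; the gap threshold is a placeholder value, and the distance is approximated by the difference of stream indices.

# splitting a cluster by gaps in the content stream (definition 2, simplified).

def split_by_stream_distance(op_indices, max_gap=50):
    """split whenever two consecutive operators of the cluster are separated by
    more than max_gap other operators in the original content stream."""
    if not op_indices:
        return []
    ordered = sorted(op_indices)            # restore content-stream order
    parts, current = [], [ordered[0]]
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt - prev - 1 > max_gap:
            parts.append(current)
            current = []
        current.append(nxt)
    parts.append(current)
    return parts

# a decorative slide-template element (indices 3-6) geometrically clustered with
# a genuine figure (indices 400-420) is separated again:
print(split_by_stream_distance([3, 4, 5, 6, 400, 410, 420]))
# -> [[3, 4, 5, 6], [400, 410, 420]]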
clustering of textual operators
the same algorithm that clusters graphical elements can cluster parts of text. detecting larger logically consistent parts of text is important because they should be treated as single entities during subsequent processing. this comprises, for example, inclusion inside a figure candidate (e.g., captions of axes, parts of a legend) and classification of a text paragraph as a figure caption.

inclusion of text parts
the next step in figures extraction involves the inclusion of lost text parts inside figure candidates. at the stage of operations clustering, only the operations of the same type (graphical or textual) were considered. the results of those initial steps become the input to the clustering algorithm that will detect relations between previously detected entities. by doing this, we move one level farther in the process of abstracting from operations. we start from basic meaningless operations. later we detect parts of graphics and text, and finally we are able to see the relations between both. not all clusters detected at this stage are interesting because some might consist uniquely of text areas. only those results that include at least one graphical cluster may be subsequently considered figure candidates. another round of heuristics marks unnecessary intermediate results as deleted. applied methods are very similar to those described in "filtering of clusters" (above); only the thresholds deciding on the rejections must change because we operate on geometrically much larger entities. also the way of application is different—candidates rejected at this stage can later be restored to the status of a figure. instead of permanently removing them, heuristics of this stage only mark figure candidates as rejected. this happens in the case of candidates having an incorrect aspect ratio, incorrect sizes, or consisting only of horizontal lines (which is usually the case with mathematical formulas but also tables). in addition to using the aforementioned heuristics, having clusters consisting of a mixture of textual and graphical operations allows the application of new heuristics. during the next phase, we analyze the type of operations rather than their relative location. in some cases, steps described earlier might detect objects that should not be considered a figure, such as text surrounded by a frame. this situation can be recognized by the calculation of a ratio between the number of graphical and textual operations in the content stream of a figure candidate. in our approach we have defined a threshold that indicates which figure candidates should be rejected because they contain too few graphics. this allows the removal of, for instance, blocks of text decorated with graphics for aesthetic reasons. the ratio between the numbers of graphical and textual operations is smaller for tables than for figures, so extending the heuristic with an additional threshold could improve the table–figure distinction. another heuristic analyzes the ratio between the total area of graphical operations and the area of the entire figure candidate. subsequently, we mark as deleted the figure candidates containing horizontal lines as the only graphical operations. these candidates describe tables or mathematical formulas that have survived previous steps of the algorithm. tables can be reverted to the status of figure candidates in later stages of processing. figure candidates that survive all the phases of filtering are finally considered to be figures. figure 2 shows a fragment of a publication page with indicated text areas and final figure candidates detected by the algorithm.

figure 2. a fragment of the pdf page with boxes around every detected text area and each figure candidate. dashed rectangles indicate figure candidates. solid rectangles indicate text areas.

detection and matching of captions
the input of the part of the algorithm responsible for detecting figure captions consists of previously determined figures and all text clusters. the observation of scientific publications shows that, typically, captions of figures start with a figure identifier (for instance, see the grammar for figure captions proposed by bathia, lahiri, and mitra).22 the identifier usually starts with a word describing a figure type and is followed by a number or some other unique identifier. in more complex documents, the figure number might have a hierarchical structure reflecting, for example, the chapter number. the set of possible figure types is very limited. in the case of hep publications, the most usual combinations include the words "figure" and "plot" and different variations of their spelling and abbreviation. during the first step of the caption detection, all text clusters from the publication page are tested for the possibility of being a caption. this consists of matching the beginning of the text contained in a textual cluster with a regular expression determining what is a figure caption. the role of the regular expression is to select strings starting with one of the predefined words, followed by an identifier or the beginning of a sentence. the identifier is subsequently extracted and included in the metadata of a caption. the caption detection has to be designed to reject paragraphs of the type "figure 1 presents results of (. . .)". to achieve this, we reject the possibility of having any lowercase text after the figure identifier.
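as an illustration of such a test, the sketch below uses a single regular expression: it accepts strings that begin with a figure word and an identifier and rejects running text by refusing lowercase characters right after the identifier. the pattern is only indicative of the approach, not the rule set actually used.

# an illustrative caption test in the spirit of the description above.
import re

CAPTION_RE = re.compile(
    r"^(?i:fig\.?|figure|plot)\s*"        # figure word, possibly abbreviated
    r"(?P<id>\d+(?:\.\d+)*[a-z]?)"        # identifier, possibly hierarchical (e.g. 2.1)
    r"(?!\s*[.:]?\s*[a-z])")              # reject lowercase text after the identifier

def caption_identifier(text):
    """return the extracted figure identifier, or None if the text is not a caption."""
    m = CAPTION_RE.match(text.strip())
    return m.group("id") if m else None

print(caption_identifier("Figure 3. Sample page layouts."))          # '3'
print(caption_identifier("Fig. 2.1: Invariant mass distribution"))   # '2.1'
print(caption_identifier("figure 1 presents results of ..."))        # None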
having the set of all the captions, we start searching for corresponding figures. all previous steps of the algorithm take into account the division of a page into text columns (see "detection of the page layout" below). when matching captions with figure candidates, we do not take into account the page layout. matching between figure candidates and captions happens on every document page separately. we consider every detected caption once, starting with those located at the top of the page and moving down toward the end. for every caption we search for figure candidates lying nearby. first we search above the caption and, in the case of failure, we move below the caption. we take into account all figure candidates, including those rejected by heuristics. in the case of finding multiple figure candidates corresponding to a caption, we merge them into a single figure, treating the previous candidates as subfigures of a larger figure. we also include small portions of text and graphics previously rejected from figure candidates that lie between figure and caption and between different parts of a figure. these parts of text usually contain identifiers of the subfigures. the amount of unclustered content that can be included in a figure is a parameter of the extraction algorithm and is expressed as a percentage of the height of the document page. it might happen that a caption is located in a completely different place, but this case is rare and tends to appear in older publications. the distance from the figure is calculated based on the page geometry. the captions should not be too distant from the figure.
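the matching step just described can be sketched as follows, with bounding boxes given as (x0, y0, x1, y1) and the y axis growing downwards; the maximum caption-to-figure distance, expressed as a fraction of the page height, is a placeholder value.

# prefer candidates above the caption, fall back to candidates below, and merge
# multiple matches into one figure (treating them as subfigures).

def vertical_gap(caption_box, candidate_box):
    if candidate_box[3] <= caption_box[1]:        # candidate entirely above the caption
        return caption_box[1] - candidate_box[3]
    if candidate_box[1] >= caption_box[3]:        # candidate entirely below the caption
        return candidate_box[1] - caption_box[3]
    return 0

def match_caption(caption_box, candidate_boxes, page_height, max_fraction=0.15):
    limit = max_fraction * page_height
    above = [b for b in candidate_boxes
             if b[3] <= caption_box[1] and vertical_gap(caption_box, b) <= limit]
    below = [b for b in candidate_boxes
             if b[1] >= caption_box[3] and vertical_gap(caption_box, b) <= limit]
    matched = above or below
    if not matched:
        return None
    xs0, ys0, xs1, ys1 = zip(*matched)            # merged boundary of all subfigures
    return min(xs0), min(ys0), max(xs1), max(ys1)

subfigures = [(50, 100, 300, 250), (320, 100, 560, 250)]   # two side-by-side plots
caption = (50, 260, 560, 280)                              # caption just below them
print(match_caption(caption, subfigures, page_height=842))
# -> (50, 100, 560, 250): both candidates merged into a single figure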
generation of the output
the choice of the format in which data should be saved at the output of the extraction process should take into account further requirements. the most obvious use case of displaying figures to end users in response to text-based search queries does not yield very sophisticated constraints. a simple raster graphic annotated with captions and possibly some extracted portions of metadata would be sufficient. unfortunately, the process of generating raster representations of figures might lose many important pieces of information that could be used in the future for an automatic analysis. to store as much data as possible, apart from storing the extracted figures in a raster format (e.g., png), we also decided to preserve their original vector character. vector graphics formats, similarly to pdf documents, contain information about graphical primitives. primitives can be organized in larger logical entities. sometimes rendering of different primitives leads to a modification of the same pixel of the resulting image. such a situation might happen, for example, when circles are used to draw data points lying nearby on the same plot. to avoid such issues, we convert figures into the scalable vector graphics (svg) format.23 on the implementation level, the extraction of the vector representation of a figure proceeds in a manner similar to regular rendering of a pdf document. the interpreter preserves the same elements of the state and allows their modification by transformation operations. a virtual canvas is created for every detected figure. the content stream of the document is processed and all the transformation operations are executed, modifying the interpreter's state. the textual and graphical operators are also interpreted, but they affect only the appropriate canvas of the figure to which the operation belongs. if a particular operation does not belong to any figure, no canvas is affected. the behaviour of graphical canvases used during the svg generation is different from the case of raster rendering. instead of creating graphical output, every operation is transformed into a corresponding primitive and saved within an svg file. pdf was designed in such a manner that the number of external dependencies of a file is minimized. this design decision led to the inclusion of the majority of fonts in the document itself. it would be possible to embed font glyphs in the svg file and use them to render strings. however, for the sake of simplicity, we decided to omit font definitions in the svg output. a text representation is extracted from every text operation, and the operation is replaced by an svg text primitive with a standard font value. this simplification affects what the output looks like, but the amount of formatting information that is lost is minimal. moreover, this does not pose a problem because vector representations are intended to be used during automatic analysis of figures rather than for displaying purposes. a possible extension of the presented method could involve embedding complete information about used glyphs. finally, the generation of the output is completed with some metadata elements. an exhaustive categorization of the metadata that can be compiled for figures could be the customization of the one proposed by liu et al. for table metadata.24 in the case of figures, the following categories could be distinguished: (1) environment/geography metadata (information about the document where the figure is located); (2) affiliated metadata (e.g., captions, references, or footnotes); (3) layout metadata (information about the original visualization of the figure); (4) content data; and (5) figure type metadata. for the moment, we compile only environment/geography metadata and affiliated metadata. the geography/environment metadata consists of the document title, the document authors, the document date (creation and publication), and the exact location of a figure inside a publication (page and boundary). most of these elements are provided by simply referencing the original publication in the inspire repository. the affiliated metadata consists of the text caption and the exact location of the caption in the publication (page and boundary). in the future, metadata from other categories will be annotated for each figure.

detection of the page layout
figure 3. sample page layouts that might appear in a scientific publication. the black color indicates areas where content is present.
in this section we discuss how to detect the page layout, an issue which has been omitted in the main description of the extraction algorithm, but which is essential for an efficient detection of figures. figure 3 depicts several possibilities of organising content on the page. as mentioned in previous sections, the method of clustering operations based on their geometrical position may fail in the case of documents having a complex page layout. the content appearing in different columns should never be considered as belonging to the same figure. this cannot be assured without enforcing additional constraints during the clustering phase. to address this difficulty, we enhanced the figure extractor with a pre-processing phase of detecting the page layout. being able to identify how the document page is divided into columns enables us to execute the clustering within every column separately. it is intuitively obvious what can be understood as a page layout, but to provide a method of calculating it, we need a more formal definition, which we provide below. by the layout of a page, we understand a particular division of a page into areas called columns.
each area is a sum of disjoint rectangles. the division of a page into areas must satisfy a set of conditions summarized in definition 3.

definition 3: let $P$ be a rectangle representing the page. the set $D$ containing subareas of a page is called a page division if and only if

$$\bigcup_{Q \in D} Q = P$$

$$\forall x, y \in D,\ x \neq y:\ x \cap y = \emptyset$$

$$\forall Q \in D:\ Q \neq \emptyset$$

$$\forall Q \in D\ \exists R = \{x : x \text{ is a rectangle},\ \forall y \in R \setminus \{x\}:\ y \cap x = \emptyset\} \text{ such that } Q = \bigcup_{x \in R} x$$

every element of a division is called a page area. to be considered a page layout, the borders of areas from the division must not intersect the content of the page. definition 3 does not guarantee that the layout is unique. a single page might be assigned different divisions satisfying the definition. additionally, not all valid page layouts are interesting from the point of view of figure detection. the segmentation algorithm calculates one such division, imposing additional constraints on the detected areas. the layout-calculation procedure utilizes the notion of separators, introduced by definition 4.

definition 4: a vertical (or horizontal) line inside a page or on its borders is called a separator if its horizontal (vertical) distance from the page content is larger than a given constant value.

the algorithm consists of two stages. first, the vertical separators of a sufficient length are detected and used to divide the page into disjoint rectangular areas. each area is delimited by two vertical lines, each of which forms a consistent interval inside one of the detected vertical separators. at this stage, horizontal separators are completely ignored. figure 4 shows a fragment of a publication page processed by the first stage of the layout detection. the upper horizontal edge of one of the areas lies too close to two text lines. with the constant of definition 4 chosen to be sufficiently large, this edge would not be a horizontal separator and thus the generated division of the page would require additional processing to become a valid page layout. the second stage of the algorithm transforms the previously detected rectangles into a valid page layout by splitting rectangles into smaller parts and by joining appropriate rectangles to form a single area.

figure 4. example of intermediate layout-detection results requiring the refinement.

algorithm 2 shows the pseudo-code of the detection of vertical separators. the input of the algorithm consists of the image of the publication page. the output is a list of vertical separators aggregated by their x-coordinates. every element of this list consists of two elements: an integer indicating the x-coordinate and the list of y-coordinates describing the separators. the first element of this list indicates the y-coordinate of the beginning of the first separator. the second element is the y-coordinate of the end of the same separator. the third and fourth elements describe the second separator, and the same mechanism is used for the remaining separators (if they exist). the algorithm proceeds according to the sweeping principle known from computational geometry.25 the algorithm reads the publication page starting from the left. for every x-coordinate value, a set of corresponding vertical separators is detected (lines 9–18). vertical separators are detected as consistent sequences of blank points.
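the conditions of definition 3 can be checked mechanically; the sketch below does so on a discrete pixel grid. it is an illustration only (it does not verify that the rectangles making up a single area are themselves disjoint) and is not part of the extraction algorithm.

from typing import List, Tuple

Rect = Tuple[int, int, int, int]      # x0, y0, x1, y1 with exclusive upper bounds

def cells(rect: Rect):
    x0, y0, x1, y1 = rect
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

def is_page_division(page: Rect, areas: List[List[Rect]]) -> bool:
    page_cells = cells(page)
    covered = set()
    for area in areas:
        area_cells = set().union(*(cells(r) for r in area))
        if not area_cells:                 # every area must be non-empty
            return False
        if area_cells & covered:           # distinct areas must be disjoint
            return False
        covered |= area_cells
    return covered == page_cells           # together the areas must cover the page

# example: a valid two-column division of a 10 x 10 page, and an invalid one
page = (0, 0, 10, 10)
print(is_page_division(page, [[(0, 0, 5, 10)], [(5, 0, 10, 10)]]))   # True
print(is_page_division(page, [[(0, 0, 5, 10)], [(4, 0, 10, 10)]]))   # False (overlap)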
a point is considered blank if all the points in its horizontal surrounding of the radius defined by the constant from definition 5 are of the background colour. not all blank vertical lines can be considered separators. short empty spaces usually delimit lines of text or different small units of the content. in line 11 we test detected vertical separators for being long enough. if a separator has been detected in a particular column of a publication page, the adjacent columns also tend to contain similar separators. lines 19–31 of the algorithm are responsible for electing the longest candidate among the adjacent columns of the page. the maximization is performed across a set of adjacent columns for which at least one separator exists.

algorithm 2. detecting vertical separators.

1: input: the page image
2: output: vertical separators of the input page
3: list<pair<int, list<int>>> separators ← ∅
4: int max_weight ← 0
5: boolean maximising ← false
6: for all x ∈ {x_min … x_max} do
7:   emptyb ← 0, current_eval ← 0
8:   empty_areas ← list()
9:   for all y ∈ {0 … page_height} do
10:     if point at (x, y) is not blank then
11:       if y − emptyb − 1 > heightmin then
12:         empty_areas.append(emptyb)
13:         empty_areas.append(y = page_height ? y : y − 1)
14:         current_eval ← current_eval + (y − emptyb)
15:       end if
16:       emptyb ← y + 1
17:     end if
18:   end for
    {we have already processed the entire column. now we compare it with the adjacent, already processed columns}
19:   if max_weight < current_eval then
20:     max_weight ← current_eval
21:     max_separators ← empty_areas
22:     maxx ← x
23:   end if
24:   if maximising then
25:     if empty_areas = ∅ then
26:       separators.add(⟨maxx, max_separators⟩)
27:       maximising ← false, max_weight ← 0
28:     end if
29:   else
30:     maximising ← (empty_areas ≠ ∅)
31:   end if
32: end for
33: return separators

the detected separators are used to create the preliminary division of the page, similar to the one from the example of figure 4. as in the previous step, separators are considered one by one in the order of increasing x-coordinate. at every moment of the execution, the algorithm maintains a division of the page into rectangles. this division corresponds only to the already detected vertical separators. updating the previously considered division is facilitated by processing separators in a particular well-defined order.

before presenting the final outcome, the algorithm must refine the previously calculated division. this happens in the second phase of the execution. all the horizontal borders of the division are then moved along adjacent vertical separators until they become horizontal separators in the sense of definition 4. typically, moving the horizontal borders results in dividing already existing rectangles into smaller ones. if such a situation happens, both newly created parts are assigned to different page layout areas. sometimes, when moving a border is not possible, different areas are combined, forming a larger one.

tuning and testing

the extraction algorithm described here has been implemented in java and tested on a random set of scientific articles coming from the inspire repository. the testing procedure has been used to evaluate the quality of the method, but it also allowed us to tweak the parameters of the algorithm to maximize the outcomes.

preparation of the testing set

to prepare the testing set, we randomly selected 207 documents stored in inspire.
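to illustrate the idea behind algorithm 2, the following is a simplified python re-implementation sketch (not the production java code). it assumes page[y][x] already holds whether the pixel at (x, y) is blank in the sense of the definition above, and it reproduces only the election of the heaviest column within each group of adjacent candidate columns.

from typing import List, Tuple

def blank_runs(column: List[bool], min_height: int) -> List[Tuple[int, int]]:
    """maximal runs of blank pixels in one column that are long enough."""
    runs, start = [], None
    for y, blank in enumerate(column + [False]):       # sentinel ends the last run
        if blank and start is None:
            start = y
        elif not blank and start is not None:
            if y - start >= min_height:
                runs.append((start, y - 1))
            start = None
    return runs

def vertical_separators(page: List[List[bool]], min_height: int):
    """sweep the columns left to right; within each group of adjacent columns that
    contain long blank runs, elect the column whose runs cover the most pixels."""
    height, width = len(page), len(page[0])
    separators, best, best_weight = [], None, 0
    for x in range(width):
        runs = blank_runs([page[y][x] for y in range(height)], min_height)
        weight = sum(y1 - y0 + 1 for y0, y1 in runs)
        if runs and weight > best_weight:
            best, best_weight = (x, runs), weight
        if not runs and best is not None:               # a group of columns ended
            separators.append(best)
            best, best_weight = None, 0
    if best is not None:
        separators.append(best)
    return separators                                    # list of (x, [(y0, y1), ...])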
in total, these documents consisted of 3,728 pages, which contained 1,697 figures altogether. the records have been selected according to a uniform probability distribution across the entire record space. this way, we have created a collection that is representative of the entire inspire corpus, including historical entries. currently, inspire consists of: 1,140 records describing publications written before 1950; 4,695 between 1950 and 1960; 32,379 between 1960 and 1970; 108,525 between 1970 and 1980; 167,240 between 1980 and 1990; 251,133 between 1990 and 2000; and 333,864 in the first decade of the twenty-first century. in total, up to july 2012, inspire manages 952,026 records. it can be seen that the rate of growth has increased with time and that most inspire documents come from the last decade.

the results on such a testing set should accurately estimate the efficiency of extraction for existing documents but not necessarily for new documents being ingested into inspire. this is because inspire contains entries describing old articles which were created using obsolete technologies or scanned and encoded in pdf. the extraction algorithm is optimized for born-digital objects. to test the hypothesis that the extractor provides better results for newer papers, the testing set has been split into several subsets. the first set consists of publications published before 1980. the rest of the testing set has been split into subsets corresponding to decades of publication. to simplify the counting of correct figure detections and to provide a more reliable execution and measurement environment, every testing document has been split into multiple single-page pdf documents. subsequently, every single-page document has been manually annotated with the number of figures appearing inside.

execution of the tests

the efficient execution of the testing was possible thanks to a special script executing the plots extractor on every single page separately and then computing the total number of successes and failures. the script allows the execution of tests in a distributed heterogeneous environment and allows dynamic connection and disconnection of computing nodes. in the case of a software failure, the extraction request is resubmitted to a different computation node, allowing us to avoid problems related to worker-node configuration rather than to the algorithm implementation itself. during the preparation of the testing set, we manually annotated all the expected extraction results. subsequently, the script compared these metadata with the output of the extractor. using aggregated numbers from all extracted pages allowed us to calculate efficiency measures of the extraction algorithm. as quality measures, we used recall and precision.26 their definitions are included in the following equations:

$$\mathrm{recall} = \frac{\#\ \text{figures extracted correctly}}{\#\ \text{figures present in the test set}}$$

$$\mathrm{precision} = \frac{\#\ \text{figures extracted correctly}}{\#\ \text{figures extracted}}$$

at every place where we needed a single comparable quality measure rather than two semi-independent numbers, we have used the harmonic average of the precision and the recall.27 table 1 summarizes the results obtained during the test execution for every subset of our testing set. figure 5 shows the dependency of recall and precision on the time of publication. the extractor parameters used in this test execution were chosen based on intuition and a small number of manually triggered trials.
in the next section we describe an automatic tuning procedure we have used to find optimal algorithm arguments.

table 1. results of the test execution.

                                          –1980   1980–90   1990–2000   2000–10   2010–12
number of existent figures                  114        60         170       783       570
number of correctly detected figures         59        53         164       703       489
number of incorrectly detected figures       26        78          65        40        73
total number of pages                        85       136         760      1919       828
number of correctly processed pages          20        44         712      1816       743

figure 5. recall and precision as functions of the decade of publication.

it can be seen that, as expected, the efficiency increases with the time of publication. the total recall and precision for all samples since 1990, which constitute the majority of the inspire corpus, were both 88 percent. precision and recall based on the correctly detected figures do not give a full picture of the algorithm efficiency, because the extraction has been executed on a number of pages not containing any figures. the correctly extracted pages not having any figures do not appear in the recall and precision statistics because in their case the expected and detected numbers of figures are both equal to 0. besides recall and precision, figure 5 also depicts the fraction of pages that have been extracted correctly. taking into account the samples since 1990, 3,271 pages out of 3,507 have been detected completely correctly, which makes a 93 percent success rate counted by the number of pages. as can be seen, this measure is higher than both the precision and the recall. the analysis of the extractor results in the case of failure shows that in many cases, even if the results are not completely correct, they are not far from the expectation. there are different reasons for the algorithm failing. some of them may result from a non-optimal choice of algorithm parameters, others from the document layout being too far from the assumed one. in some rare cases, even manual inspection of the document does not allow an obvious identification of figures.

the automatic tuning of parameters

in the previous section we have shown the results obtained by executing the extraction algorithm on a sample set. during this execution we were using extractor arguments which seemed to be the most correct based on our observations but also on other research (typical sizes of figures, margin sizes, etc.).28 this way of configuring the algorithm was useful during development, but it is not likely to yield the best possible results. to find better parameters, we have implemented a method of automatic tuning. the metrics described in the previous section provided a good method of measuring the efficiency of the algorithm for a given set of parameters. the choice of optimal parameters can be relative to the choice of documents on which the extraction is to be performed. the way in which the testing set has been selected allowed us to use it as representative of hep publications. to tune the algorithm, we have used a subset of the testing set described in the previous step as a reference. the subset consisted of all entries created after 1990. this allowed us to minimize the presence of scanned documents which, by design, cannot be correctly processed by our method.
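the aggregate numbers reported above can be recomputed directly from table 1; the short sketch below does so for the samples since 1990 and also shows the harmonic mean used as the single quality measure (the resulting values land close to the 88 percent quoted in the text).

def recall(correct: int, existent: int) -> float:
    return correct / existent

def precision(correct: int, incorrect: int) -> float:
    return correct / (correct + incorrect)

def harmonic_mean(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

# aggregated counts for the columns 1990-2000, 2000-10, and 2010-12 of table 1
correct = 164 + 703 + 489       # correctly detected figures
existent = 170 + 783 + 570      # existent figures
incorrect = 65 + 40 + 73        # incorrectly detected figures

r, p = recall(correct, existent), precision(correct, incorrect)
print(f"recall = {r:.1%}, precision = {p:.1%}, harmonic mean = {harmonic_mean(p, r):.1%}")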
the adjustment of parameters has been performed by a dedicated script which has executed the extraction using various parameter values and has read back the results. the script has been configured with a list of tuneable parameters together with their types and allowed value ranges. additionally, the script had knowledge of the believed-best value, which was the one used in the previous testing. to decrease the complexity of training, we have made several assumptions about the parameters. these assumptions are only an approximation of the real nature of the parameters, but practice has shown that they are good enough to permit the optimization:

• we assume that the precision and recall are continuous with respect to the parameters. this allows us to assume that the efficiency of the algorithm for parameter values close to a given one will be close. the optimization has proceeded by sampling the parametric space in a number of points and executing tests using the selected points as parameter values. having $n$ parameters to optimize and dividing the space of every parameter into $m$ regions leads to the execution of $m^n$ tests. the execution of every test is a time-consuming operation due to the size of the training set.

• we assume that the parameters are independent of each other. this means that we can divide the problem of finding an optimal solution in the $n$-dimensional space of $n$ configuration arguments into finding $n$ solutions in 1-dimensional subspaces. such an assumption seems to be intuitive and considerably reduces the number of necessary tests from $O(m^n)$ to $O(m \cdot n)$, where $m$ is the number of samples taken from a single dimension.

in our tests, the parametric space has been divided into 10 equal intervals in every direction. in addition to checking the extraction quality at those points, we have executed one test for the so-far best argument. in order to increase the level of fine-tuning of the algorithm, each test has been re-executed in the region where the chances of finding a good solution were considered the highest. this consisted of a region centred around the highest result and having a radius of 10 percent of the parameter space.

figure 6 and figure 7 show the dependency of the recall and the precision on an algorithm parameter. the parameter depicted in figure 6 indicates what minimal aspect ratio the figure candidate must have in order to be considered a correct figure. it can be seen that tuning this heuristic increases the efficiency of the extraction. moreover, the dependency of recall and precision on the parameter is monotonic, which is the most compatible with the chosen optimization method. the parameter of figure 7 specifies which fraction of the area of the entire figure candidate has to be occupied by graphical operations. this parameter has a lower influence on the extraction efficiency. such a situation can happen when more than one heuristic influences the same aspect of the extraction. this contradicts the assumption of parameter independence, but we have decided to use the present model for simplicity.

figure 6. effect of the minimal aspect ratio on precision and recall.

figure 7. effect of the area fraction occupied by graphical operations on precision and recall.
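the coordinate-wise tuning described above (grid sampling of each parameter independently, followed by a finer pass around the best value with a radius of 10 percent of the range) can be sketched as follows. evaluate() stands for a full extraction run over the reference set returning the harmonic mean of precision and recall; the signature is our own assumption, not the original script's interface.

from typing import Callable, Dict, Tuple

def tune(evaluate: Callable[[Dict[str, float]], float],
         ranges: Dict[str, Tuple[float, float]],
         start: Dict[str, float],
         samples: int = 10) -> Dict[str, float]:
    best = dict(start)
    for name, (lo, hi) in ranges.items():
        def grid(a: float, b: float):
            step = (b - a) / samples
            return [a + i * step for i in range(samples + 1)]
        # coarse pass over the whole range, plus the so-far best value
        values = grid(lo, hi) + [best[name]]
        best_value = max(values, key=lambda v: evaluate({**best, name: v}))
        # fine pass around the best coarse value (radius = 10% of the range)
        radius = 0.1 * (hi - lo)
        values = grid(max(lo, best_value - radius), min(hi, best_value + radius))
        best[name] = max(values, key=lambda v: evaluate({**best, name: v}))
    return best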
after executing the optimization algorithm, we have managed to achieve a recall of 94.11 percent and a precision of 96.6 percent, which is a considerable improvement compared to the previous results of 88 percent.

conclusions and future work

this work has presented a method for extracting figures from scientific publications in a machine-readable format, which is the main step toward the development of services enabling access to and search of images stored in scientific digital libraries. in recent years, figures have been gaining increasing attention in the digital libraries community. however, little has been done to decipher the semantics of these graphical representations and to bridge the semantic gap between content that can be understood by machines and that which is managed by digital libraries. extracting figures and storing them in a uniform and machine-readable format constitutes the first step towards the extraction and the description of the internal semantics of figures. storing semantically described and indexed figures would open completely new possibilities of accessing the data and discovering connections between different types of publishing artefacts and different resources describing related knowledge.29

our method of detecting fragments of pdf documents that correspond to figures is based on a series of observations of the character of publications. however, tests have shown that additional work is needed to improve the correctness of the detection. also, the performance should be re-evaluated after we have a large set of correctly annotated figures, confirmed by users of our system. the heuristics used by the algorithm are based on a number of numeric parameters that we have tried to optimize using automatic techniques. the tuning procedure has made several arbitrary assumptions on the nature of the dependency between parameters and extraction results. a future approach to the parameter optimization, requiring much more processing, could involve the execution of a genetic algorithm that would treat the parameters as gene samples.30 this could potentially allow the discovery of a better parameter set because a smaller set of assumptions would be imposed on the parameters. a vector of algorithm parameters could play the role of a gene, and random mutations could be introduced to previously considered and subsequently crossed genes. the evaluation and selection of surviving genes could be performed by using the metrics described previously. another approach to improving the quality of the tuning could involve extending the present algorithm by the discovery of mutually dependent parameters and the usage of special techniques (relaxing the assumptions) to fine-tune in subspaces spanned by these parameters.

all of our experiments have been performed using a corpus of publications from hep. the usage of the extraction algorithm on a different corpus would require tuning the parameters for the specific domain of application. for the area of hep, we can also consider preparing several sets of execution parameters varying by the decade of document publication or by other easy-to-determine characteristics. subsequently, we could decide which set of parameters to use based on those characteristics. in addition to a better tuning of the existing heuristics, there are improvements that can be made at the level of the algorithm. for example, we could mention extending the process of clustering text parts.
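one possible realization of the genetic-algorithm idea suggested above is sketched below. everything in it (population size, selection scheme, mutation rate) is an assumption for illustration, not a description of work actually carried out.

import random
from typing import Callable, List, Tuple

Gene = List[float]    # one gene = one vector of extractor parameters

def evolve(fitness: Callable[[Gene], float],
           bounds: List[Tuple[float, float]],
           population_size: int = 20,
           generations: int = 30,
           mutation_rate: float = 0.1) -> Gene:
    population = [[random.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:population_size // 2]             # selection
        children = []
        while len(survivors) + len(children) < population_size:
            a, b = random.sample(survivors, 2)
            child = [random.choice(pair) for pair in zip(a, b)]   # crossover
            for i, (lo, hi) in enumerate(bounds):                 # mutation
                if random.random() < mutation_rate:
                    child[i] = random.uniform(lo, hi)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

# fitness() would run the extractor with the given parameter vector on the
# annotated reference set and return the harmonic mean of precision and recall.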
in the current implementation, the margins by which textual operations are extended during the clustering process are fixed as algorithm parameters. this approach proved to be robust in most cases. however, distances between text lines tend to be different depending on the currently utilized style. every text portion tends to have one style that dominates. an improved version of the text-clustering algorithm could use local rather than global properties of the content. this would not only allow us to correctly handle an entire document written using different text styles, but would also help to manage cases of single paragraphs differing from the rest of the content.

another important, not-yet-implemented improvement related to figure metadata is the automatic extraction of figure references from the text content. important information about figure content might be stored in the surroundings of the places where the publication text refers to a figure. furthermore, the metadata could be extended by the usage of some type of classifier that would assign a graphics type to the extracted result. currently, we are only distinguishing between tables and figures based on simple heuristics involving the number and type of graphical areas and the text inside the detected caption. in the future, we could distinguish line plots from photos, histograms, and so on. such a classifier could be implemented using artificial intelligence techniques such as support vector machines.31

finally, partial results of the figure extraction algorithm might be useful in performing other pdf analyses:

• the usage of clustered text areas could allow a better interpretation and indexing of textual content stored in digital libraries with full-text access. clusters of text tend to describe logical parts like paragraphs, section and chapter titles, etc. a simple extension of the current schema could allow the extraction of the predominant formatting style of the text encoded in a page area. text parts written in different styles could be indexed in a different manner, giving, for instance, more importance to segments written with a larger font.

• we mentioned that the algorithm detects not only figures but also tables. a heuristic is being used in order to distinguish tables from different types of figures. our present effort concentrates on the correct treatment of figures, but a useful extension could allow the extraction of different types of entities. for instance, another common type of content ubiquitous in hep documents is mathematical formulas. thus, in addition to figures, it would be important to extract tables and formulas in a structured format allowing further processing. the internal architecture of the implemented prototype of the figure extractor allows easy implementation of extension modules which can compute other properties of pdf documents.

acknowledgements

this work has been partially supported by cern and the spanish government through the project tin2012-37826-c02-01.

references

1. saurabh kataria, “on utilization of information extracted from graph images in digital documents,” bulletin of ieee technical committee on digital libraries 4, no. 2 (2008), http://www.ieee-tcdl.org/bulletin/v4n2/kataria/kataria.html.
2. marti a. hearst et al., “exploring the efficacy of caption search for bioscience journal search interfaces,” proceedings of the workshop on bionlp 2007: biological, translational and clinical language processing: 73–80, http://dl.acm.org/citation.cfm?id=1572406.
3. lisa johnston, “web reviews: see the science: scitech image databases,” sci-tech news 65, no. 3 (2011), http://jdc.jefferson.edu/scitechnews/vol65/iss3/11.
4. annette holtkamp et al., “inspire: realizing the dream of a global digital library in high-energy physics,” 3rd workshop conference: towards a digital mathematics library, paris, france (july 2010): 83–92.
5. piotr praczyk et al., “integrating scholarly publications and research data—preparing for open science, a case study from high-energy physics with special emphasis on (meta)data models,” metadata and semantics research—ccis 343 (2012): 146–57.
6. piotr praczyk et al., “a storage model for supporting figures and other artefacts in scientific libraries: the case study of invenio,” 4th workshop on very large digital libraries (vldl 2011), berlin, germany (2011).
7. “sciverse science direct: image search,” elsevier, http://www.info.sciverse.com/sciencedirect/using/searching-linking/image.
8. guenther eichhorn, “trends in scientific publishing at springer,” in future professional communication in astronomy ii (new york: springer, 2011), doi: 10.1007/978-1-4419-8369-5_5.
9. william browuer et al., “segregating and extracting overlapping data points in two-dimensional plots,” proceedings of the 8th acm/ieee-cs joint conference on digital libraries (jcdl 2008), new york: 276–79.
10. saurabh kataria et al., “automatic extraction of data points and text blocks from 2-dimensional plots in digital documents,” proceedings of the 23rd aaai conference on artificial intelligence, chicago (2008): 1169–74.
11. saurabh kataria, “on utilization of information extracted from graph images in digital documents,” bulletin of ieee technical committee on digital libraries 4, no. 2 (2008), http://www.ieee-tcdl.org/bulletin/v4n2/kataria/kataria.html.
12. ying liu et al., “tableseer: automatic table metadata extraction and searching in digital libraries,” proceedings of the 7th acm/ieee-cs joint conference on digital libraries (jcdl ’07), vancouver (2007): 91–100.
13. william s. cleveland, “graphs in scientific publications,” american statistician 38, no. 4 (1984): 261–69, doi: 10.1080/00031305.1984.10483223.
14. hui chao and jian fan, “layout and content extraction for pdf documents,” document analysis systems vi, lecture notes in computer science 3163 (2004): 213–24.
15. at every moment of the execution of a postscript program, the interpreter maintains many variables. some of them encode current positions within the rendering canvas. such positions are used to locate the subsequent character or to define the starting point of the subsequent graphical primitive.
16. transformation matrices are encoded inside the interpreters’ state. if an operator requires arguments indicating coordinates, these matrices are used to translate the provided coordinates to the coordinate system of the canvas.
17. graphical operators are those that trigger the rendering of a graphical primitive.
18. textual operations are the pdf instructions that cause the rendering of the text. textual operations receive the string representation of the desired text and use the current font, which is saved in the interpreters’ state.
19. operations that do not produce any visible output, but solely modify the interpreters’ state.
20. herbert edelsbrunner and hermann a. maurer, “on the intersection of orthogonal objects,” information processing letters 13, nos. 4, 5 (1981): 177–81.
21. thomas h. cormen, charles e. leiserson, and ronald l. rivest, introduction to algorithms (cambridge: mit electrical engineering and computer science series, 1990).
22. sumit bhatia, shibamouli lahiri, and prasenjit mitra, “generating synopses for document-element search,” proceedings of the 18th acm conference on information and knowledge management, new york (2009): 2003–6, doi: 10.1145/1645953.1646287.
23. jon ferraiolo, ed., “scalable vector graphics (svg) 1.0 specification,” w3c recommendation, 1 september 2001, http://www.w3.org/tr/svg10/.
24. liu et al., “tableseer.”
25. cormen, leiserson, and rivest, introduction to algorithms.
26. ricardo a. baeza-yates and berthier ribeiro-neto, modern information retrieval (boston: addison-wesley, 1999).
27. ibid.
28. cleveland, “graphs in scientific publications.”
29. praczyk et al., “a storage model for supporting figures and other artefacts in scientific libraries.”
30. stuart russell and peter norvig, artificial intelligence: a modern approach, third edition (prentice hall, 2009).
31. sergios theodoridis and konstantinos koutroumbas, pattern recognition, third edition (boston: academic press, 2006).

public access technologies in public libraries: effects and implications

john carlo bertot

john carlo bertot (jbertot@umd.edu) is professor and director of the center for library innovation in the college of information studies at the university of maryland, college park.

public libraries were early adopters of internet-based technologies and have provided public access to the internet and computers since the early 1990s. the landscape of public-access internet and computing was substantially different in the 1990s as the world wide web was only in its initial development. at that time, public libraries essentially experimented with public-access internet and computer services, largely absorbing this service into existing service and resource provision without substantial consideration of the management, facilities, staffing, and other implications of public-access technology (pat) services and resources. this article explores the implications for public libraries of the provision of pat and seeks to look further to review issues and practices associated with pat provision resources. while much research focuses on the amount of public access that public libraries provide, little offers a view of the effect of public access on libraries. this article provides insights into some of the costs, issues, and challenges associated with public access and concludes with recommendations that require continued exploration.
public libraries were early adopters of internet-based technologies and have provided public access to the internet and computers since the early 1990s.1 in 1994, 20.9 percent of public libraries were connected to the internet, and 12.7 percent offered public-access computers. by 1998, internet connectivity in public libraries grew to 83.6 percent, and 73.3 percent of public libraries provided public internet access.2 the landscape of public-access internet and computing was substantially different in the 1990s, as the world wide web was only in its initial development. at that time, public libraries essentially experimented with public-access internet and computer services, largely absorbing this service into existing service and resource provision without substantial consideration of the management, facilities, staffing, and other implications of public-access technology (pat) services and resources.3

using case studies conducted at thirty-five public libraries in five geographically dispersed and demographically diverse states, this article explores the implications for public libraries of the provision of pat. the researcher also conducted interviews with state library agency staff prior to visiting libraries in each state. the goals of this article are to

• explore the level of support pat requires within public libraries;
• explore the implications of pat on public libraries, including management, building planning, staffing, and other support issues;
• explore current pat support practices;
• identify issues and challenges public libraries face in maintaining and supporting their pat infrastructure; and
• identify factors that contribute to successful pat practices.

this article seeks to look beyond the provision of pat by public libraries and review issues and practices associated with pat provision resources. while much research focuses on the amount of public access that public libraries provide, little offers a view of the effect of public access on libraries. this article provides insights into some of the costs, issues, and challenges associated with public access, and it concludes with recommendations that require continued exploration.

literature review

over time, public libraries quickly and substantially increased their public-access provision (see figures 1 and 2). connectivity grew from 20.9 percent in 1994 to nearly 100 percent in 2006.4 moreover, nearly all libraries that connected to the internet offered public-access internet services. simultaneously, the average number of public-access computers grew from 1.9 per public library in 1996 to 12 per public library in 2007.5 accompanying and in support of the continual growth of basic connectivity and computing infrastructure was a demand for broadband connectivity. indeed, since 1994, connectivity has progressed from dial-up phone lines to leased lines and other forms of high-speed connectivity.
the extent of the growth in public-access services within public libraries is profound and substantive, leading to the development of new internet-based service roles for public libraries.6 public access to the internet through public libraries also provides a number of community benefits to different populations within served communities.7 overlaid onto the public-access infrastructure is an increasingly complex service mix that now includes access to digital content (e.g., databases and digital libraries), integrated library systems (ilss), voice over internet protocol (voip), digital reference, and a host of other services and resources—some for public access, others for back-office library operations. and patrons do use these services in increasing amounts—both in the library and in everyday life.8 in fact, 82.5 percent of public libraries report that they do not have an adequate number of public-access computers some or all of the time and have resorted to time limits and wireless access to extend public-access services.9 by 2007, as connectivity and public-access computer infrastructure grew, so ensued the need to provide a range of publicly available services and resources:

• 87.7 percent of public libraries provide access to licensed databases
• 83.4 percent of public libraries offer technology training
• 74.1 percent of public libraries provide e-government services (e.g., locating government information and helping patrons complete online applications)
• 62.5 percent of public libraries provide digital reference services
• 51.8 percent of public libraries offer access to e-books10

the list is not exhaustive but illustrative, since libraries do offer other services such as access to homework resources, video content, audio content, and digitized collections. as public libraries expanded these services, management realized that they needed to plan and evaluate technology-based services. over the years, a range of technology management, planning, and evaluation resources emerged to help public libraries cope with their technology-based resources—those both publicly available and for administrative operations.11 but increasingly, public libraries report the strain that pat services create. this centers on four key areas:

• maintenance and management. the necessary maintenance and management requirements of pat place an additional burden on existing staff, many of whom do not possess the technology expertise to troubleshoot, fix, and support internet-based services and resources that patrons access.
• staff. libraries consistently cite staff expertise and availability as a barrier to the addition, support, and management of pat. indeed, as described in previous sections, some libraries have experienced a decline in library staff.
• finances. there is evidence of stagnant funding for libraries at the local level as well as a shift in expenditures from staff and collections to operational costs such as utilities and maintenance.
• buildings. the buildings are inadequate in terms of space and infrastructure (e.g., wiring and cabling) to support additional public access.12

this article explores these four areas through a site-visit method in an effort to go beyond a quantitative assessment of pat within the public library community.
though related in terms of topic area and author, this study was conducted separately from the public library internet surveys conducted since 1994 and offers insights into the provision of pat services and resources that a national survey cannot explore in such depth.

figure 1. public-access internet connectivity from 1994 through 2008.

figure 2. public-access internet workstations from 1996 through 2008.

method

the researcher visited thirty-five public libraries in five geographically and demographically diverse states between october 2007 and may 2008. the states were in the west, southwest, southeast, and mid-atlantic regions. the libraries visited included urban, suburban, rural, and native american public libraries that served populations ranging from a few hundred to more than half a million. the communities that the libraries served varied in terms of poverty, race, income, age, employment, and education demographics. prior to visiting the public library sites, the researcher conducted interviews with state library agency staff to better understand the public library context within each state and to explore overall pat issues, strategies, and other factors within the state. the following research questions guided the site visits:

• what are the community and library contexts in which the library provides pat?
• what are the pat services and resources that the library makes available to its community?
• what pat services and resources does the library desire to provide to its community?
• what is the relationship between provided and desired pat and the effect on the library (e.g., staff, finances, the building, and management)?
• what are the perceived benefits that the library and its community gain through pat in the library?
• what are the issues and barriers that the library encounters in providing pat services and resources?
• how does the library manage and maintain its pat?

the researcher visited each library for four to six hours. during that time, he interviewed the library director and/or branch manager and technology support staff (either a specific library position, a designated library employee, or a city or county it staff person), toured the library facility, and conducted a brief technology inventory. at some libraries, the researcher was able to meet with community partners that in some way collaborated with the library to provide pat services and resources (e.g., educational institutions that collaborated with libraries to provide access to broadband, or volunteers who conducted technology training sessions). interviews were recorded and transcribed, and the technology inventories were entered into a microsoft excel spreadsheet for analysis. the transcripts were coded using thematic content analytic schemes to allow for the identification of key issues regarding pat areas.13 this approach enabled the researcher to use an iterative site-visit strategy that used findings from previous site visits to inform subsequent visits. to ensure valid and reliable data, the researcher used a three-stage strategy:

1. site-visit reports were completed and sent to the libraries for review. corrections from libraries were incorporated into a final site-visit report.
2. a final state-based site-visit report was compiled for distribution to state library agency staff and also incorporated their corrections. this provided a state-level reliability and validity check.
3. a summary of key findings was distributed to six experts in the public library technology environment, three of whom were public library technology managers and three of whom were technology consultants who worked with public libraries.

in combination, this approach provided three levels of data quality checks, thus providing both internal (library and state) and external (technology expert) support for the findings. the findings in this article are limited to the libraries visited and the interviews conducted with public librarians and state library agency staff. however, themes emerged early during the site-visit process and were reinforced through subsequent interviews and visits across the states and libraries visited. in addition, the use of external reviewers of the findings lends additional, but limited, support to the findings.

findings

this section presents the results of the site visits and interviews with state library agency staff and public librarians. the article presents the findings by key areas surrounding pat in public libraries.

the public-access context

public libraries have a range of pat installed in their libraries for patron use. these technologies include public-access computers, wireless (wifi) access, ilss, online databases, digital reference, downloadable audio and video, and others. many of these services and resources are also available to patrons from outside library buildings, thus extending the reach (and support issues) of the library beyond the library's walls. in addition, when libraries do not provide direct access to resources and services, they serve as access points to those services, such as online gaming and social networking. while libraries can and do deploy a number of technologies for public use, it is possible to group these technologies broadly into two overlapping categories:

• hardware. library pat hardware can include public-access computers, public-access computing registration (i.e., reservation) systems, self-checkout stations, printers, faxes, laptops, and a range of other devices and systems. some of these technologies may have additional devices, such as those required for persons with disabilities. within the hardware grouping are networking technologies that include a range of hardware and software to enable a range of library networks to run (e.g., routers, hubs, switches, telecommunications lines, and networking software).
• software. software can include device operating system software (e.g., microsoft windows, mac os, and linux), device application software (e.g., microsoft office, openoffice, graphics software, audio software, e-book readers, assistive software, and others), and functional software (e.g., web browsers, online databases, and digital reference).

in short, public libraries make use of a range of technologies that the public uses in some way. each type of technology requires skills, management, implementation, and maintenance, all of which are discussed later. in the building, all of these products and services come together at the library's public-access computers, or at the patron's mobile device if wifi is available. moreover, patrons increasingly want to use their portable devices (e.g., usb drives, ipods, and others) with library technology. this places pressure on libraries not just to offer public-access computers, but also to support a range of technologies and services.
thus the environment in which libraries offer pat is complex and requires substantial technical expertise, support, and maintenance in the key areas of applications, computers, and networking. moreover, as discussed below, patrons are increasingly demanding market-based approaches to pat. these demands—which are largely about single-point access to a range of information services and resources—are often at odds with library technologies that are based on stove-piped approaches (e.g., ils, e-books, and licensed resources) and that do not necessarily lend themselves to seamless integration.

external pressures on pat

the advent and increased use by the public of google, amazon, itunes, youtube, myspace, second life, and other networked services affects public libraries in a number of ways. this article discusses these services and resources from the perspective of an information marketplace of which the public library is one entrant. interviewed librarians overwhelmingly indicated that users now expect library services to resemble those in the marketplace. users expect the look and feel, integration, service capabilities, interactivity, and personalization and customization that they experience while engaging in social networking, online searching, online purchasing, or other online activities. and within the library building, patrons expect the services to integrate at the public-access computer entry point—not to be distributed throughout the library in a range of locations, workstations, or devices. said differently, they expect to have a "mylibrary.com" experience that allows for seamless integration across the library's services but also facilitates the use of personal technologies (e.g., ipods, mp3 players, and usb devices). thus users expect the library's services to resemble those offered by a range of information service providers.

importantly, however, librarians indicated that the library systems on which their services and resources reside by and large do not integrate seamlessly—nor were they designed to do so. public-access computers are gateways to the internet; the ils exists for patrons to search for and locate library holdings; and online databases, e-books, audiobooks, etc., are extensions of the library's holdings but are not physical items under a library's control and are thus subject to a vendor's information and business models. while library vendors and the library community are working to develop more integrated products that lead users to the information they seek, the technology is still under development. there are three significant issues that libraries face because of market pressures: (1) the pressures all come together at a single point—the public-access computer; (2) users want a customized experience while using technology designed for the general public, not the individual user; and (3) users have choices in the information marketplace. one participant indicated, "if the library cannot match what users have access to on the outside, users will and do move on."

managing and maintaining public access

managing the public-access computer environment is a growing challenge for public libraries.
there are a number of management areas with which public librarians contend:

• public-access computers: the computers and laptops (if applicable) themselves, which can include anything from keyboards and mice to troubleshooting a host of computer problems (it is important to note that these may be computers that vary in age and composition, come from a range of vendors, run different operating systems, and often have different application software versions).
• peripheral management: the printers, faxes, scanners, and other equipment that are part of the library's overall public-access infrastructure.
• public-access management software or systems: these may include online or in-building computer-based reservations (which encompass specialized reservations such as teen machines, gaming computers, computers for seniors, and so on), time management (set to the library's decided-upon time allotment), filtering, security, logins, virtual machines, etc.
• wireless access: this may include logins and configurations for patrons to gain access to the library's wireless network.
• bandwidth management: this may include the need to allocate bandwidth differently as needs increase and decrease in a typical day.
• training and patron assistance: for a vast array of services such as databases, online searching, e-government (e.g., completing government forms and seeking government information), and others. training can take place formally through classes, but also through point-of-use tutorials requested by patrons.

to some extent, librarians commented that, while they do have issues with the public-access computers themselves from time to time, the real challenges that they face regard the actual management of the public-access environment—sign-ups, time limits, cost recovery for print jobs, helping patrons, and so on. one librarian commented that "the computers themselves are pretty stable. we don't really have too many issues with them per se. it's everything that goes into, out from, or around the computer that creates issues for us." as a result of the management challenges, several libraries have adopted turn-key solutions, such as public-access management systems (e.g., comprise technology's smart access manager [http://www.comprisetechnologies.com/product_29.html]) and all-encompassing public computing management systems that include networking and desktops (e.g., userful's discoverstations [http://userful.com/libraries/]). these systems allow for an all-in-one approach to sign-up, print cost recovery, filtering (if desired), and security. the discoverstations are a linux-based, all-encompassing public-access management environment. a clear advantage of the discoverstation approach is that the discoverstation is connected to the internet and is remotely accessible by userful staff to update software and perform other maintenance functions. they also use open-source operating and application software. while these solutions do provide efficiencies, they can also create limitations. for example, the discoverstations are a thin-client system and are dependent on the server for graphics and memory, thus limiting their ability to access gaming and social-networking sites. the smart access manager, and similar programs, can rely on smart cards or other technology that users must purchase in order to print. another limitation is that the time limits are fixed, and, while users get warnings as time runs out, the session can end abruptly.
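the management functions described above (sign-up, a fixed time allotment, warnings before the session ends) can be pictured with a small sketch; this is purely illustrative and does not describe any of the vendor products named in the text.

import time
from dataclasses import dataclass

@dataclass
class Session:
    patron: str
    workstation: int
    started: float
    allotment_minutes: int = 60

    def minutes_left(self) -> float:
        elapsed = (time.time() - self.started) / 60
        return self.allotment_minutes - elapsed

class AccessManager:
    def __init__(self, workstations: int):
        self.free = set(range(workstations))
        self.sessions = {}

    def sign_up(self, patron: str, allotment_minutes: int = 60) -> Session:
        if not self.free:
            raise RuntimeError("no workstation available; add patron to wait list")
        ws = self.free.pop()
        session = Session(patron, ws, time.time(), allotment_minutes)
        self.sessions[ws] = session
        return session

    def check(self, ws: int) -> str:
        session = self.sessions[ws]
        left = session.minutes_left()
        if left <= 0:                       # fixed limits can end sessions abruptly
            self.free.add(ws)
            del self.sessions[ws]
            return "session ended"
        if left <= 5:
            return f"warning: {left:.0f} minutes remaining"
        return f"{left:.0f} minutes remaining"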
these approaches are by and large adopted by libraries to ease the management associated with public-access computers and to let staff concentrate on other duties and responsibilities. one librarian indicated that "until we had our management system, we would spend most of the day signing people up for the computers, or asking them to finish their work for the next person in line."

planning for pat services and resources

public libraries face a number of challenges when planning for pat services and resources. this is primarily because pat planning involves more than computers. any planning needs to encompass

• building needs, requirements, limitations, and design;
• technology assessment that considers the library's existing technology, technology potential, current practices, and future trends;
• planning for and supporting multiple technology platforms;
• telecommunications and networking;
• services and resources available in the marketplace—those specifically for libraries and those more broadly available to consumers and used by patrons;
• specific needs and requirements of technology (e.g., memory, disk space, training, other);
• requirements of other it groups with which the library may need to integrate, for example, city or county technology mandates;
• support needs, including the need to enter into maintenance agreements for computer, network, and other equipment and software;
• staff capabilities, such as current staff skill sets and their ability to handle the technologies under review or purchased; and
• policy, such as requirements to filter because of local, state, or federal mandates.

the above list may not be exhaustive; rather, it is based on the main items that librarians identified during the site visits, and it serves to provide indicators of the challenges faced by those planning library it initiatives.

the endless upgrade and planning

one librarian likened the pat environment to "being a gerbil on a treadmill. you go round and round and never really arrive," a reference to the fact that public libraries are in a perpetual cycle of planning and implementing various pat services and resources. either hardware needs to be updated or replaced, or there is a software update that needs to be installed, or libraries are looking to the next technology coming down the road. in short, the technology planning-to-implementation cycle is perpetual. the upgrade and replacement cycle is further exacerbated by the funding situation in which most public libraries find themselves. increasingly, public library local and state funding, which combined can account for more than 90 percent of library funding, is flat or declining.14 the most recent series of public library internet studies indicates an increase in reliance by public libraries on fees and fines, fundraising, private foundation, and grant funding to finance collections and technology within libraries.15 this places key aspects of library operations in the realm of unreliable and one-time funding sources, thus making it difficult for libraries to develop multiyear plans for pat.

multiple support models

to cope with pat management and maintenance issues, public libraries are developing various support strategies. the site visits found a number of technology-support approaches in effect, ranging from no it support to highly centralized statewide approaches. the following list describes the technology-support models encountered during the site visits:
1. no technology support. libraries in this group have neither technology-support staff nor any type of organized technology-support mechanism with existing library staff. nor do they have access to external support providers such as county or city it staff. libraries in this group might rely on volunteers or engage in ad hoc maintenance, but by and large they have no formal approach to supporting or maintaining their technology.

2. internal library support without technology staff. in this model, the library provides its own technology support but does not necessarily have dedicated technology staff. rather, the library has designated one or more staff members to serve as the it person. usually this person has an interest in technology but has other primary responsibilities within the library. there may be some structure to the support—such as updating software (e.g., windows patches) once a week at a certain time—but it may be more ad hoc in approach. also, the library may try to provide its designated it person(s) with training to develop his or her skills further over time.

3. internal library support with technology staff. in this model, the library has at least one dedicated it staff person (part- or full-time) who is responsible for maintaining and planning the library's pat environment. the person may also have responsibilities for network maintenance and a range of technology-based services and resources. at the higher end of this approach are libraries with multiple it staff with differing responsibilities, such as networking, telecommunications, public-access computers, the ils, etc. libraries at this end of the spectrum tend to have a high degree of technology sophistication but may face other challenges (i.e., staffing shortages in key areas).

4. library consortia. over the years, public libraries have developed consortia for a range of services—shared ilss, resource sharing, resource licensing, and more. as public-library needs evolve, so too do the roles of library consortia. consortia increasingly provide training and technology-support services, and may be funded through membership fees, state aid, or other sources.

5. technology partners. while some libraries may rely on consortia for their technology support, others are seeking to partner with libraries that have more technology expertise, infrastructure, and abilities. this can be a fee-for-service arrangement that may involve sharing an ils, a maintenance agreement for network and public-access computer support, and a range of services. these arrangements allow the partner libraries to have some input into the technology planning and implementation processes without incurring the full expense of testing the technologies, having to implement them first, or hiring necessary staff (e.g., to manage the ils). the disadvantage to this model is that the smaller partner libraries are dependent on the technology decisions that the primary partner makes, including upgrade cycles, technology choices, migration time frames, etc.

6. city, county, or other agency it support. as city or county government agencies, some libraries receive technology support from the city or county it department (or in some cases the education department). this support ranges from a full slate of services and support available to the library to support only for the staff network and computers. even at the higher end of the support spectrum, librarians gave mixed reviews of the support received from it agencies.
this was primarily because of competing philosophies regarding the pat environment, with public librarians wanting an open-access policy to allow users access to a range of information service and resources and it agency staff wanting to essentially lock down the public-access environment and thus severely limit the functionality of the public-access computers and network services (i.e., wireless). other limitations might include prescribed pat, specified vendors, and bidding requirements. 7. state library support. one state library visited provides a high degree of service through its statewide approach to supporting public-access computing in the state’s public libraries. the state library has it staff in five locations throughout the state to provide support on a regional level but also has additional staff in the capital. these staff offer training, inhouse technical support, phone support, and can remote access the public-access computers in public libraries to troubleshoot, update, and perform other functions. moreover, this state built a statewide network through a statewide application to the federal e-rate program, thus providing broadband to all libraries. this model extends the availability of qualified technical support staff to all public libraries in the state—by phone as well as in person if need be. as a result, this enables public libraries to concentrate on service delivery to patrons. it is important to note that there are combinations of the above models in public libraries. for example, some libraries support their public-access networks and technology while the county or city it department supports the staff network and technology. it is clear, however, that there are a number of models for technology support in public libraries, and likely more than are presented in this article. the key issue is that public libraries are engaging in a broad spectrum of strategies to support, maintain, and manage their pat infrastructure. also of significance is that there are public libraries that have no technology-support services that provide pat services and resources. these libraries tend to serve populations of less than ten thousand, are rural, have fewer than five full-time equivalents (ftes), and are unlikely to be staffed by professional librarians. staff needs and pressures the study found a number of issues related to the effect of pat on library staff. this section of the findings discusses the primary factors affecting library staff as they work in the public-access context. n multiple skills needed not only is the pace of technological change increasing, but the change requires an ever-increasing array of skills because of the complexity of applications, technologies, and services. an example of such complexity is the library opac or ils. visited libraries indicated that such systems are becoming so complex and technologically sophisticated that there is a need for a full-time staff person to run and maintain the library ils. given the range of hardware, software, and networking infrastructure, as well as planning and pat management requirements, public librarians need a number of skills to successfully implement and maintain their pat environments. moreover, the skill needs depend on the librarian’s position—for example, an actual it staff person versus a reference librarian who does double duty by serving as the library’s it person. 
the skills required fall into technology, information literacy, service and facilities planning, management, and leadership and advocacy areas: n technology o general computer troubleshooting o basic maintenance, such as mouse and keyboard cleaning o basic computer repair, such as memory replacement, floppy drive replacement, disk defragmentation, etc. o basic networking, such as troubleshooting an “internet” issue versus a computer problem o telecommunications so as to understand the design and maintenance of broadband networks o integrated library systems o web design n information literacy o searching and using internet-based resources o searching and using library licensed resources o training patrons on the use of the publicaccess computers, general internet resources, and library resources o designing curriculum for various patron training courses n services and facilities planning o technology plan development and implementation (including budgeting) o telecommunications planning (including 88 information technology and libraries | june 2009 e-rate plan and application development) o building design so as to accommodate the requirements of public access technologies n management o license and contract negotiation for licensed resources, various public-access software and licenses, and maintenance agreements (service and repair agreements) o integration of pat into library operations o troubleshooting guidelines and process o policy development, such as acceptable use, filtering, filtering removal requests by patrons, etc. n leadership and advocacy o grant writing and partnership development so as to fund pat services and resources and extend out into the community that the library serves o advocacy so as to be able to demonstrate the value of pat in the library as a community good o leadership so as to build a community approach to public access with the library as one of the foundational institutions these items provide a broad cross section of the skills that public library staff may need to offer a robust pat environment. in the case of smaller, rural libraries, these requirements in general fall to the library director—along with all other duties of running the public library. in libraries that have separate technology, collections development, and other specialized staff, the skills and expertise may be dispersed throughout various areas in the library. n training public librarians receive a range of technology training— including none at all. in some cases, this might be a basic workshop on some aspect of technology at a state library association annual meeting or a regional workshop hosted by the library’s consortium. it could be an online course through webjunction (http://www.webjunction .org/). it could be a one-on-one session with a vendor representative or colleague. or it could be a formal, multiday class regarding the latest release of an ils. if available, public librarians have access to technology training that can take many forms, has a wide array of content (basic to expert), and can enhance staff knowledge about it with varying degrees of success. an issue raised by librarians was that having access to training and being able to take advantage of training are two separate things. regardless of the training delivery medium, librarians indicated that they were not always able to get release time to attend a training session. this was particularly the case for small, rural libraries that had less than five ftes spread out over several part-time individuals. 
for these staff to take advantage of training would require a substitute to cover public-service hours—or shut down the library. funding information technology as one might expect, there was a range of technology budgets in the public libraries visited or interviewed— from no technology budget to a substantial technology budget, and many points in between. some libraries had a dedicated it budget line item, others had only an operating budget out of which they might carve some funds for technology. libraries with dedicated it budgets by and large had at least one it staff person; libraries with no it budget largely relied on a staff person responsible for other library functions to manage their technology. in the smallest libraries, the library director served as the technology specialist in addition to being the general library operation manager. some libraries have established foundations through which they can raise funds for technology, among other library needs. many seek grants and thus devote substantial effort to seeking grant initiatives and writing grant proposals. some libraries held fundraisers and worked with their library friends groups to generate funds. other libraries engage in all of the above efforts to provide for their pat infrastructure, services, and resources. in short, there are several budgetary approaches public libraries use to support their pat environment. critical to note is that a number of libraries are increasingly relying on nonrecurring funds to support pats, a fact corroborated by the 2007 and 2008 public library internet surveys.16 the buildings when one visits public libraries, one is immediately struck by the diversity in design, functionality, and architecture of the buildings. public libraries often reflect the communities that they serve not only in the collection and service, but also in the facilities. this diversity serves the public library community well because it allows for a custom approach to libraries and their community. the building design, however, can also be a source of substantial challenge for public libraries. the increased integration of technology into library service places a range of stresses on buildings—physical space for workstations and other equipment and specialized furniture, power, server rooms, and cabling, for example. along with the library-based technology requirements come those of patrons—particularly the need for power so that public access technologies in public libraries | bertot 89 patrons may plug in their laptops or other devices. also important to note is that the building limitations also extend to staff and their access to computing and networked technologies. a number of librarians commented that they are “simply at capacity.” one librarian summed it up by stating that “there’s no more room at the inn. unless we start removing parts of our collection, we don’t have any more room for workstations.” another said that, “while we do have the space to add more computers, we don’t have enough power or outlets to support them. and, with our building, it’s not a simple thing to add.” in short, many libraries are reaching, or have reached, a saturation point as to just how much pat they can support. n discussion and implications over time, pat services have become essential services that public libraries provide their communities. 
with nearly all public libraries connected to the internet and offering public-access computers, the high percentage of libraries that offer internet-based services and resources, the overall usage of these resources by the public,17 and 73 percent of public libraries reporting that they are the only free provider of pat in their communities, it is clear that the provision of pat services is a key and critical service role that public libraries offer.18 it is also clear, however, that the extent to which public libraries can continue to absorb, update, and expand their pat depends on the resolution of a number of staffing, financial, maintenance and management, and building barriers. in a time of constrained budgets, it is unlikely that libraries will receive increased operational funding. indeed, reports of library funding cuts are increasing in the current economic downturn, which affects the ability of libraries to increase, or significantly update, staff—particularly in the areas of technology, licensing additional resources, procuring additional and new computers, and purchasing and offering expanded services such as digital photography, gaming, or social networking.19 moreover, the same financial constraints can affect the ability of libraries to raise capital funds for building improvements and new construction. funding also has an effect on the training that public libraries can offer or develop for their staff. and training is becoming increasingly important to the success of pat services and resources in public libraries—but not just training regarding the latest technologies. rather, there is a need for training that provides instruction on the relationship between the level of pat services and resources a library can or desires to provide and advocacy; broadband, computing, and other needs; technology planning and management; collaboration and partnering; and leadership. the public library pat environment is complex, encompasses a number of technologies, and has ties to many community services and resources. training programs need to reflect this complexity. the continued provision of pat services in public libraries is increasingly burdensome on the public library community, and the pressures to expand their pat services and resources continues to grow—particularly as libraries report their “sole provider” of free pat status in their communities. the successful libraries in terms of pat services and resources visited had staff that could n understand pat (both in terms of functionality and potential); n think creatively across the technology and library service spectrum; n integrate online content, pat, and library services; n articulate the value of pat as an essential community need and public library service; n articulate the role of the perception of the library by its community as a critical bridge to online content; n demonstrate leadership within the community and library; n form partnerships and extend pat services and resources into the community; and n raise funds and develop other support mechanisms to enhance pat services and resources in the library and throughout the community. in short, successful pat in libraries was being redefined in the context of communitywide pat service and resource provision. this approach not only can lead to a more robust community pat infrastructure, but it also lessens the library’s burden of pat service and resource provision. 
but equally important to note is that the extent to which all public libraries can engage in these activities on their own is unclear. indeed, several libraries visited were struggling to maintain basic pat service levels and indicated that increasing pat services came at the expense of other library services. “we’re trying to meet demand,” one librarian said, “but we have too few computers, too slow a connection, and staff don’t always know what to do when things go wrong or someone comes in talking about the latest technology or website.” for some libraries, therefore, quality pat services that meet community needs are simply out of reach. thus another implication and finding of the study is the need for libraries to explore other models of support for their pat environments—for example, using the services of a regional cooperative, if available; if none is available, libraries could form their own cooperative for resource sharing, technology support, and other aspects of pat service provision. the same approach could be 90 information technology and libraries | june 2009 taken within a city or county to enhance technology support throughout a region. another approach would be to outsource a library’s pat support and maintenance to a nearby library with support staff in a fee-for-service approach. there are a number of approaches that libraries could take to support their pat infrastructure. a key point is that libraries need to consider pat service provision in a broader community, regional, or state context, and the study found some libraries doing so. the need to avail staff of the skills required to truly support pat was a recurring theme throughout the site visits. approaches and access to training varied. for example, some state libraries provided—either directly or through the hiring of consultants and instructors—a number of technology-related courses taught in regional locations. an example of this approach is california’s infopeople project (http://www.infopeople .org/). some state libraries subscribed to webjunction (http://www.webjunction.org/), which provides access to online instructional content. online manuals provided by compumentor through a grant funded by the bill and melinda gates foundation aimed at helping rural libraries support their pat (www.maintainitproject.org) are another resource. beyond technology skills training, however, is the need for technology planning, effective communication, leadership, value demonstration, and advocacy. the extent to which leadership, advocacy, and library marketing, for example, are able to be taught remains a question. all of these issues take place with the backdrop of an economic downturn and budgetary constraints. increased operating costs created through inflation and higher energy costs place substantial pressures on public libraries simply to maintain current levels of service— much less engage in the additional levels of service that the pat environment brings. indeed, as the 2008 public library funding and technology access study demonstrated, public libraries are increasingly funding their technology-based services through non-recurring funds such as fines and fundraising activities.20 thus, the ability of public libraries to provide robust pat services and resources is increasingly limited unless such service provision comes at the expense of other library services. alone, the financial pressures place a high burden on public libraries. 
combined with the building, staffing, skills, and other constraints reported by public libraries, however, the emerging picture for library pat services and resources is one of significant challenge. n three key areas for additional exploration the findings from the study point to the need for additional research and exploration of three key services areas and issues related to pat support and services: 1. develop a better understanding of success in the pat environment. this study and the 2006 study by bertot et al. point to what is required for libraries to be successful in a networked environment.21 in fact, the 2007 public libraries and the internet report contained a section entitled “the successfully networked public library,” which offered a range of checklists for public libraries (and others) to consider as they planned and implemented their networked services.22 this study identified additional success factors and considerations focused specifically on the public access technology environment. together, these efforts point to the need to better understand and articulate the critical success factors necessary for public libraries to plan, implement, and update their pat given current service contexts. this is particularly necessary in the context of meeting user expectations and needs regarding networked technologies and services. 2. further identify technology-support models. this study uncovered a number of different technologysupport models implemented by public libraries. undoubtedly there are additional models that require identification. but, more importantly, there is a need to further explore how each technologysupport model assists libraries, under what circumstances, and in what ways. some models may be more or less appropriate on the basis of the service context of the library—and that is not clearly understood at this time. 3. levels of service capabilities. an underlying theme throughout this research, and one that is increasingly supported by the public library and the internet studies, is that the pat service context is essentially a continuum from low service and capability to high service and capability. there are a number of factors contributing to where libraries may lie on the success continuum—funding, management, leadership, attitude, skills, community support, and innovation, to name a few. this continuum requires additional research, and the research implications could be profound. emerging data indicate that there are public libraries that will be unable to continue to evolve and meet the increased demands of the networked environment, both in terms of staff and infrastructure. public libraries will have to make choices regarding the provision of pat services and resources in light of their ability to provide high-quality services (as defined by their service communities). for better or worse, the technology environment continually evolves and requires new technologies, management, and support. that is, public access technologies in public libraries | bertot 91 and will continue to be, the nature of public access to the internet. though there are likely other issues worthy of exploration, these three are critical to further our understanding of the pat environment and public library roles and issues associated with the provision of public access. n conclusion the pat environment in which public libraries operate is increasingly complex and continues to grow in funding, maintenance and management, staffing, and building demands. 
public libraries have navigated this environment successfully for more than fifteen years; however, stresses are now evident. libraries rose quickly to the challenge of providing public-access services to the communities that they serve. the challenges libraries face are not necessarily insurmountable, and there is a range of tools designed to help public libraries plan and manage their public-access services. these tools, however, place the burden of public access, or assume that the burden of public access is placed, on the public library. given increased operating costs because of inflation, the continual need to innovate and upgrade technologies, staff technology skills requirements, and other factors discussed in this article, libraries may not be in a position to shoulder the burden of public access alone. thus there is a need to reconsider the extent to which pat provision is the sole responsibility of the library; perhaps there is a need to integrate and expand public access throughout a community. such an approach can benefit a community through an integrated and broader access strategy, and it can also relieve the pressure on the public library as the sole provider of public access. acknowledgement this research was made possible in part through the support of the maintainit project (http://www.maintainitproject.org/), an effort of the nonprofit techsoup web resource (http://www.techsoup.org/). references 1. charles r. mcclure, john carlo bertot, and douglas l. zweizig, public libraries and the internet: study results, policy issues, and recommendations (washington, d.c.: national commission on libraries and information science, 1994). 2. john carlo bertot and charles r. mcclure, moving toward more effective public internet access: the 1998 national survey of public library outlet internet connectivity (washington, d.c.: national commission on libraries and information science, 1998), http://www.liicenter.org/reports/1998_plinternet_study.pdf (accessed apr. 22, 2009). 3. charles r. mcclure, john carlo bertot, and john c. beachboard, internet costs and cost models for public libraries (washington, d.c.: national commission on libraries and information science, 1995). 4. charles r. mcclure, john carlo bertot, and douglas l. zweizig, public libraries and the internet: study results, policy issues, and recommendations (washington, d.c.: national commission on libraries and information science, 1994); john carlo bertot, charles r. mcclure, paul t. jaeger, and joe ryan, public libraries and the internet 2006: study results and findings (tallahassee, fla.: information institute, 2006), http://www.ii.fsu.edu/projectfiles/plinternet/2006/2006_plinternet.pdf (accessed mar. 5, 2009). 5. john carlo bertot, charles r. mcclure, carla b. wright, elise jensen, and susan thomas, public libraries and the internet 2007: study results and findings (tallahassee, fla.: information institute, 2008), http://www.ii.fsu.edu/projectfiles/plinternet/2007/2007_plinternet.pdf (accessed sept. 10, 2008). 6. charles r. mcclure and paul t. jaeger, public libraries and internet service roles: measuring and maximizing internet services (chicago: ala, 2008). 7. george d'elia, june abbas, kay bishop, donald jacobs, and eleanor jo rodger, "the impact of youth's use of the internet on the use of the public library," journal of the american society for information science & technology 58, no. 14 (2007): 2180–96; george d'elia, corinne jorgensen, joseph woelfel, and eleanor jo rodger, "the impact of the internet on public library use: an analysis of the current consumer market for library and internet services," journal of the american society for information science & technology 53, no. 10 (2002): 802–20. 8. national center for education statistics (nces), public libraries in the united states: fiscal year 2005 [nces 2008-301] (washington, d.c.: national center for education statistics, 2007); pew internet and american life project, "internet activities," http://www.pewinternet.org/trends/internet_activities_2.15.08.htm (accessed mar. 5, 2009). 9. bertot et al., public libraries and the internet 2007. 10. ibid. 11. cheryl bryan, managing facilities for results: optimizing space for services (chicago: public library association, 2007); joseph matthews, strategic planning and management for library managers (westport, conn.: libraries unlimited, 2005); joseph matthews, technology planning: preparing and updating a library technology plan (westport, conn.: libraries unlimited, 2004); diane mayo and jeanne goodrich, staffing for results: a guide to working smarter (chicago: public library association, 2002). 12. ala, libraries connect communities: public library funding & technology access study (chicago: ala, 2008), http://www.ala.org/ala/aboutala/offices/ors/plftas/0708report.cfm (accessed mar. 5, 2008). 13. charles p. smith, ed., motivation and personality: handbook of thematic content analysis (new york: cambridge univ. pr., 1992); klaus krippendorf, content analysis: an introduction to its methodology (beverly hills, calif.: sage, 1980). 14. ala, libraries connect communities. 15. bertot et al., public libraries and the internet 2006; bertot et al., public libraries and the internet 2007. 16. ibid. 17. nces, public libraries in the united states. 18. bertot et al., public libraries and the internet 2007. 19. american libraries, "branch closings and budget cuts threaten libraries nationwide," nov. 7, 2008, http://www.ala.org/ala/alonline/currentnews/newsarchive/2008/november2008/branchesthreatened.cfm (accessed nov. 17, 2008). 20. ala, libraries connect communities. 21. bertot et al., public libraries and the internet 2006. 22. bertot et al., public libraries and the internet 2007. a comparative analysis of the effect of the integrated library system on staffing models in academic libraries ping fu and moira fitzgerald information technology and libraries | september 2013 abstract this analysis compares how the traditional integrated library system (ils) and the next-generation ils may impact system and technical services staffing models at academic libraries. the method used in this analysis is to select two categories of ilss—two well-established traditional ilss and three leading next-generation ilss—and compare them by focusing on two aspects: (1) software architecture and (2) workflows and functionality. the results of the analysis suggest that the next-generation ils could have substantial implications for library systems and technical services staffing models in particular: library staffing models could be redesigned and key librarian and staff positions redefined to meet the opportunities and challenges brought on by the next-generation ils. introduction today, many academic libraries are using well-established traditional integrated library systems (ilss) built on the client-server computing model.
the client-server model aims to distribute applications that partition tasks or workloads between the central server of a library automation system and all the personal computers throughout the library that access the system. the client applications are installed on the personal computers and provide a user-friendly interface to library staff. however, this model may not significantly reduce workload for the central servers and may increase overall operating costs because of the need to maintain and update the client software across a large number of personal computers throughout the library.1 since the global financial crisis, libraries have been facing severe budget cuts, while hardware maintenance, software maintenance, and software licensing costs continue to rise. the technology adopted by the traditional ils was developed more than ten years ago and is evidently outdated. the traditional ils does not have sufficient capacity to provide efficient processing for meeting the changing needs and challenges of today's libraries, such as managing a wide variety of licensed electronic resources and collaborating, cooperating, and sharing resources with different libraries.2 ping fu (pingfu@cwu.edu), a lita member, is associate professor and head of technology services in the brooks library, central washington university, ellensburg, wa. moira fitzgerald (moira.fitzgerald@yale.edu), a lita member, is access librarian and assistant head of access services in the beinecke rare book and manuscript library, yale university, new haven, ct. today's libraries manage a wide range of licensed electronic resource subscriptions and purchases. the traditional ils is able to maintain the subscription records and payment histories but is unable to manage details about trial subscriptions, license negotiations, license terms, and use restrictions. some vendors have developed electronic resources management system (erms) products as standalone products or as fully integrated components of an ils. however, it would be more efficient to manage print and electronic resources using a single, unified workflow and interface. to reduce costs, today's libraries not only band together in consortia for cooperative resource purchasing and sharing, but often also want to operate one "shared ils" for managing, building, and sharing the combined collections of members.3 such consortia are seeking a new ils that exceeds traditional ils capabilities and uses new methods to deliver improved services. the new ils should be more cost effective, should provide prospects for cooperative collection development, and should facilitate collaborative approaches to technical services and resource sharing. one example of a consortium seeking a new ils is the orbis cascade alliance, which includes thirty-seven universities, colleges, and community colleges in oregon, washington, and idaho. as a response to this need, many vendors have started to reintegrate or reinvent their ilss.
library communities have expressed interest in the new characteristics of these next-generation ilss; their ability to manage print materials, electronic resources, and digital materials within a unified system and a cloud-computing environment is particularly welcome.4 however, one big question remains for libraries and librarians, and that is what implications the next-generation ils will have for libraries' staffing models. little on this topic has been presented in the library literature. this comparative analysis intends to answer this question by comparing the next-generation ils with the traditional ils from two perspectives: (1) software architecture, and (2) workflows and functionality, including the capacity to facilitate collaboration between libraries and engage users. scope and purpose the purpose of the analysis is to determine what potential effect the next-generation ils will have on library systems and technical services staffing models in general. two categories of ilss were chosen and compared. the first category consists of two major traditional ilss: ex libris's voyager and innovative interfaces' millennium. the second category includes three next-generation ilss: ex libris's alma, oclc's worldshare management services (wms), and innovative interfaces' sierra. voyager and millennium were chosen because they hold a large portion of the current market share and because the authors have experience with these systems. yale university library is currently using voyager, while central washington university library is using millennium. alma, wms, and sierra were chosen because these three next-generation ilss are produced by market leaders in the library automation industry. the authors have learned about these new products by reading and analyzing literature and vendors' proposals, as well as by attending vendors' webinars and product demonstrations. in the long run, yale university library must look for a new library service platform to replace voyager, verde, metalib, sfx, and other add-ons. central washington university library is affiliated with the orbis cascade alliance mentioned above. the alliance is implementing a new library management service to be shared by all thirty-seven members of the consortium. ex libris, innovative interfaces, oclc, and serials solutions all bid for the alliance's shared ils. after an extensive rfp process, in july 2012 the orbis cascade alliance decided to choose ex libris's alma and primo as their shared library services platform. the system will be implemented in four cohorts of approximately nine member libraries each over a two-year period, beginning in january 2013. the central washington university library is in the fourth migration cohort, and its new system will be live in december 2014. it is important to emphasize that the next-generation ils has no local online public access catalog (opac) interface. vendors use additional discovery products as the discovery-layer interfaces for their next-generation ilss. specifically, ex libris uses primo as the opac for alma, while oclc's worldcat local provides the front-end interface for wms. innovative interfaces offers encore as the discovery layer for sierra. as front-end systems, these discovery platforms provide library users with one-stop access to their library resources, including print materials, electronic resources, and digital materials.
while these discovery platforms will also impact library organization and librarianship, they will have more impact on the way that end-users, rather than library staff, discover and interact with library collections. in this analysis, we focus on the effects that back-end systems such as alma, wms, and sierra will have on library organizational structure and staffing, rather than the end-user experience. as our sample only includes five ilss, the scope of the analysis is limited, and the findings cannot be universal or extended to all academic libraries. however, readers will gain some insight into what challenges any library may face when migrating to a next-generation ils. literature review a few studies have been published on library staffing models. patricia ingersoll and john culshaw's 2004 book about systems librarianship describes vital roles that systems librarians play, with responsibilities in the areas of planning, staffing, communication, development, service and support, training, physical space, and daily operations.5 systems librarians are the experts who understand both library and information technology and can put the two fields together in context. they point out that systems librarians are the key players who ensure that a library stays current with new information technology. the daily and periodic operations for systems librarians include ils administration, server management, workstation maintenance, software and applications maintenance and upgrades, configuration, patch management, data backup, printing issues, security, and inventory. all of these duties together constitute the workloads of systems librarians. ingersoll and culshaw also emphasize that systems librarians must be proactive in facing constant changes and keep abreast of emerging library technologies. edward iglesias et al., based on their own experiences and observations at their respective institutions, studied the impact of information technology on systems staff.6 their book covers concepts such as the client-server computing model, web 2.0, electronic resource management, open-source, and emerging information technologies. their studies show that, though there are many challenges inherent in the position, there are also many ways for systems staff to improve their knowledge, skills, and abilities to adapt to the changing information technologies. janet guinea has also studied the roles of systems librarians at an academic library.7 her 2003 study shows that systems librarians act as bridge-builders between the library and other university units in the development of library-initiated projects and in the promotion of information technology-based applications across campus. another relevant study was conducted by marshall breeding at vanderbilt university in an investigation of the library automation market. his 2012 study compares the well-established, traditional ilss that dominate the current market (and are based on client-server computing architecture developed more than a decade ago) to the next-generation ilss deployed through multitenant software-as-a-service (saas) models, which are based on service-oriented architecture (soa).8 through this comparison, breeding indicates that next-generation ilss will differ substantially from existing traditional ilss and will eliminate many hardware and maintenance investments for libraries.
the next-generation ils will bring traditional ils functions, erms, digital asset management, link resolvers, discovery layers, and other add-on products together into one unified service platform, he argues.9 he gave the next-generation ils a new term, library services platform.10 this term signifies that a conceptual and technical shift is happening: the next-generation ils is designed to realign traditional library functions and simplify library operations through a more inclusive platform designed to handle different forms of content within a unified single interface. breeding’s findings conclude that the next-generation ils provides significant innovations, including management of print and electronic library materials, reliance on global knowledge bases instead of localized databases, deployment through multitenant saas based on a service-oriented architecture, and the provision of a suite of application programming interfaces (apis) that enable greater interoperability and extensibility.11 he also predicts that the next-generation ils will trigger a new round of ils migration.12 method our method narrowed down the analysis for the implications of ilss on library systems and technical services staffing models to two major aspects: (1) software architecture, and (2) workflows and functionality, including facilitation of collaborations between libraries and user engagement. first, we analyzed two traditional ilss, voyager and millennium, which are built on a client-server computing model, deliver modular workflow functionality, and are implemented in our institutions. through the analysis, we determined how these two aspects affect library organizational structure and librarian positions designed for managing these modular tasks. then, information technology and libraries | september 2013 51 based on information we collected and grouped from vendors’ documents, rfp responses, product demonstrations, and webinars, we examined the next-generation ilss alma, wms, and sierra— which are based on soa and intended to realign traditional library functions and simplify library operations—to evaluate how these two factors will impact staffing models. to provide a more in-depth analysis, particularly for systems staffing models, we also gathered and analyzed online systems librarian job postings, particularly for managing the voyager or millennium system, for the past five years. the purpose of this compilation is to cull a list of typical responsibilities of systems librarians and then determine what changes may occur when they must manage a next-generation ils such as alma, wms, or sierra. data on job postings were gathered from online job banks that keep an archive of past listings, including code4lib jobs, ala joblist, and various university job listing sites. duplicates and reposts were removed. the responsibilities and duties described in the job descriptions were examined for similarities to determine a typical list. the data from all sources were gathered together in a single database to facilitate its organization and manipulation. specific responsibilities, such as administering an ils, were listed individually, while more general responsibilities for which descriptions may vary from one posting to another were grouped under an appropriate heading. to ensure complete coverage, all postings were examined a second time after all categories had been determined. we also used our own institutions as examples to support the analysis. 
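to make the grouping and tallying step just described concrete, the toy sketch below maps free-text responsibilities from a handful of invented job postings onto broader headings using simple keyword rules and counts the results. the postings, the keyword rules, and the heading names are illustrative stand-ins only; they are not the coding scheme or data actually used in this analysis, whose categories were derived from the 47 postings themselves and confirmed on a second pass.

```python
from collections import Counter

# Toy illustration of the grouping step: free-text duties pulled from postings
# are mapped onto broader headings by simple keyword rules and then tallied.
# All postings, rules, and headings below are invented for illustration only.
postings = [
    ["administer the integrated library system", "maintain library servers", "train staff"],
    ["manage the ILS and discovery layer", "apply security patches", "serve on committees"],
    ["coordinate system upgrades", "back up databases", "provide reference service"],
]

rules = {
    "ils administration": ("integrated library system", "ils"),
    "server and database maintenance": ("server", "patch", "back up", "upgrade"),
    "training and service": ("train", "committee", "reference"),
}

def categorize(duty):
    """Return the first heading whose keywords appear in the duty text."""
    text = duty.lower()
    for heading, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return heading
    return "other"

tally = Counter(categorize(duty) for posting in postings for duty in posting)
for heading, count in tally.most_common():
    print(f"{heading}: {count}")
```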
the implications of ils software architecture on staffing models voyager and millennium are built on client-server architecture. libraries that use these ilss also use add-ons, such as erms and link resolvers, to manage their print materials and licensed electronic resources. the installation, configuration, and updates of the client software require a significant amount of work for library it staff. many libraries must allocate substantial staff effort and resources to coordinating the installation of the new software on all computers throughout the library that access the system. those libraries that allow staff to work remotely have experienced additional costs and it challenges. in addition, server maintenance, backups, upgrades, and disaster recovery also require excessive time and effort of library it staff. administering ilss, erms, and other library hardware, software, and applications is one of the primary responsibilities for a library systems department. positions such as systems librarian, electronic resource librarian, and library it specialist were created to handle this complicated work. at a very large library, such as yale university library, the systems group of library it is only responsible for voyager's configuration, operation, maintenance, and troubleshooting. two other it support groups—a library server support group and a workstation support group—are responsible for installation, maintenance, and upgrade of the servers and workstations. specifically, the library server support group deals with the maintenance and upgrade of ils servers and the software and relational database running on the servers, while the workstation support group takes care of the installation and upgrade of the client software on hundreds of workstations throughout twenty physical libraries. at a smaller library, such as central washington university library, on the other hand, one systems librarian is responsible for the administration of millennium, including configuration, maintenance, backup, and upgrade on the server. another library it staff member helps install and upgrade the millennium client on about forty-five staff computers throughout its main library and two center campus libraries. comparatively, the next-generation ilss alma, wms, and sierra have a saas model designed by soa principles and deployed through a cloud-based infrastructure. oclc defines this model as "web-scale management services."13 using this innovation, service providers are able to deliver services to their participating member institutions on a single, highly scalable platform, where all updates and enhancements can be done automatically through the internet. the different participating member institutions using the service can configure and customize their views of the application with their own brandings, color themes, and navigational controls. the participating member institutions are able to set functional preferences and policies according to their local needs. web-scale services reduce the total cost of ownership by spreading infrastructure costs across all the participating member institutions. the service providers have complete control over hardware and software for all participating member institutions, dramatically eliminating capital investments on local hardware, software, and other peripheral services.
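as a concrete illustration of the kind of routine, server-side chore that the locally hosted systems described above impose on library it staff—and that a vendor-hosted, web-scale deployment takes off their hands—the sketch below shows a nightly backup job with simple rotation. the paths and retention window are hypothetical, and this is a generic sketch rather than a procedure documented by any ils vendor.

```python
import datetime
import pathlib
import shutil

# Hypothetical paths and retention window; a sketch of a nightly chore,
# not a vendor-documented procedure for any particular ILS.
DATA_DIR = "/var/lib/ils/data"          # where the locally hosted ILS keeps its files
BACKUP_DIR = pathlib.Path("/backups/ils")
KEEP = 14                                # retain two weeks of nightly archives

def nightly_backup():
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    # create /backups/ils/ils-YYYY-MM-DD.tar.gz from the data directory
    archive = shutil.make_archive(str(BACKUP_DIR / f"ils-{stamp}"), "gztar", DATA_DIR)
    # prune the oldest archives beyond the retention window
    archives = sorted(BACKUP_DIR.glob("ils-*.tar.gz"))
    for old in archives[:-KEEP]:
        old.unlink()
    return archive

if __name__ == "__main__":
    print("wrote", nightly_backup())
```

under a multitenant saas deployment, scheduling, verifying, and restoring from this sort of job becomes the provider's responsibility rather than the local systems staff's.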
service providers can centrally implement applications and upgrades, integration across services, and system-wide infrastructure requirements such as performance reliability, security, privacy, and redundancy. thus participating member institutions are relieved from this burdensome responsibility that has traditionally been undertaken by their it staff.14 from this perspective, the next-generation ils will have a huge impact on library organizational structure, staffing, and librarianship. since the next-generation ils is implemented through the cloud-computing model, there is no requirement for local staff to perform the functions traditionally defined as "systems" staff activities, such as server and storage administration, backup and recovery administration, and server-side network administration. for example, the entire interfaces of alma and wms are served via web browser; there is no need for local staff to install and maintain clients on local workstations. therefore, if an institution decided to migrate to a next-generation ils, the responsibilities and roles of systems staff within the institution would need to be readdressed or redefined. we have learned from attending oclc's webinars and product demonstrations that library systems staff would be required to prepare and extract data from their local systems during new systems implementation. they also would be required to configure their own settings such as circulation policies. however, after the migration, a systems staff member would likely serve as a liaison with the vendor. this would require, according to oclc's proposal, only 10 percent of the systems staff's time on an ongoing basis. through attending ex libris's webinars and product demonstrations, we have learned that a local system administrator may be required to take on basic management processes, such as record-loading or integrating data from other campus systems. similarly, we have learned from innovative interfaces' webinars and product demonstrations that sierra would still need local systems expertise to perform the installations of the client software on staff workstations. sierra would require library it staff to perform administrative tasks like the user account administration and to support sierra in interfacing with local institution-specific resources. in general, as shown in table 1, local systems staff could be freed from the burdensome responsibility of administering the traditional ils because of the software architecture of the next-generation ils.

table 1. systems librarian responsibilities comparison for traditional ils and next-generation ils.
systems librarian responsibility | workload percentage | traditional ils | next-generation ils
managing ils applications, including modules and the opac | 10 | x |
managing associated products such as discovery systems, erms, link resolver, etc. | 10 | x |
day-to-day operations, including management, maintenance, troubleshooting, and user support | 10 | x | x
server maintenance, database maintenance, and backup | 10 | x |
customizations and integrations | 5 | x | x
configurations | 5 | x | x
upgrades and enhancements | 5 | x |
patches or other fixes | 5 | x |
design and coordination of statistical and managerial reports | 5 | x | x
overall staff training | 5 | x | x
primary representative and contact to the designated library system vendors | 5 | x | x
keeping abreast of developments in library technologies to maintain current awareness of information tools | 5 | x | x
engaging in scholarly pursuit and other professional activities | 10 | x | x
serving on various teams and committees | 5 | x | x
reference and instruction | 5 | x | x
total | 100 | 100% | 60%

note: the systems librarian responsibilities and the approximate percentage of time devoted to each function are slightly readjusted based on the compiled descriptions of the systems librarian job postings we collected and analyzed from the internet and from vendors' claims. a total of 47 position descriptions were gathered. the workload percentage is adopted from the job description of the systems librarian position at one of our institutions.

our analysis shows that systems staff might reduce their workload by approximately 40 percent. therefore library systems staff could use their time to focus on local applications development and other library priority projects. however, it is important to emphasize that library systems staff should reengineer themselves by learning how to use apis provided by the next-generation ils so that they will be able to support the customization of their institutions' discovery interfaces and the integration of the ils with other local enterprise systems, such as financial management systems, learning management systems, and other local applications. the implications of ils workflows and functionality on staffing models the typical workflow and functionality of both voyager and millennium are built on a modular structure. major function modules, called client modules, include systems administration, cataloging, acquisitions, serials, circulation, and statistics and reports. additionally, the traditional ils provides an opac interface for library patrons to access library materials and manage their accounts. millennium has an erms module built in as a component of its ils, while ex libris has developed an independent erms as an add-on to voyager. the systems administration module is used to add system users and to set up locations, patron types, material types, and other library policies. the cataloging module supports the functions of cataloging resources, managing the authority files, tagging and categorizing content, and importing and exporting bibliographic records. the sophistication of the cataloging module depends primarily on the ils. the acquisitions module helps in the tracking of purchases and acquisition of materials for a library by facilitating ordering, invoicing, and data exchange with serial, book, and media vendors through electronic data interchange (edi). the circulation module is used to set up rules for circulating materials and for tracking those materials, allowing the library to add patrons, issue borrowing cards, and form loan rules.
it also automates the placing of holds, interlibrary loan (ill), and course reserves. self-checkout functionality can be integrated as well. the serials module is essentially a cataloging module for serials. libraries are often dependent on the serials module to help them track and check-in serials. the statistics and reports module is used to generate reports such as circulation statistics, age of collection, collection development, and other customized statistical reports. a typical traditional ils comprises a relational database, software to interact with that database, and two graphical user interfaces—one for patrons and one for staff. it usually separates software functions into discrete modules, each of them integrated with a unified interface. the traditional ils’s modular design was a perfect fit for a traditional library organizational structure. the staff at central washington university library, for example, under the library administration, are organized into the following three major groups: public services, including the reference and circulation departments; technical and technology services, including the cataloging, collection development, serials & electronic resource, and systems departments; and information technology and libraries | september 2013 55 other library services and centers, including the government documents department, the music library, two center campus libraries, the academic and research commons, and the rare book collection & archive. each department has at least one professional librarian and other library staff members responsible for their daily operations. for example, the collection development librarian is responsible for the acquisition of print monographs and serials, while the electronic resource librarian is responsible for purchasing and managing licensed databases or e-journals. however, the next-generation ils significantly enhances and reintegrates the workflow of traditional ils functions. the functionality is quite different from the traditional ils’s modular structure. the design of the functionality stresses two principles: modularity and extensibility. it brings together the selection, acquisition, management, and distribution of the entire library collection. it provides a centralized data-services environment to its unified workflows for all types of library assets. one of the big enhancements of the next-generation ils is the acquisitions module, which enables the management of both print and electronic materials within a single unified interface, with no need to move between modules or multiple systems for different formats and related activities. for example, according to oclc, wms streamlines selection and acquisition processes via built-in access to worldcat records and publisher data. vendor, local, consortium, and global library data share the same workflows. wms automatically creates holdings for both physical and electronic resources. the worldcat knowledge-base simplifies electronic resource management and delivery. order data from external systems can be automatically uploaded. for consortium users, wms’s unified workflow and interface fosters efficient resource-sharing between different institutions whose holdings share a common format. similarly, ex libris’s alma has an integrated central knowledge base (ckb) that describes available electronic resources and packages, so there is no need to load additional descriptive records when acquiring electronic resources based on the ckb. 
the purchasing workflow manages orders for both print and electronic resources in a very similar way and handles some aspects unique to electronic resources, such as license management and the identification of an access provider. staff users can start the ordering process by searching the ckb directly and ordering from there. this search is integrated into the repository search, allowing a staff user to perform searches both in his or her institution and in the community zone, which holds the ckb. the next-generation ils provides unified data services and workflows, and a single interface to manage all physical, electronic, and digital materials. this will require libraries to rethink their acquisitions staffing models. for example, small libraries could merge the acquisition librarian position and the electronic resource librarian position or reorganize the two departments. another functionality enhancement of the next-generation ils provides the authoritative ability for consortia users to manage local holdings and collections as well as shared resources. for example, wms's single shared knowledge base eliminates the need for each library to maintain a copy of a knowledge base locally, because all consortia members can easily see what is licensed by other members of the consortia. cataloging records are shared at the consortium and global levels in real time. each institution immediately benefits from original cataloging records added to the system and from enhancements to existing records. authority control is built into worldcat, so there is no need to do authority processing against local bibliographic databases. with real-time circulation between libraries' collections, there is no need to re-create bibliographic and item data in separate local systems. similarly, sierra enhances the traditional technical services workflows by providing a shared bibliographic database. whenever a member library performs selection or ordering, the library is able to determine if other consortia members have already selected, ordered, and cataloged the title. this may impact a local selection, allowing consortia members to more collectively develop their individual collections and reduce duplication. alma's centralized metadata management service (mms) takes a very similar approach to wms and sierra, allowing several options for local control and shared cataloging, depending on an institution's needs, while ex libris maintains authority files. very large institutions, for example, might manage some records in the local catalog and most records in a shared bibliographic database, while smaller institutions might manage all of their records in the shared bibliographic database. all these approaches require more collaboration and cooperation between consortia members. according to vendors' claims in their proposals to the orbis cascade alliance, small institutions might not need to have a professional cataloger, since the cataloging process is simplified and it is therefore easier for paraprofessional staff to operate and copy bibliographic records from the knowledge bases of these ilss. in addition, the next-generation ils also allows library users to actively engage with ils software development.
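before turning to the specific vendor mechanisms, the sketch below gives a generic picture of what such engagement can look like in practice: a short script that pulls item availability from a web-service endpoint so it could be surfaced in a campus portal or a small staff-facing gadget. the base url, the /items path, the response fields, and the bearer-token header are invented placeholders for illustration only; they are not the documented api of alma, wms, or sierra.

```python
import json
import urllib.request

# Everything below is hypothetical: the base URL, the /items endpoint, the
# response fields, and the API key are stand-ins, not any vendor's actual API.
BASE_URL = "https://ils.example.edu/api/v1"
API_KEY = "replace-with-a-real-key"

def get_item_availability(barcode):
    """Fetch availability for one item so another campus system can display it."""
    request = urllib.request.Request(
        f"{BASE_URL}/items/{barcode}",
        headers={"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        record = json.load(response)
    return {
        "title": record.get("title"),
        "location": record.get("location"),
        "status": record.get("status", "unknown"),
    }

if __name__ == "__main__":
    print(get_item_availability("39076001234567"))
```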
for example, by adding opensocial containers to the product, wms allows library developers to use api to build social applications called gadgets and add these gadgets to wms. one example highlighted by oclc is a gadget in the acquisitions area of wms that will show the latest new york times best sellers and how many copies the library has available for each of those titles. similarly, sierra’s open developer community will allow library developers to share ideas, reference code samples, and build a wide range of applications using sierra’s web services. also, sierra will provide a centralized online resource called sierra developer sandbox to offer a comprehensive library of documented apis for library-developed applications. all these enhancements provide library staff with new opportunities to redefine their roles in a library. conclusions and arguments in summary, compared to the client-server architecture and modular design of the traditional ils, the next-generation ils has an open architecture and is more flexible and unified in its workflow and interface, which will have a huge impact on library staffing models. the traditional ils specifies clear boundaries between staff modules and workflows while the next-generation ils has blurred these boundaries. the integration and enhancement of the functionality of the nextgeneration ils will help libraries streamline and automate workflows and processes for managing both print and electronic resources. it will increase libraries’ operational efficiency, reduce the information technology and libraries | september 2013 57 total cost of ownership, and improve services for users. particularly, it will free approximately 40 percent of library systems staff time from managing servers, software upgrades, client application upgrades, and data backups. moreover, the next-generation ils provides a new way for consortial libraries to collaborate, cooperate, and share resources. in addition, the web-scale services provided by the next-generation ils allow libraries to access an infrastructure and platforms that enable them to reach a broad, geographically diverse community while simultaneously focusing their services on meeting the specific needs of their end-users. thus the more integrated workflows and functionality allow library staff to work with more modules, play multiple roles, and back up each other, which will bring changes to traditional staffing models. however, the next-generation ils also brings libraries new challenges along with its clear advantages. librarians and library staff might have concerns pertaining to their job security and can be fearful of new technologies. they may feel anxious about how to reengineer their business processes, how to get training, how to improve their technological skills, and how to prepare for a transition. we argue here that library directors might think about these staff frustrations and find ways to address their concerns. libraries should provide staff more opportunities and training to help them to improve their knowledge and skills. redefining job descriptions and reorganizing library organizational structures might be necessary to better adapt to the changes brought about by the next-generation ils. systems staff might invest more time in local application developments, other digital initiatives, website maintenance, and other library priority projects. technical staff might reconsider their workflows and cross-train themselves to expand their knowledge and improve their work efficiency. 
they might spend more time on data quality control and special collection development or interact more with faculty on book and e-resource selections. we hope this analysis will provide some useful information and insights for those libraries planning to move to the next-generation ils. the shift will require academic libraries to reconsider their organizational structures and rethink their manpower distribution and staffing optimization to better focus on library priorities, projects, and services critical to their users.

references
1. marshall breeding, “a cloudy forecast for libraries,” computers in libraries 31, no. 7 (2011): 32–34.
2. marshall breeding, “current and future trends in information technologies for information units,” el profesional de la información 21, no. 1 (2012): 11.
3. jason vaughan and kristen costello, “management and support of shared integrated library systems,” information technology & libraries 30, no. 2 (2011): 62–70.
4. marshall breeding, “agents of change,” library journal 137, no. 6 (2012): 30–36.
5. patricia ingersoll and john culshaw, managing information technology: a handbook for systems librarians (westport, ct: libraries unlimited, 2004).
6. edward g. iglesias, an overview of the changing role of the systems librarian: systemic shifts (oxford, uk: chandos, 2010).
7. janet guinea, “building bridges: the role of the systems librarian in a university library,” library hi tech 21, no. 3 (2003): 325–32.
8. breeding, “agents of change,” 30.
9. ibid.
10. ibid., 33.
11. ibid., 33.
12. ibid., 30.
13. sally bryant and grace ye, “implementing oclc’s wms (web-scale management services) circulation at pepperdine university,” journal of access services 9, no. 1 (2012): 1.
14. gary garrison et al., “success factors for deploying cloud computing,” communications of the acm 55, no. 9 (2012): 62–68.

technology skills in the workplace: information professionals’ current use and future aspirations
monica maceli and john j. burke
information technology and libraries | december 2016

abstract
information technology serves as an essential tool for today’s information professional, and ongoing research is needed to assess the technological directions of the field over time. this paper presents the results of a survey of the technologies used by library and information science practitioners, with attention to the combinations of technologies employed and the technology skills that practitioners wish to learn. the most common technologies employed were email, office productivity tools, web browsers, library catalog- and database-searching tools, and printers, with programming topping the list of most-desired technology skills to learn. similar technology usage patterns were observed for early and later-career practitioners. findings also suggested the relative rarity of emerging technologies, such as the makerspace, in current practice.

introduction
over the past several decades, technology has rapidly moved from a specialized set of tools to an indispensable element of the library and information science (lis) workplace, and today it is woven throughout all aspects of librarianship and the information professions.
information professionals engage with technology in traditional ways, such as working with integrated library systems, and in new innovative activities, such as mobile-app development or the creation of makerspaces.1 the vital role of technology has motivated a growing body of research literature, exploring the application of technology tools in the workplace, as well as within lis education, to effectively prepare tech-savvy practitioners. such work is instrumental to the progression of the field, and with the rapidly-changing technological landscape, requires ongoing attention from the research community. one of the most valuable perspectives in such research is that of the current practitioner. understanding current information professionals’ technology use can help in understanding the role and shape of the lis field, provide a baseline for related research efforts, and suggest future monica maceli (mmaceli@pratt.edu) is assistant professor, school of information, pratt institute, new york. john j. burke (burkejj@miamioh.edu) is library director and principal librarian, gardner-harvey library, miami university middletown, middletown, ohio. technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 36 directions. the practitioner perspective is also valuable in separating the hype that often surrounds emerging technologies from the reality of their use and application within the lis field. this paper presents the results of a survey of lis practitioners, oriented toward understanding the participants’ current technology use and future technology aspirations. the guiding research questions for this work are as follows: 1. what combinations of technology skillsets do lis practitioners commonly use? 2. what combinations of technology skillsets do lis practitioners desire to learn? 3. what technology skillsets do newer lis practitioners use and desire to learn as compared to those with ten-plus years of experience in the field? literature review the growth and increasing diversity of technologies used in library settings has been matched by a desire to explore how these technologies impact expectations for lis practitioner skill sets. triumph and beile examined the academic library job market in 2011 by describing the required qualifications for 957 positions posted on the ala joblist and arl job announcements websites.2 the authors also compared their results with similar studies conducted in 1996 and 1988 to see if they could track changes in requirements over a twenty-three-year period. they found that the number of distinct job titles increased in each survey because of the addition of new technologies to the library work environment that require positions focused on handling them. the comparison also found that computer skills as a position requirement increased by 100 percent between 1988 and 2011, with 55 percent of 2011 announcements requiring them. looking more deeply at the technology requirements specifically, mathews and pardue conducted a content analysis of 620 jobs ads from the ala joblist to identify skills required in those positions.3 the top technology competencies required were web development, project management, systems development, systems applications, networking, and programming languages. they found a significant overlap of librarian skill sets with those of it professionals, particularly in the areas of web development, project management, and information systems. 
riley-huff and rholes found that the most commonly sought technology-related job titles were systems/automation librarian, digital librarian, emerging and instructional technology librarian, web services/development librarian, and electronic resources librarian.4 a few years later, maceli added to this list with newly popular technology-relating titles, including emerging technologies librarian, metadata librarian, and user experience/architect librarian.5 beyond examining which specific technologies librarians should be able to use, researchers have also pondered whether a list of skills is even possible to create. crawford synthesized a series of blog posts from various authors to discuss which technology skills are essential and which are too specialized to serve as minimum technology requirements for librarians.6 he questioned whether universal skill sets should be established given the variety of tasks within libraries and the unique backgrounds of each library worker. crawford also questioned the expectation that every librarian information technology and libraries | december 2016 37 will have a broad array of technology skills from programming to video editing to game design and device troubleshooting. partridge et al. reported on a series of focus groups held with 76 librarians that examined the skills required for members of the profession, especially those addressing technology.7 in the questions they asked the focus groups, the authors focused on the term “library 2.0” and attempted to gather suggestions on skills that current and future librarians need to assist users. they concluded that the groups identified that a change in attitudes by librarians was more important to future library service than the acquisition of skills with specific technology tools. importance was given to librarians’ abilities to stay aware of technological changes, be resilient and reflective in the face of them, and to communicate regularly and clearly with the members of their communities. another area examined in the studies is where the acquisition of technology skills should and does happen for librarians. riley-huff and rholes reported on a dual approach to measure librarians’ preparation for performing technology-related tasks.8 the authors assessed course offerings for lis programs to see if they included sufficient technology preparation for new graduates to succeed in the workplace. they then surveyed lis practitioners and administrators to learn how they acquired their skills and how difficult it is to find candidates with enough technology preparation for library positions. their findings suggest that while lis programs offer many technology courses, they lack standardization, and graduates of any program cannot be expected to have a broad education in library technologies. further research confirmed this troubling lack of consistency in technology-related curricula. singh and mehra assessed a variety of stakeholders, including students, employers, educators, and professional organizations, finding widespread concern about the coverage of technology topics in lis curricula.9 despite inconsistencies between individual programs, several studies provided a holistic view of the popular technology offerings within lis curricula. 
programs commonly offered one or more introductory technology courses, as well as courses in database design and development, web design and development, digital libraries, systems analysis, and metadata.10,11,12 as researchers have emphasized from a variety of perspectives, new graduates could not realistically be expected to know every technology with application to the field of information.13 there was widespread acknowledgement that learning in this area can, and must, continue in a lifelong fashion throughout one’s career. riley-huff and rholes reported that lis practitioners saw their own experiences involving continuing skill development on the job, both before and after taking on a technology role.14 however, literature going back many decades suggests that the increasing need for continuing education in information technology has generally not been matched by increasing organizational support for these ventures. numerous deterrents to continuing technology education were noted, including lack of time,15 organizational climate, and the perception of one’s age.16 while studies in this area have primarily focused on mls-level positions, jones reported on academic library support staff members and their perceptions of technology use over a ten-year period and found that increased technology responsibilities added technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 38 to workloads and increased workplace stress.17 respondents noted that increasing use of technology in their libraries has increased their individual workloads along with the range of responsibilities that they hold. method to build an understanding of the research questions stated above, which focus on the technologies currently used by information professionals and those they desired to learn, we designed and administered a thirteen-question anonymous survey (see appendix) to the subscribers of thirty library-focused electronic discussion groups between february 25 and march 13, 2015. the groups were chosen to target respondents employed in multiple types of libraries (academic, public, school, and special) with a wide array of roles in their libraries (public services librarians, systems staff members, catalogers, and so on). we solicited respondents with an email sent to the groups asking for their participation in the survey and with the promise to post initial results to the same groups. the survey included closed and open-ended questions oriented toward understanding current technology use and future aspirations as well as capturing demographics useful in interpreting and generalizing the results. the survey questions have been previously used and iteratively expanded over time by the second author, first in the fall of 2008, then spring of 2012, with summative results presented in the last three editions of the neal-schuman library technology companion. we obtained a total of 2,216 responses to the question, “which of the following technologies or technology skills are you expected to use in your job on a regular basis?” of these responses, 1,488 (67 percent) of the respondents answered the question regarding technologies they would like to learn: “what technology skill would you like to learn to help you do your job better?” we conducted basic reporting of response frequency for closed questions to assess and report the demographics of the respondents. 
to analyze the open-ended survey question results in greater depth, we conducted a textual analysis using the r statistical package (https://www.r-project.org/). we used the tm (text mining) package in r (http://cran.r-project.org/package=tm) to calculate term frequencies and correlations, generate plots, and cluster terms.

results
the following section will first present an overview of survey responses and respondents, and then explore results as related to the three research questions stated above. the lis practitioners who responded to the survey reported that their libraries are located in forty us states, eight canadian provinces, and forty-three other countries. academic libraries were the most common type of library represented, followed by public, school, special, and other (see table 1).

library type | number of respondents | percentage of all respondents
academic | 1,206 | 54.4
public | 545 | 24.6
school | 266 | 12.0
special | 138 | 6.2
other | 61 | 2.8
table 1. the types of libraries in which survey respondents work

respondents also provided their highest level of education. a total of 77 percent of responding lis practitioners have earned a library-related or other master’s degrees, dual master’s degrees, or doctoral degrees. from these reported levels of education, it is likely that more respondents are in librarian positions than in library support staff positions. however, individuals with master’s degrees serve in various roles in library organizations, so the percentage of graduate degree holders may not map exactly to the percentage of individuals in positions that require those degrees. significantly fewer respondents (16 percent) reported holding a high school diploma, some college credit, an associate degree, or a bachelor’s degree as their highest level of education. another aspect we measured in the survey was tasks that respondents performed on a regular basis. the range of tasks provided in the survey allowed for a clearer analysis of job responsibilities than broad categories of library work such as “public services” or “technical services.” some respondents appeared to be employed in solo librarian environments where they are performing several roles. even respondents who might have more focused job titles such as “reference librarian” or “cataloger” may be performing tasks that overlap traditional roles and categories of library work. the tasks offered in the survey and the responses to each are shown in table 2.

task | number of respondents | percentage of respondents
reference | 1,404 | 63.4
instruction | 1,296 | 58.5
collection development | 1,260 | 56.9
circulation | 917 | 41.4
cataloging | 905 | 40.8
electronic resource management | 835 | 37.7
acquisitions | 789 | 35.6
user experience | 775 | 35.0
library administration | 769 | 34.7
outreach | 758 | 34.2
marketing/public relations | 722 | 32.6
library/it systems | 672 | 30.3
periodicals/serials | 659 | 29.7
media/audiovisuals | 566 | 25.5
interlibrary loan | 518 | 23.4
distance library services | 474 | 21.4
archives/special collections | 437 | 19.0
other | 209 | 9.4
table 2. tasks performed on a regular basis by survey respondents
while public services-related activities lead the list, with reference, instruction, collection development, and circulation as the top four task areas, technical services-related activities are well represented; the next three in rank are cataloging, electronic resource management, and acquisitions. the overall list of tasks shows the diversity of work lis practitioners engage in, as each respondent chose an average of six tasks. the results also suggest that the survey respondents are well acquainted with a wide variety of library work rather than only having experience in a few areas, making their uses of technology more representative of the broader library world. the survey also questioned the barriers lis practitioners face as they try to add more technology to their libraries, and 2,161 respondents replied to the question, “which of the following are barriers to new technology adoption in your library?” financial considerations proved to be the most common barrier, with “budget” chosen by 80.7 percent of respondents, followed by “lack of staff time” (62.4 percent), “lack of staff with appropriate skill sets” (48.5 percent), and “administrative restrictions” (36.7 percent).

what combinations of technology skillsets do lis practitioners commonly use?
responses from survey question 8, “which of the following technologies or technology skills are you expected to use in your job on a regular basis?,” were analyzed to build an understanding of this research question. a total of 2,216 responses to this question were received. survey respondents were asked to select from a detailed list of technologies/skills (visible in question 8 of the appendix) that they regularly used. the top answers respondents chose for this question were: email, word processing, web browser, library catalog (public side), and library database searching. the full list of the top twenty-five technology skills and tools used is detailed in figure 1, with the list of the bottom fifteen technology skills used presented in figure 2.

[figure 1. top twenty-five technology skills/tools used by respondents (n = 2,216); skills shown: email, word processing, web browser, library catalog (public side), library database searching, spreadsheets, printers, web searching, teaching others to use technology, presentation software, windows os, laptops, scanners, library management system (staff side), downloadable ebooks, web-based ebook collections, cloud-based storage, technology troubleshooting, teaching using technology, online instructional materials/products, tablets, web video conferencing, educational copyright knowledge, library website creation or management, cloud-based productivity apps]

[figure 2. bottom fifteen technology skills/tools used by respondents (n = 2,216); skills shown: mac os, audio recording and editing, technology equipment installation, computer programming or coding, assistive/adaptive technology, rfid, chromebooks, network management, server management, statistical analysis software, makerspace technologies, linux, 3d printers, augmented reality, virtual reality]

text analysis techniques were then used to determine the frequent combinations of technology skills used in practice. first, a clustering approach was taken to visualize the most popular technologies that were commonly used in combination (figure 3). clustering helps in organizing and categorizing a large dataset when the categories are not known in advance, and, when plotted in a dendrogram chart, assists in visualizing these commonly co-occurring terms.
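to make the correlation and clustering steps concrete, the following is a minimal sketch of this style of co-occurrence analysis in python rather than the r/tm pipeline described above; the file name and column labels are hypothetical stand-ins for a respondent-by-skill matrix built from question 8.

```python
# a minimal sketch of co-occurrence analysis in python (the study itself used r's
# tm package); "skills_matrix.csv" and the "server management" column are
# hypothetical placeholders, not the authors' actual data.
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

skills = pd.read_csv("skills_matrix.csv")  # one row per respondent, one 0/1 column per skill

# restrict to the most frequently reported skills, as in figure 3
top_cols = skills.sum().sort_values(ascending=False).head(25).index
top = skills[top_cols]

# terms correlated with a single skill, analogous to figures 4-6
print(top.corrwith(skills["server management"]).sort_values(ascending=False).head(10))

# hierarchical clustering of skills by co-occurrence, drawn as a dendrogram (cf. figure 3)
dissimilarity = 1 - top.corr()  # low values mean the skills are often reported together
tree = linkage(squareform(dissimilarity.values, checks=False), method="average")
dendrogram(tree, labels=list(top_cols), leaf_rotation=90)
plt.tight_layout()
plt.show()
```

converting the correlation matrix into a dissimilarity matrix before linkage mirrors the idea behind the dendrogram in figure 3: skills that tend to be reported by the same respondents end up in the same branch.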
the authors numbered the clusters identified in figure 3 for ease of reference. from left to right, the first cluster focuses on communication and educational tools, the second emphasizes devices and software, the third contains web and multimedia creation tools, the fourth contains office productivity and public-facing information retrieval tools, and the fifth cluster has a diverse collection of responsibilities including systems-oriented responsibilities (from operating systems to specific hardware devices), working with ebooks, teaching with technology, and teaching technology to others.

[figure 3. cluster analysis of most frequent technology skills used in practice, with red outlines on each numbered cluster]

notably, the list of top skills used (figure 1) falls more on the end-user side of technology; skills more oriented toward systems work (e.g., linux, server management, computer programming, or coding) were less frequently mentioned, and several were among the lowest reported (figure 2). of the 2,216 respondents, 15 percent used programming or coding skills regularly in their job (which is of interest as programming or coding was the skill most desired to learn by respondents; this will be discussed further in the context of the next research question). plotting the correlations between the more advanced technology skillsets can provide a picture of the work such systems-oriented positions are commonly responsible for, particularly as they are less well represented in the responses as a whole. figure 4 plots the correlated terms for those tasked with “server management.” it is fair to assume someone with such responsibilities falls on the highly technical end of the spectrum.

[figure 4. terms correlated with “server management,” indicating commonly co-occurring workplace technologies for highly technical positions]

the more common task of “library website creation or management,” which fell to those with a broad level of technological expertise, had numerous correlated terms. figure 5 demonstrated a wide array of technology tools and responsibilities.

[figure 5. terms correlated with “library website creation or management,” indicating commonly co-occurring technologies used on the job]

and lastly, teaching using technology and teaching technology to others is a long-standing responsibility of librarians and library staff. the following plot (figure 6) presents the skills correlated with “teaching others to use technology.”

[figure 6. terms correlated with “teaching others to use technology,” indicating commonly co-occurring technologies used on the job]

what combinations of technology skillsets do lis practitioners desire to learn?
we analyzed responses to survey question 10, “what technology skill would you like to learn to help you do your job better?,” to explore this research question.
as summarized in burke18—and consistent with the prior year’s findings—coding or programming remained the most desired technology skillset, mentioned by 19 percent of respondents. the raw text analysis yielded a fuller list of the top terms mentioned by participants (table 3 and visualized in figure 7).

technology term | number of respondents | percentage of respondents
coding or programming (combined for reporting) | 292 | 19.59
web | 178 | 11.96
software | 158 | 10.62
video | 112 | 7.53
apps | 106 | 7.12
editing | 105 | 7.06
design | 85 | 5.71
database | 76 | 5.11
table 3. terms mentioned by 5 percent or more of survey respondents

[figure 7. wordcloud of responses to “what technology skill would you like to learn to help you do your job better?”]

we then explored the deeper context of responses and individually analyzed responses specific to the more popular technology desires. first, we assessed the responses mentioning the desire to learn coding or programming. of these responses, the most common specific technologies mentioned were html, python, css, javascript, ruby, and sql, listed in decreasing order of interest. although most participants did not describe what they would like to do with their desired coding or programming skills, of those that did, the responses indicated interest in
● becoming more empowered to solve their own technology problems (e.g., “i would like to learn the [programming languages] so i don't have to rely on others to help with our website,” “i’m one of the most tech-skilled people at my library, but i’d like to be able to build more of my own tools and manage systems without needing someone from it or outside support.”);
● improving communication with it (e.g., “how to speak code, to aid in communication with it,” “to better identify problems and work with it to fix them”);
● creating novel tools and improving system interoperability (e.g. “coding for app and api creation”); and
● bringing new technologies to their library and patrons (e.g., “coding so that i can incorporate a hackerspace in my library”).
next, we took a clustering approach to visualize the terms commonly desired in combination. figure 8 describes the clustered terms that we found within the programming or coding responses. the terms “programming” and “coding” form a distinct cluster to the right of the diagram, indicating that many responses contained only those two terms.

[figure 8. clustering of terms present in responses indicating the desire to learn coding or programming]

the remaining portion of the diagram begins to illustrate the specific technologies mentioned for those respondents that answered in greater detail or expanded on their general answer of programming or coding. other related desired technology-skill areas become apparent: database management, html and css (as well as the more general “web design,” which appeared in the top terms in table 3), php and javascript, python and sql, and xml creation, among others. the bulleted list presented in the previous paragraph illustrates some of the potential applications participants envisioned these skills being useful in, but the majority did not provide this level of detail in their response.
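for readers who want to reproduce this kind of open-text analysis, the sketch below shows one way to derive term frequencies and term associations from the free-text answers in python; it is not the authors’ code (they used r’s tm package), the file and column names are hypothetical, and it will not match table 3 exactly because related terms such as “coding” and “programming” were combined for reporting.

```python
# a rough sketch in python (not the authors' r/tm code) of mining the free-text
# answers to question 10; "desired_skills.csv" and its "answer" column are
# hypothetical placeholders for the survey export.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

answers = pd.read_csv("desired_skills.csv")["answer"].dropna()

# binary document-term matrix: did a respondent mention the term at all?
vectorizer = CountVectorizer(stop_words="english", binary=True)
dtm = pd.DataFrame(vectorizer.fit_transform(answers).toarray(),
                   columns=vectorizer.get_feature_names_out())

# share of respondents mentioning each term, comparable to table 3
print((dtm.mean() * 100).sort_values(ascending=False).head(10).round(2))

# terms most associated with "coding", similar in spirit to tm's findAssocs()
if "coding" in dtm.columns:
    assoc = dtm.drop(columns="coding").corrwith(dtm["coding"])
    print(assoc.sort_values(ascending=False).head(10))
```

the correlation step plays the same role as tm’s findAssocs(), surfacing terms that tend to appear in the same answers, such as the video- and photo-related terms that co-occur with “editing” in the next paragraph.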
editing was another prominent term that appeared across participant responses and was largely meant in the context of video editing. because of the vagueness of the term “editing,” a closer look was necessary to determine other technology desires. looking at terms highly correlated with “editing” revealed both video and photo editing to be important to respondents. several of the top-appearing terms were used more generally: “database” and mobile “apps” were mentioned without specifying the technology tool or scenario of use, such that a more contextual analysis could not be conducted. these responses can be particularly difficult to interpret as the term “databases” can have a technical meaning (e.g., working with sql) or it can refer to the use of library databases from an end user perspective.

what technology skillsets do newer lis practitioners use and desire to learn as compared to those with ten-plus years’ experience in the field?
of the 2,216 survey responses, 877 stated they had worked in libraries for ten or fewer years. we analyzed these responses separately from the remaining 1,334 respondents who had worked in libraries for more than ten years. of this group, 644 had worked in libraries for twenty-plus years (figure 9). a handful of participants did not answer the question and were omitted from the analysis.

[figure 9. number of survey responses falling into the various categories for number of years working in libraries (0–2, 3–5, 6–10, 11–15, 16–20, and 21+ years)]

the top technology skills used in the workplace did not differ significantly between the different groups. the top skills, as discussed earlier and presented in figure 1, were well represented and similarly ordered. a few small percentage points of difference were noted in a handful of the top skills (figure 10). those newer to the field were slightly more likely to teach others to use technology, use cloud-based storage, and use cloud-based productivity apps. more experienced practitioners regularly used the library management system (on the staff side) more than those that were newer to the field.

[figure 10. top twenty-five technology skills used by respondents in the zero to ten years’ experience (dark blue) and eleven-plus years’ experience (light blue) groups]

for the question regarding technologies they would like to learn, 69 percent of the participants with zero to ten years’ experience answered the question compared to a slightly smaller 65 percent of the participants with more than ten years’ experience. top terms for both groups were very similar, including coding or programming, software, web, video, design, and editing. these terms were not dissimilar to the responses taken as a whole (table 3), indicating that respondents were generally interested in learning the same sorts of technology skills regardless of how long they had been in the field.
a few noticeable differences between the two groups emerged. the most popular skills, coding or programming, were mentioned by 28 percent of the respondents with zero to ten years’ experience, and by 15 percent of the respondents with eleven-plus years’ experience. there was slightly more interest (by a few percentage points) in databases, design, python, and ruby in the zero to ten years’ experience group. taking a closer look at the different year ranges within the zero to ten years’ experience group revealed that those with three to five years of experience were most likely to be interested in learning coding or programming skills.

[figure 11. percentage of respondents interested in learning coding or programming in the groups with ten or fewer years’ experience (0–2, 3–5, and 6–10 years)]

of the participants that answered the question at all, several stated that there were no technology skills they would need or like to learn for their position, either because they were comfortable with their existing skills or were simply open to learning more as needed (but nothing specific came to mind). combined with those who did not answer the question (and so presumably did not have a particular technology they were interested in learning), 28 percent of the zero to ten years’ experience group and 31 percent of the eleven-plus years’ experience group did not have any technologies that they desired to learn at the moment.

discussion
as detailed earlier, the most common technologies employed by lis practitioners were email, office productivity tools, web browsers, library catalog and database searching tools, and printers. generally similar technology usage patterns were observed for early and later-career practitioners, and programming topped the list of most-desired technology skills to learn. the cluster analysis presented in figure 3 suggests that a relatively small percentage of practitioners have technology-intensive roles that would require skills such as programming, working with databases, systems administration, etc. rather, the cluster analysis showed common technology skillsets focused on the end-user side of technology tools. in fact, most of the top ten skills used—email, office productivity tools (word processing, spreadsheets and presentation software), web browsers, library catalog and database searching, printers, and teaching others to use technology—are fairly nontechnical in nature. a potential exception is that of teaching technology.
figure 6 suggests that teaching others to use technology entails several hardware devices (for example, laptops, tablets, smartphones, and scanners) as well as online and digital resources, such as ebooks. however, most of the popular skills used would be considered baseline skills for information workers in any domain. as suggested by tennant, programming and other advanced technical skills do not necessarily need to be a core skill for all information professionals, but knowledge of the potential applications and possibilities of such tools is required.19 this idea was echoed by partridge et al., whose findings emphasized the need for awareness and resilience in tackling new technological developments.20 these skills alone would obviously be too little for lis practitioners explicitly seeking a high-tech role, as discussed in maceli.21 however, further research directed toward exploring the mental models and general technological understanding of information professionals would be helpful in understanding the true level of practitioner engagement with technology, to complement the list of relatively low-tech tools employed. programming has been a skill of great interest within the information professions for many years and the respondents’ enthusiasm and desire to learn in this area was readily apparent from the survey results, with nearly 20 percent of participants citing either “programming” or “coding” as a skill they desired to learn. in the context of their current responsibilities, 15 percent of respondents overall mentioned “computer programming or coding” as a regular technological skill they employed (figure 2). there was a slight difference between the librarians with fewer than eleven years of experience—19 percent coded regularly—compared to 13 percent of those with eleven or more years of experience. within the years-of-experience divisions, the newer practitioners were more interested in learning programming, with the peak of interest at three to five years in the workplace (figure 11). the relatively low interest or need to learn programming in the newest practitioners potentially indicates a hopeful finding—that their degree program was sufficient preparation for the early years of their career. prior research would contradict this finding. for example, choi and rasmussen’s 2006 survey found that, in the workplace, librarians frequently felt unprepared in their knowledge of programming and scripting languages.22 in the intervening years, curriculum has shifted to more heavily emphasize technology skills, including web development and other topics covering programming,23 perhaps better preparing early career practitioners. overall, information technology and libraries | december 2016 53 programming remains a popular skill in continuing education opportunities as well as in job listings,24 which aligns well with the respondents’ strong interest in this area. the skills commonly co-occurring with programming in practice included working with linux, database software, managing servers, and webpage creation (figure 4). taken as a whole, these skills indicate job responsibilities falling toward the systems side, with webpage creation a skill that bridged intensely technical and more user-focused work (as also evident in figure 4).this indicates that, though programming may be perceived as highly desirable for communicating and extending systems, as a formal job responsibility it may still fall to a relatively small number of information professionals in any significant manner. 
makerspace technologies and their implementation possibilities within libraries have garnered a great deal of excitement and interest in recent years, with much literature highlighting innovative projects in this area (such as american library association25 and bagley26). fourie and meyer provided an overview of the existing makerspace literature, finding that most research efforts focus on the needs and construction of the physical space.27 given the general popularity of the topic (as detailed in moorefield-lang),28 it is interesting to note that such technologies were infrequently mentioned by survey participants, both in those desiring to learn these tools and those who were currently using them. the most infrequent skills used (figure 2) included makerspace technologies, 3d printers, augmented, and virtual reality. only a small number of respondents currently used this mix of makerspace-oriented and emerging technologies, and only 3 percent of respondents mentioned interest in learning makespace-related skills. despite many research efforts exploring the particulars of unique makerspaces in a case-study approach (for example, moorefield-lang),29 little data exists on the total number of makerspaces within libraries, and the skillset is largely absent from prior research describing lis curriculum and job listings. this makes it difficult to determine whether the low number of participants that reported working with makerspace technologies is reflective of the small number of such spaces in existence or simply that few practitioners are assigned to work in this area, no matter their popularity. in either case, these findings provide a useful baseline with which to track the growth of makerspace offerings over time and librarian involvement in such intensely technological work. despite the interest and clear willingness to learn and use technology, several workplace challenges became apparent from participant responses. as prior research explored (notable riley-huff and rholes),30 practitioners assumed they would be continually learning and building skills on the job throughout their career to stay current technologically. as described in the earlier results section, many participants mentioned that, although they were highly willing and able to learn, the necessary organizational resources were lacking. as one participant noted, “i’d like to learn anything but the biggest problem seems to be budget (time and monetary).” several participants expressed feeling overwhelmed with their current workload. new learning opportunities, technological or otherwise, were simply not feasible. although the survey results indicated that practitioners of all ages were roughly equally interested in learning new technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 54 technologies, a handful of responses mentioned that ageist issues were creating barriers. though few, these respondents described being dismissed as technologists because of their age. these themes have long been noted in the large body of continuing-education-related literature going back several decades. 
stone’s study ranked lack of time as the top deterrent to professional development for librarians, and it appears little has changed.31 chan and auster noted that organizational climate and the perception of one’s age may impair the pursuit of professional development, among other impediments.32 however, research has noted a generally strong drive in older librarians to continue their education; long and applegate found a preference in latercareer librarians for learning outlets provided by formal library schools and related professional organizations, but a lower interest in generally popular topics such as programming.33 these findings were consistent with the participant responses gathered in this survey. finally, as detailed in the results section, a significant percent of respondents (33 percent) did not answer the question regarding what technologies they would like to learn. as is a limitation with survey research, it is difficult to know what the respondent’s intention was in not answering the question, i.e., are they comfortable with their current technology skills? do they lack the time or interest in pursuing further technology education? and of those that did answer, many did not specify their intended use of the technologies they desired to learn. so a deeper exploration of what technologies lis practitioners desire to learn and why would be of value as well. these questions are worth pursuing in more depth through further research efforts. conclusion this study provides a broad view into the technologies that lis practitioners currently use and desire to learn, across a variety of types of libraries, through an analysis of survey responses. despite a marked enthusiasm toward using and learning technology, respondents described serious organizational limitations impairing their ability to grow in these areas. the lis practitioners surveyed have interested patrons, see technology as part of their mission, and are not satisfied with the current state of affairs, but they seem to lack money, time, skills, and a willing library administration. though respondents expressed a great deal of interest in more advanced technology topics, such as programming, the majority typically engaged with technology on an end-user level, with a minority engaged in deeply technical work. this study suggests future work in exploring information professionals’ conceptual understanding of and attitudes toward technology, and a deeper look at the reasoning behind those who did not express a desire to learn new technologies. information technology and libraries | december 2016 55 references 1. marshall breeding, “library technology: the next generation,” computers in libraries 33, no. 8 (2013): 16–18, http://librarytechnology.org/repository/item.pl?id=18554. 2. therese f. triumph and penny m. beile, “the trending academic library job market: an analysis of library position announcements from 2011 with comparisons to 1996 and 1988,” college & research libraries 76, no. 6 (2015): 716–39, https://doi.org/10.5860/crl.76.6.716. 3. janie m. mathews and harold pardue, “the presence of it skill sets on librarian position announcements,” college & research libraries 70, no. 3 (2009): 250–57, https://doi.org/10.5860/crl.70.3.250. 4. debra a. riley-huff and julia m. rholes, “librarians and technology skill acquisition: issues and perspectives,” information technology and libraries 30, no. 3 (2011): 129–40, https://doi.org/10.6017/ital.v30i3.1770. 5. 
monica maceli, “creating tomorrow’s technologists: contrasting information technology curriculum in north american library and information science graduate programs against code4lib job listings,” journal of education for library and information science 56, no. 3 (2015): 198–212, https://doi.org/10.12783/issn.2328-2967/56/3/3. 6. walt crawford, “making it work perspective: techno and techmusts,” cites and insights 8, no. 4 (2008): 23–28. 7. helen partridge et al., “the contemporary librarian: skills, knowledge and attributes required in a world -f emerging technologies,” library & information science research 32, no. 4 (2010): 265–71, https://doi.org/10.1016/j.lisr.2010.07.001. 8. riley-huff and rholes, “librarians and technology skill acquisition.” 9. vandana singh and bharat mehra, “strengths and weaknesses of the information technology curriculum in library and information science graduate programs,” journal of librarianship and information science 45, no. 3 (2013): 219–231, https://doi.org/10.1177/0961000612448206. 10. riley-huff and rholes, “librarians and technology skill acquisition.” 11. sharon hu, “technology impacts on curriculum of library and information science (lis)—a united states (us) perspective,” libres: library & information science research electronic journal 23, no. 2 (2013): 1–9, http://www.libres-ejournal.info/1033/. 12. singh and mehra, “strengths and weaknesses of the information technology curriculum.” 13. see, for example, crawford, “making it work perspective”; partridge et al., “the contemporary librarian.” technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 56 14. riley-huff and rholes, “librarians and technology skill acquisition.” 15. elizabeth w. stone, factors related to the professional development of librarians (metuchen, nj: scarecrow, 1969). 16. donna c. chan and ethel auster, “factors contributing to the professional development of reference librarians,” library & information science research 25, no. 3 (2004): 265–86, https://doi.org/10.1016/s0740-8188(03)00030-6. 17. dorothy e. jones, “ten years later: support staff perceptions and opinions on technology in the workplace,” library trends 47, no. 4 (1999): 711–45. 18. john j. burke, the neal-schuman library technology companion: a basic guide for library staff, 5th edition (new york: neal-schuman, 2016). 19. roy tennant, “the digital librarian shortage,” library journal 127, no. 5 (2002): 32. 20. partridge et al., “the contemporary librarian.” 21. monica maceli, “what technology skills do developers need? a text analysis of job listings in library and information science (lis) from jobs.code4lib.org,” information technology and libraries 34, no. 3 (2015): 8–21, https://doi.org/10.6017/ital.v34i3.5893. 22. youngok choi and edie rasmussen, “what is needed to educate future digital libraries: a study of current practice and staffing patterns in academic and research libraries,” d-lib magazine 12, no. 9 (2006), http://www.dlib.org/dlib/september06/choi/09choi.html. 23. see, for example, maceli, “creating tomorrow's technologists.” 24. elías tzoc and john millard, “technical skills for new digital librarians,” library hi tech news 28, no. 8 (2011): 11–15, https://doi.org/10.1108/07419051111187851. 25. american library association, “manufacturing makerspaces,” american libraries 44, no. 1/2 (2013), https://americanlibrariesmagazine.org/2013/02/06/manufacturing-makerspaces/. 26. caitlin a. 
bagley, makerspaces: top trailblazing projects, a lita guide (chicago: american library association, 2014). 27. ina fourie and anika meyer, “what to make of makerspaces: tools and diy only or is there an interconnected information resources space?,” library hi tech 33, no. 4 (2015): 519–25, https://doi.org/10.1108/lht-09-2015-0092. 28. heather moorefield-lang, “change in the making: makerspaces and the ever-changing landscape of libraries,” techtrends 59, no. 3 (2015): 107–12, https://doi.org/10.1007/s11528-015-0860-z. information technology and libraries | december 2016 57 29. heather moorefield-lang, “makers in the library: case studies of 3d printers and maker spaces in library settings,” library hi tech 32, no. 4 (2014): 583–93, https://doi.org/10.1108/lht-06-2014-0056. 30. riley-huff and rholes, “librarians and technology skill acquisition.” 31. stone, factors related to the professional development of librarians. 32. chan and auster, “factors contributing to the professional development of reference librarians.” 33. chris e. long and rachel applegate, “bridging the gap in digital library continuing education: how librarians who were not ‘born digital’ are keeping up,” library leadership & management 22, no. 4 (2008), https://journals.tdl.org/llm/index.php/llm/article/view/1744. technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 58 appendix. survey questions 1. what type of library do you work in? 2. where is your library located (state/province/country)? 3. what is your job title? 4. what is your highest level of education? 5. which of the following methods have you used to learn about technologies and how to use them? please mark all that apply. • articles • as part of a degree i earned • books • coworkers • face-to-face credit courses • face-to-face training sessions • library patrons • online credit courses • online training sessions (webinars, etc.) • practice and experiment on my own • web resources i regularly check (sites, blogs, twitter, etc.) • web searching • other: 6. which of the following skill areas are part of your responsibilities? please mark all that apply. • acquisitions • archives/special collections • cataloging • circulation • collection development • distance library services • electronic resource management • instruction • interlibrary loan information technology and libraries | december 2016 59 • library administration • library it/systems • marketing/public relations • media/audiovisuals • outreach • periodicals/serials • reference • user experience • other: 7. how long have you worked in libraries? • 0–2 years • 3–5 years • 6–10 years • 11–15 years • 16–20 years • 21 or more years 8. which of the following technologies or technology skills are you expected to use in your job on a regular basis? please mark all that apply • assistive/adaptive technology • audio recording and editing • augmented reality (google glass, etc.) • blogging • cameras (still, video, etc.) • chromebooks • cloud-based productivity apps (google apps, office 365, etc.) • cloud-based storage (google drive, dropbox, icloud, onedrive, etc.) • computer programming or coding • computer security and privacy knowledge • database creation/editing software (ms access, etc.) • dedicated e-readers (kindle, nook, etc.) 
• digital projectors technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 60 • discovery layer/service/system • downloadable e-books • educational copyright knowledge • e-mail • facebook • fax machine • image editing software (photoshop, etc.) • laptops • learning management system (lms) or virtual learning environment (vle) • library catalog (public side) • library database searching • library management system (staff side) • library website creation or management • linux • mac operating system • makerspace technologies (laser cutters, cnc machines, arduinos, etc.) • mobile apps • network management • online instructional materials/products (libguides, tutorials, screencasts, etc.) • presentation software (ms powerpoint, prezi, google slides, etc.) • printers (public or staff) • rfid (radio frequency identification) • scanners and similar devices • server management • smart boards/interactive whiteboards • smartphones (iphone, android, etc.) • software installation • spreadsheets (ms excel, google sheets, etc.) • statistical analysis software (sas, spss, etc.) • tablets (ipad, surface, kindle fire, etc.) • teaching others to use technology information technology and libraries | december 2016 61 • teaching using technology (instruction sessions, workshops, etc.) • technology equipment installation • technology purchase decision-making • technology troubleshooting • texting, chatting, or instant messaging • 3d printers • twitter • using a web browser • video recording and editing • virtual reality (oculus rift, etc.) • virtual reference (text, chat, im, etc.) • word processing (ms word, google docs, etc.) • web-based e-book collections • web conferencing/video conferencing (webex, google hangouts, goto meeting, etc.) • webpage creation • web searching • windows operating system • other: 9. which of the following are barriers to new technology adoption in your library? please mark all that apply. • administrative restrictions • budget • lack of fit with library mission • lack of patron interest • lack of staff time • lack of staff with appropriate skill sets • satisfaction with amount of available technology • other: 10. what technology skill would you like to learn to help you do your job better? 11. what technologies do you help patrons with the most? 12. what technology item do you circulate the most? technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 62 13. what technology or technology skill would you most like to see added to your library? the role of the library in the digital economy article the role of the library in the digital economy serhii zharinov information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12457 serhii zharinov (serhii.zharinov@gmail.com) is researcher, state scientific and technical library of ukraine. © 2020. abstract the gradual transition to a digital economy requires all business entities to adapt to the new environmental conditions that are taking place through their digital transformation. these tasks are especially relevant for scientific libraries, as digital technologies make changes in the main subject field of their activities, the processes of creating, storing, and information disseminating. 
in order to find directions for the transformation of scientific libraries and determine their role in the digital economy, a study of the features of digital transformation and of the experience of the digital transformation of foreign libraries was conducted. management of research data, which is implemented through the creation of current research information systems (cris), was found to be one of the most promising areas of the digital transformation of libraries. the problem area of this direction and ways of engaging libraries in it have also been analyzed in the work. introduction the transition to a digital economy contributes to the even greater penetration of digital technologies into our lives and the emergence of new conditions of competition and new trends in organizations’ development. big data, machine learning, and artificial intelligence are becoming common tools implemented by the pioneers of digital transformation in their activities.1 significant changes in the main functions of libraries, the storage and dissemination of information, caused by the development of digital technologies affect the operational activities of libraries, the requests of users and partners to the library, and the ways to meet them. in the process of adapting to these changes, the role of libraries in the digital economy is changing. this study is designed to find current areas of library development and to determine the role of the library in the digital economy. achieving this goal requires study of the “digital economy” concept and the peculiarities of the digital transformation of organizations in order to better understand the role of the library in it; research on the development of libraries to determine which directions best fit the new role of the library in the digital economy; and identification of obstacles to the development of this area and ways to engage libraries in it. the concept of the “digital economy” the transition to an information society and digital economy will gradually change all industries, and all companies must change accordingly.2 taking advantage of the digital economy is the main driving force of innovation, competitiveness, and economic development of a country.3 the transition to a digital economy is not instant but occurs over many years. the topic emerged at the end of the twentieth century but in recent years has experienced rapid growth. in the web of science (wos) citation database, publications with this term in the title began to appear in 1996 (figure 1). figure 1. the number of publications in the wos citation database for the query “digital economy.” one of the first books devoted entirely to the study of the digital economy concept is the work of don tapscott, published in 1996.
in this book, the author understands the digital economy as an economy in which the use of digital computing technologies in economic activity becomes its dominant component.4 thomas mesenbourg, an american statistician and economist, identified in 2000 the three main components of the digital economy: e-business, e-commerce, and e-business infrastructure.5 a number of works on the development of indicators to assess the state of the digital economy, in particular the work of philippe barbet and nathalie coutinet, are based on the analysis of these components.6 alnoor bhimani, in his 2003 paper “digitization and accounting change,” defined the digital economy as “the digital interrelationships and dependencies between emerging communication and information technologies, data transfers along predefined channels and emerging platforms, and related contingencies within and across institutional and organizational entities.”7 bo carlsson’s 2004 article described the digital economy as a dynamic state of the economy characterized by the constant emergence of new activities based on the use of the internet and new forms of communication between different authors of ideas, whose communication allows them to generate new activities.8 in 2009, john hand defined the digital economy as the new design or use of information and communication technologies that help transform the lives of people, society, or business.9 carmen nadia ciocoiu, in her 2011 article, explained the digital economy as a state of the economy where, due to technology, knowledge and networking begin to play a more important role than capital in a postindustrial society.10 in a 2014 article, lesya kit defined the digital economy as an element of the network economy, characterized by the transformation of all spheres of the economy by transferring information resources and knowledge to a computer platform for further use.11 ukrainian scientists mykhailo voinarenko and larysa skorobohata, in a 2015 study of network tools, gave the following definition of the digital economy: “the digital economy, unlike the internet economy, assumes that all economic processes (except for the production of goods) take place independently of the real world. goods and services do not have a physical medium but are ‘electronic.’”12 yurii pivovarov, director of the ukrainian association for innovation development (uaid), gives the following definition: “digital economy is any activity related to information technology. and in this case, it is important to separate the terms: digital economy and it sphere. after all, it is not about the development of it companies, but about the consumption of services or goods they provide—online commerce, e-government, etc.—using digital information technology.”13 taking into account the above, in this study the digital economy is defined as a digital infrastructure that encompasses all business entities and their activities. the transition to the digital economy is the process of creating conditions for the digital transformation of organizations, the creation of digital infrastructure, and the gradual involvement of various economic entities and sectors of the economy in that digital infrastructure.
one of the first practical and political manifestations of the transition to the digital economy was the european commission’s index of digital economy and society (desi), first published in 2014. the main components of the index are communications, human capital, internet use, digital integration, and digital public services. among european countries in 2019, there is significant progress in the digitalization of business and in the interaction of society with the state.14 for ukraine, the first step towards the digital economy was the digital economy and development concept of ukraine, which defines the understanding of the digital economy, the direction and principles of transition to it.15 thus, for active representatives of the public sector, this concept is a signal that the development of structures and organizations should be based not on improving operational efficiency, but on transformation in accordance with the requirements of industry 4.0. confirmation of the seriousness of the ukrainian government’s intentions in this direction is the creation of the ministry of digital transformation in 2019 and the digitization of the latest public services through online services.16 one of the priority challenges which needs to be solved at the stage of transition to the digital economy is the development of skills in working with digital technologies in the entire population . this is relevant not only for ukraine, but also for the european union. in europe, a third of the active workforce does not have basic skills in working with digital technologies; in ukraine, 15.1 percent of ukrainians do not have digital skills, and the share of the working population with below-average digital skills is 37.9 percent.17 information technology and libraries december 2020 the role of the library in the digital economy | zharinov 4 part of the solution to this challenge in ukraine is entrusted to the “digital education” project, implemented by the ministry of digital transformation (osvita.diia.gov.ua), which through the mini-series created by him for different target audiences should form digital literacy in the population of ukraine. features of digital transformation developed digital skills in the population make the digital transformation of organizations not just a competitive advantage, but a prerequisite for their survival. thus, the larger the target audience is accustomed to the benefits of the digital economy, the more actively the organization is to adapt to new requirements and customer needs, to the new competitive environment. digital transformation of the organization is a complex process that is not limited to the implementation of software in the company’s activities or automation of certain components of production. it includes changes to all elements of the company, including methods of manufacturing and customer service, the organization’s strategy and business model, approaches , and management methods. according to a study by mckinsey, the integration of new technologies into a company's operations can reduce profits in 45 percent of cases.18 therefore, it is extremely important to have a comprehensive approach to digital transformation, understanding the changes being implemented, choosing the method of their implementation, and gradually involving all structural units and business processes in the process of transformation. 
the boston consulting group study identified six factors necessary for the effective use of the benefits of modern technologies:19 • connectivity of analytical data; • integration of technologies and automation; • analysis of results and application of conclusions; • strategic partnership; • competent specialists in all departments; and • flexible structure and culture. mckinsey consultants draw attention to the low percentage of successful digital transformation practices and based on the successful experience of 83 companies form five categories of recommendations that can contribute to successful digitalization:20 • involvement of leaders experienced in digitalization; • development of digital staff skills; • creating conditions for the use of digital skills by staff; • digitization of tools and working procedures of the company; and • establishing digital communication and ensuring the availability of information. experts at the institute of digital transformation identify four main stages of digital transformation in the company:21 1. research, analysis and understanding of customer experience. 2. involvement of the team in the process of digital transformation and implementation of corporate culture, which contributes to this process. 3. building an effective operating model based on modern systems. 4. transformation of the business model of the organization. https://osvita.diia.gov.ua/ information technology and libraries december 2020 the role of the library in the digital economy | zharinov 5 the “integrated model of digital transformation” study identifies one of the key factors of successful digital transformation, focusing on priority digital projects, the development and implementation of which should be engaged in specific organizational teams. the authors identify three main functional activities for digital transformation teams, the implementation of which provides a gradual comprehensive renewal of the company, namely: the creation and implementation of digital strategy, digital activity management, digitization of operational activities.22 in their study, ukrainian scientists natalia kraus, oleksandr holoborodko, and kateryna kraus determine that the general pattern for all digital economy projects is their focus on a specific consumer and comprehensive use of available information about the latter and the conditions of project effectiveness.23 initially, the project is pre-tested on a small scale, and only after obtaining satisfactory results from the testing of new principles of activity in a narrow target audience is the project scaled to a wider range of potential users. all this reduces the risks associated with digital transformation. eliminating unnecessary changes and false hypotheses on a small scale allows to avoid overspending at the stage of a comprehensive transformation of the entire enterprise. therefore, the process of effective digital transformation should begin with the involvement of experienced leaders in the field of digital transformation, analysis of the weaknesses of the organization, and building of a plan for its comprehensive transformation, which is divided into individual projects implemented by individual qualified teams with a gradual increase in the volume of these projects, while confirming their effectiveness on a small scale. the process of digital transformation should be accompanied by constant training of employees in digital skills. 
the goal of digital transformation is to build an efficient, high-performing company that can quickly adapt to new environmental conditions, which is achieved through the introduction of digital technologies and of new methods and tools of organizational management. directions of library development in the digital economy based on the study of the digital economy concept and the peculiarities of digital transformation, a review of library development in the digital economy was conducted to find the library’s place in digital infrastructure and to identify potential projects that can be implemented in an individual library as part of its comprehensive transformation plan. the main task is to determine the new role of the library in the digital economy and the areas of activity that best correspond to it. the search for directions in the development of the library in response to the spread of digital technology began at the end of the last century. one of the first concepts to reflect the impact of the internet on the library sector is the concept of the digital library, published in 1999.24 in 2006, the concept of “library 2.0” emerged, which is based on the use of web 2.0 technologies: dynamic sites, users as data authors, open-source software, api interfaces, and data added to one database being immediately fed to partner databases.25 the spread of social networks and mobile technologies, and their successful use in library practice, led to the formation of the concept of “library 3.0.”26 the development of open source, cloud services, big data, augmented reality, context-aware, and other technologies has influenced library activities, which is reflected in “library 4.0.”27 researchers, scholars, and the professional community continued to develop the concepts of the modern library, drawing on the experience of implementing changes in library activities and taking into account the development of other areas, and in 2020 articles began to appear that described the concept of “library 5.0,” based on a personalized approach to students, support of each student during the whole period of study, development of the skills necessary for learning, and a set of other supporting actions integrated into the educational process.28 in determining the current role of the library in the digital economy, it is necessary to pay attention to a study by denys solovianenko, who identifies research and educational infrastructure as one of the key elements of scientific libraries of the twenty-first century.29 olga stepanenko considers libraries part of the information and communication infrastructure, the development of which is one of the main tasks of transforming the socioeconomic environment in accordance with the needs of the digital economy; this infrastructure supports high efficiency of stakeholders and the pace of digitalization of the state economy, which occurs through the development of its constituent elements.30 that traditional library services are being replaced by digital infrastructure is demonstrated, using the example of the moravian library, in a study by michal indrak and lenka pokorna published in april 2020.31 projects that contribute to the library’s adaptation to the conditions of the digital economy, implemented in the environment of public libraries, include: digitization of library collections (including historical heritage) and the creation of a database of full-text documents; providing free access to the
internet via library computers and wi-fi; organization of online customer service, development of services that do not require a physical presence in the library; organization of events for the development of digital skills of users, work with information.32 under such conditions, the role of the librarian as a specialist in the field of information changes from being a custodian to being an intermediary, a distributor.33 one of the main objectives of library activity in the digital economy becomes overcoming a digital divide, dissemination of knowledge about modern technologies and innovations, the assistance of their use by the community, development of digital skills in all users of the library.34 an example of the digital public library is the digital north library project in canada, which resulted in the creation of the inuvialuit digital library (https://inuvialuitdigitallibrary.ca). the project lasted four years, bringing together researchers from different universities and the community in the region, who together digitized cultural heritage documents and created metadata. the library now has more than 5,200 digital resources collected in 49 catalogues. the implementation of this project provides access to library services and information to a significant number of people living in remote areas of northern canada and unable to visit libraries (https://sites.google.com/ualberta.ca/dln/home?authuser=0, https://inuvialuitdigitallibrary.ca).35 other representatives of modern digital libraries, one of the main tasks of which is the preservation of cultural heritage and the spread of national culture, are the british library (https://www.bl.uk), the hispanic digital library—biblioteca nacional de españa (http://www.bne.es), gallica digital library in france (https://gallica.bnf.fr), the german digital library—deutsche digitale bibliothek (https://www.deutsche-digitale-bibliothek.de), and the european library (https://www.europeana.eu). another direction was the development of analytical skills in information retrieval. academic libraries, operating with their competencies in information retrieval and information technology, which refined the results of the analysis were able to better identify trends in academia and expand cooperation with teachers to update their curricula.36 libraries become active participants https://inuvialuitdigitallibrary.ca/ https://sites.google.com/ualberta.ca/dln/home?authuser=0 https://inuvialuitdigitallibrary.ca/ https://www.bl.uk/ http://www.bne.es/ https://gallica.bnf.fr/ https://www.deutsche-digitale-bibliothek.de/ https://www.europeana.eu/ information technology and libraries december 2020 the role of the library in the digital economy | zharinov 7 in the process of teaching, learning, and assessment of acquired knowledge in educational institutions. t. o. kolesnikova, in her research of models of library development, substantiates the expediency of creating information intelligence centers for the implementation of the latest scientific advances in training and production processes, the involvement of libraries in the activities of higher educational establishments in the educational process, and the creation of centralized repositories as directions of development for university libraries of ukraine.37 one of the advantages of the development and dissemination of digital technologies is the possibility of forming individual curricula for students. 
involvement of university libraries in this area is one of the new directions of their activity in the digital economy.38 one of the important areas of operation for departmental and scientific-technical libraries that contributes to increasing the innovative potential of the country is activity in the area of intellectual property. consulting services in the field of intellectual property, information support for scientists, creation of electronic patent information databases in the public domain, and other related services are important components of library work in many countries.39 another important component of libraries’ transformation is the deepening of their role in scientific communication; expanding the boundaries of the use of information technology in order to integrate scientific information into a single network; and the creation and management of the information technology infrastructure of science.40 the presence of libraries on social networks has become an important component of their digital transformation. on the one hand, libraries have thus created another source of information dissemination and expanded the number of service delivery channels, for the implementation of which they have developed online training videos and interactive help services.41 on the other hand, social networks have become a marketing tool to engage the audience with the digital fund of the library and its online services. an additional important component of the presence of libraries on social networks was the establishment of contacts and exchange of ideas with other professional organizations, which contributed to the further expansion of the network of library partners.42 another area of activity that libraries take on in the digital economy is the management of research data, which is confirmed by the significant number of publications on this topic in professional scientific and research journals for 2017–18.43 joining this area allows libraries to become part of the scientific digital information and communication infrastructure, the creation of which is one of the main tasks of digital transformation on the way to the digital economy.44 the development of this area contributes to the digitalization of the scientific and information sphere; the systematization and structuring of all scientific research data has a positive effect on the effectiveness of research and on the level of scientific novelty of the results of intellectual activity. the ukrainian institute of the future, together with the digital agency of ukraine, considers digital transformation to be the integration of modern digital technologies into all spheres of business. the introduction of modern technologies (artificial intelligence, blockchain, cobots, digital twins, iiot platforms, and others) into the production process will lead to the transition to industry 4.0. according to their forecasts, the key competence in industry 4.0 should be data processing and analytics.45 research information is an integral part of this competence, so the development of this area is one of the most promising for the library in the digital economy.
the tools used in the management of research data are called current research information systems, abbreviated as cris. in ukraine, there is no such system connected to the international community.46 the change of the library’s role from a repository of data to its manager, the alignment of the functions and tasks of a cris with the key requirements of the digital economy, and the advantages of such systems, together with the fact that they are still not used in ukraine, make this area extremely relevant for research and a promising area of work for scientific libraries, so it is considered more thoroughly below. problems in research data management the global experience of research information management shows several problems in the process of research data management. some of them are related to the processes of workflow organization, control, and reporting. this is due to the use of several poorly coordinated systems to organize the work of scientists. data sets from different systems without metadata are very difficult to combine into a single system, and it is almost impossible to automate the process. all this is manifested in insufficient information support for the decision-making process in the field of science, both at the state level and at the level of individual structures. this situation can lead to wrong management decisions, to overspending on similar, duplicate projects, and to increasing the cost of recruiting and finding scientists with relevant experience for research and of finding the equipment needed for research. cris, which began to appear in europe in the 1990s, are designed to overcome these shortcomings and promote the effective organization of scientific work. such systems are now widespread throughout the world, with a total of about five hundred, mainly concentrated in europe and india. however, there is currently no research information management system in ukraine that meets international standards and integrates with international scientific databases. this omission slows down ukraine’s integration into the international scientific community. the solution to this problem may be the creation of the national electronic scientific information system uris (ukrainian research information system).47 the development of this system is an initiative of the ministry of education and science of ukraine. it is based on combining data from ukrainian scientific institutions with data from crossref and other organizations, as well as on ensuring integration with other international cris systems through the use of the cerif standard. future developers of the system face a number of challenges, both specific ones and ones already studied by foreign scientists. a significant number of studies in this area are designed to overcome the problem of lack of access to research data, as well as to solve problems of data standardization and openness. in the global experience, the management of collection processes and the development of structured data sets, their distribution on a commercial basis, and ways of benefiting from providing them in open access are investigated. the mechanisms of financing these processes are studied; in particular, effective ways of attracting patronage funds are analyzed. the possibilities of licensing the received data sets and their distribution, and the approaches and tools that can be most effective for the library, are determined.
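to make the kind of data integration described above more concrete, the following is a minimal, purely illustrative sketch, not taken from the uris project, of how publication metadata for a known doi can be retrieved from the public crossref rest api in python. the endpoint and response structure follow crossref’s public documentation; the function name and the choice of fields are assumptions made for this example.

```python
# illustrative sketch: fetching publication metadata from the public Crossref REST API.
# assumes the "requests" package is installed; the function name and selected fields
# are hypothetical choices for this example, not part of any particular CRIS.
import requests

def fetch_crossref_record(doi: str) -> dict:
    """Return a small, normalized metadata record for one DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    work = resp.json()["message"]  # Crossref wraps the work record in "message"
    return {
        "doi": work.get("DOI"),
        "title": (work.get("title") or [""])[0],
        "journal": (work.get("container-title") or [""])[0],
        "year": (work.get("issued", {}).get("date-parts", [[None]])[0] or [None])[0],
        "authors": [
            f"{a.get('given', '')} {a.get('family', '')}".strip()
            for a in work.get("author", [])
        ],
    }

if __name__ == "__main__":
    # example: a DOI cited in the endnotes of this article
    print(fetch_crossref_record("10.1016/j.acalib.2015.08.020"))
```

in a real cris, a record retrieved this way would still have to be mapped to a cerif-compatible schema and merged with records from institutional sources; that mapping is beyond the scope of this sketch.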
among the studies mentioned above, alice wise describes the experience of settling some legal aspects by clarifying the use of the site in the license agreement, which covers the conditions of access to information and search in it while maintaining a certain level of anonymity.48 the problem of data consistency is related to the lack of uniform standards for information retention that would cover the format of the data, the metadata itself, and the methods of their generation and use. thus, the use of different standards and formats in repositories and archives leads to problems with data consistency for researchers, which, in turn, affects the quality of service delivery and makes it impossible to use multiple data sets together.49 another important problem for the dissemination of research data is the lack of tools and components in the libraries and repositories of higher educational establishments and scientific institutions. it is worth developing the infrastructure so that at the end of their projects, in addition to the research results, scientists publish the research data they used and generated. this approach will be convenient both for authors (in case they need to reuse the research data) and for other scientists (because they will have access to data that can be used in their own research).50 the development of the necessary tools is quite relevant, especially because, according to international surveys, researcher-practitioners are in favor of sharing the data they create with other researchers and of the licensed use of other people’s datasets in conducting their own research.51 another reason for the low prevalence of research data is that datasets have less of an impact on a researcher’s reputation and rating than publications.52 this is partly due to the lack of citation-tracking infrastructure for datasets, in contrast to the publication of research results, and the lack of standards for storing and publishing data. prestigious scientific journals have been struggling with this problem for several years. for example, the american economic review requires authors whose articles contain empirical work, modelling, or experimental work to provide information about research data in sufficient volume for replication.53 nature and science require authors to preserve research data and provide them at the request of the editors of the journals.54 one of the reasons for the underdeveloped infrastructure in research data management is the weak policy of disseminating free access to this data, as a result of which even the small share of usable scientific data that does exist remains closed by license agreements and cannot be used by other scientists.55 open science initiatives related to publications have been operating in the scientific field for a long time, but their extension to research data remains insufficient. the development of the uris system will provide management of scientific information, will solve the problems highlighted in the scientific works cited above, will promote the efficient use of funds, will simplify the process of finding data for conducting research, and will discipline research; it will therefore have a positive impact on the entire economy of ukraine. library and research information management library involvement in the development process for scientific information management systems will be an important future direction of their work.
such systems, which could include all the necessary information about scientific research, will contribute to the renewal and development of the library sphere of ukraine and will promote the transition of the state to a digital economy. the creation of the uris system is designed to provide access to research data generated by both ukrainian and foreign scientists. such a system can ensure the development of cooperation in the field of research, the intensification of knowledge exchange, and interaction through the open exchange of scientific data and the integration of ukrainian scientific infrastructure into the world scientific and information space. according to surveys conducted by the international organizations eurocris and oclc, of the 172 respondents working in the field of research information management, 83 percent said that libraries play an important role in the development of open science, copyright, and the deposit of research results. the share of libraries that play a major role in this direction was 90 percent. almost 68 percent of respondents noted the significant contribution of libraries in filling in the metadata needed to correctly identify the work of researchers in various databases; 60 percent noted the important role of libraries in verifying the correctness of metadata filled in by researchers; and almost 49 percent of respondents assess the role of libraries as the main one in the management of research data (figure 4). figure 4. the proportion of organizations among 172 users of cris systems that assess the role of libraries in the management of research information as basic or supporting.56 (the activity categories shown in the figure are: financial support for rim; project management; maintaining or servicing technical operations; impact assessment and reporting; strategic development, management and planning; creating internal reports for departments; system configuration; outreach and communication; initiating rim adoption; research data management; metadata validation workflows; metadata entry; training and support; and open access, copyright and deposit.) at the same time, the activity of libraries in the direction of assistance in the information management of scientific research can take various forms, which should be adopted by the scientific libraries of ukraine; some of these forms will also be useful to public libraries, which can become science ambassadors in their communities. based on the experience of foreign libraries, we have identified areas of activity in which the library can join the management of research information. one of the main directions for libraries that cooperate with cris users, or are themselves the organizers of such systems, is the introduction and support of open science. historically, libraries support open science because they provide access to scientific papers, but they can further expand their activities.
using open data resources and promoting them among the scientific community, involving scientific users in disseminating their own research results on the principles of open science, supporting users in disseminating their publications, creating conditions for increasing the citation of scientific papers, tracking information about user publications, and creating and supporting public profiles of scientists in scientific and professional resources and scientific social networks: all of this will help to encourage researchers to engage in open science and to take advantage of its benefits. the analysis of world experience shows that in the activity of scientific libraries there is a significant intensification of support for the strategic goals of the structures that finance their activities and to which they are subordinated. libraries are moving away from the usual customer service and expanding their activities through the use of their own assets and the introduction of new, modern tools. such libraries try to promote the development of their parent structures and to increase modern competencies in order to better meet the needs and goals of these institutions. by introducing and implementing various management development tools, libraries synchronize their strategy with the strategy of the parent structure to achieve a synergistic effect. the next important direction of library development is their socialization. wanting to get rid of the antiquated understanding of the word library, many of them conduct campaigns aimed at changing the image of the library in the imagination of users, communities, and society. an important component of this systemic step is to build relationships with the target audience, creating user communities around the library that are not only its users but also supporters, friends, and promoters. building relationships with members of the scientific community allows libraries to reduce resistance to the changes that come with the introduction of scientific information management systems and to influence users positively so that they introduce new tools into their usual activities, receive benefits, and become an active part of the process of structuring the scientific space. recently, work with metadata has undergone some changes. the need for identification and structuring of data in the world scientific space means that metadata are already filled in not only by libraries but also by other organizations that produce, issue, and publish scientific results and scientific literature. scientists are beginning to make more active use of modern standards in the field of information in order to promote their own work. libraries, in turn, take on the role of consultant or contractor with many years of experience working with metadata and sufficient knowledge in this area. on the other hand, the filling in of metadata by users frees up the time of librarians and creates conditions for them to perform other functions, such as information management and the creation of automated data collection and management systems integrated with scientific databases, both ukrainian and international. another area of research information management is the direct management of this process. thus, cris are developed and implemented with the contribution of scientific libraries in different countries of the world.
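as a purely illustrative aside, the following sketch shows one simple way a library might check that records entered by researchers carry the minimum metadata needed for correct identification before they are loaded into a cris. the field names and rules are hypothetical and are not drawn from cerif, uris, or any specific system.

```python
# illustrative sketch only: a minimal completeness check for publication metadata
# before ingest into a research information system. field names and rules are
# hypothetical, not taken from CERIF, URIS, or any particular CRIS.
import re

REQUIRED_FIELDS = ("title", "authors", "year", "doi")
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")  # rough shape of a DOI, not a full validator

def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    doi = record.get("doi", "")
    if doi and not DOI_PATTERN.match(doi):
        problems.append(f"doi does not look well formed: {doi}")
    year = record.get("year")
    if year and not (1900 <= int(year) <= 2100):
        problems.append(f"implausible publication year: {year}")
    return problems

if __name__ == "__main__":
    example = {
        "title": "imagining library 4.0",
        "authors": ["younghee noh"],
        "year": 2015,
        "doi": "10.1016/j.acalib.2015.08.020",
    }
    print(validate_record(example) or "record passes the minimal check")
```

in practice, a library would validate against the schema of the target system and rely on persistent identifiers (such as dois and orcid ids) rather than ad hoc rules; the point here is only to show the kind of routine metadata check that such work involves.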
such systems allow libraries to combine disparate data obtained from different sources, compile scientific reports, evaluate the effectiveness of the scientific activities of the institution, create profiles of scientific institutions and scientists, develop research networks, and so on. scientists and students can find the results of scientific research and look for partners and sources of funding for research. research managers have access to up-to-date scientific information, which allows them to assess more accurately the productivity and influence of individual scientists, research groups, and institutions. business representatives get access to up-to-date information on promising scientific developments, and the public gains a way to monitor whether research is being conducted effectively. conclusions ukraine is on the path to a digital economy, characterized by the penetration of new technologies into all areas of human activity, the simplification of access to information, goods, and services, the blurring of the geographical boundaries of companies, an increasing share of automated and robotic production units, and the strengthening role of the creation and use of databases. these changes affect all sectors of the economy, and all organizations, without exception, need to adapt accordingly. a rapid response to relevant changes helps to increase competitiveness both at the level of individual organizations and at the level of the state economy. adaptation to the conditions of the digital economy occurs through digital transformation, a complex process that requires a review of all business processes of the organization and radically changes its business model. the digital transformation of the organization takes place through the involvement of management that is competent in digitization, updating management methods, developing digital skills, establishing efficient production and services, implementing digital tools and building digital communication, implementing individual development projects, and adapting to new user needs. the digital transformation of the economy occurs through the transformation of its individual sectors, creating conditions for the transformation of their representatives. one of the first steps in the process of transition to the digital economy is the establishment of digital information and communication infrastructure. libraries are representatives of the information sphere and were the main operators of information in the analogue era. significant changes in the subject area of their activities require the search for a new role for libraries. modern projects and directions of library development are integral elements of transformation to the conditions of the digital economy. completing this complex transformation will allow libraries to update their management methods, the range of services, and the channels of their provision; change their fixed assets through digitization, structuring of data, and creation of metadata; affect approaches to communication with users and cooperation with both domestic and international partners; change the functions and positioning of the library; and will enable them to become effective information operator-managers. in the digital economy, the role of the library is changing from passively collecting and storing information to actively managing it.
one of the areas of development that most comprehensively meets this role is the management of research data, which is implemented through the creation of cris systems. thus, the main asset of libraries is a digital, structured database, which is automatically and regularly updated, the main focus of which is to support the decision-making process. the library becomes an assistant in conducting research, finding funding, partners, fixed assets and information; a partner in the strategic management of both scientific organizations and the state at the level of committees and ministries. information technology and libraries december 2020 the role of the library in the digital economy | zharinov 13 the development of this area in ukraine requires solving a number of technical, administrative, and managerial questions that are relevant not only in ukraine, but also around the world. in particular, libraries need to address the issue of data integration and consistency, its accessibility and openness, copyright, and personal data issues. solving the problems of creation and operation of cris systems in ukraine are promising areas for future research. endnotes 1 andriy dobrynin, konstantin chernykh, vasyl kupriyanovsky, pavlo kupriyanovsky and serhiy sinyagov, “tsifrovaya ekonomika—razlichnyie puti k effektivnomu primeneniyu tehnologiy (bim, plm, cad, iot, smart city, big data i drugie),” international journal of open information technologies 4, no. 1 (2016): 4–10, https://cyberleninka.ru/article/n/tsifrovayaekonomika-razlichnye-puti-k-effektivnomu-primeneniyu-tehnologiy-bim-plm-cad-iot-smartcity-big-data-i-drugie. 2 jurgen meffert, volodymyr kulagin, and alexander suharevskiy, digital @ scale: nastolnaya kniga po tsifrovizatsii biznesa (moscow: alpina, 2019). 3 victoria apalkova, “kontseptsiia rozvytku tsyfrovoi ekonomiky v yevrosoiuzi ta perspektyvy ukrainy,” visnyk dnipropetrovskoho universytetu. seriia «menedzhment innovatsii» 23, no. 4 (2015): 9–18, http://nbuv.gov.ua/ujrn/vdumi_2015_23_4_4. 4 don tapscott, the digital economy: promise and peril in the age of networked intelligence (new york: mcgraw-hill, 1996). 5 thomas l. mesenbourg, measuring the digital economy (washington, dc: bureau of the census, 2001). 6 philippe barbet and nathalie coutinet, “measuring the digital economy: state-of-the-art developments and future prospects,” communications & strategies, no. 42 (2001): 153, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.576.1856&rep=rep1&type=pdf . 7 alnoor bhimani, “digitization and accounting change,” in management accounting in the digital economy, edited by alnoor bhimani, 1-12 (london: oxford university press, 2003), https://doi.org/10.1093/0199260389.003.0001. 8 bo carlsson, “the digital economy: what is m=new and what is not?,” structural change and economic dynamics 15, no. 3 (september 2004): 245–64, https://doi.org/10.1016/j.strueco.2004.02.001. 9 john hand, “building digital economy—the research councils programme and the vision,” lecture notes of the institute for computer sciences, social informatics and telecommunications engineering 16, (2009): 3, https://doi.org/10.1007/978-3-642-11284-3_1. 10 carmen nadia ciocoiu, “integration digital economy and green economy: opportunities for sustainable development,” theoretical and empirical researches in urban management 6, no. 1 (2011): 33–43, https://www.researchgate.net/publication/227346561. 
https://cyberleninka.ru/article/n/tsifrovaya-ekonomika-razlichnye-puti-k-effektivnomu-primeneniyu-tehnologiy-bim-plm-cad-iot-smart-city-big-data-i-drugie https://cyberleninka.ru/article/n/tsifrovaya-ekonomika-razlichnye-puti-k-effektivnomu-primeneniyu-tehnologiy-bim-plm-cad-iot-smart-city-big-data-i-drugie https://cyberleninka.ru/article/n/tsifrovaya-ekonomika-razlichnye-puti-k-effektivnomu-primeneniyu-tehnologiy-bim-plm-cad-iot-smart-city-big-data-i-drugie http://nbuv.gov.ua/ujrn/vdumi_2015_23_4_4 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.576.1856&rep=rep1&type=pdf https://doi.org/10.1093/0199260389.003.0001 https://doi.org/10.1016/j.strueco.2004.02.001 https://doi.org/10.1007/978-3-642-11284-3_1 https://www.researchgate.net/publication/227346561 information technology and libraries december 2020 the role of the library in the digital economy | zharinov 14 11 lesya zenoviivna kit, “evoliutsiia merezhevoi ekonomiky,” visnyk khmelnytskoho natsionalnoho universytetu, ekonomichni nauky, no. 3 (2014): 187–94, http://nbuv.gov.ua/ujrn/vchnu_ekon_2014_3%282%29__42. 12 mykhailo voinarenko and larissa skorobohata, “merezhevi instrumenty kapitalizatsii informatsiino-intelektualnoho potentsialu ta innovatsii,” visnyk khmelnytskoho natsionalnoho universytetu, . ekonomichni nauky, no. 3 (2015): 18–24, http://elar.khnu.km.ua/jspui/handle/123456789/4259. 13 yurii pivovarov, “ukraina perehodut na “cifrovu economic,” sccho ce oznachae,” edited by miroslav liskovuch. ukrinform (january 21, 2020). https://www.ukrinform.ua/rubricsociety/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html. 14 european commission, “digital economy and society index,” brussels, belgium, https://ec.europa.eu/commission/news/digital-economy-and-society-index-2019-jun-11_en. 15 kabinet ministriv ukrainu, “pro skhvalennia kontseptsii rozvytku tsyfrovoi ekonomiky ta suspilstva ukrainy na 2018–2020 roky ta zatverdzhennia planu zakhodiv shchodo yii realizatsii,” (kyiv: 2018), https://zakon.rada.gov.ua/laws/show/67-2018-%d1%80. 16 kabinet ministriv ukrainu, “pytannia ministerstva tsyfrovoi transformatsii,” (kyiv: 2019), https://zakon.rada.gov.ua/laws/show/856-2019-%d0%bf. 17 piatuy, “biblioteky stanut pershymy oflain-khabamy: mintsyfry zapustyt kursy z tsyfrovoi osvity,” https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfryzapustyt-kursy-z-tsyfrovoi-osvity-206206.html. 18 jacques bughin, jonathan deaki, and barbara o’beirne, “digital transformation: improving the odds of success,” mckinsey & company, https://www.mckinsey.com/businessfunctions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-ofsuccess. 19 domynyk fyld, shylpa patel, and henry leon, “kak dostich tsifrovoy zrelosti,” the boston consulting group inc. (2018), https://www.thinkwithgoogle.com/_qs/documents/5685/ru_adwords_marketing___sales_89 1609_mastering_digital_marketing_maturity.pdf. 20 hortense de la boutetière, alberto montagner, and angelika reich, “unlocking success in digital transformations,” mckinsey & company, https://www.mckinsey.com/businessfunctions/organization/our-insights/unlocking-success-in-digital-transformations. 21 top lea, “tsyfrova transformatsiia biznesu: navishcho vona potribna i shche 14 pytan,” businessviews, https://businessviews.com.ua/ru/business/id/cifrova-transformacijabiznesu-navischo-vona-potribna-i-sche-14-pitan-2046. 
22 vasily kupriyanovsky, andrey dobrynin, sergey sinyagov, and dmitry namiot, “tselostnaya model transformatsii v tsifrovoy ekonomike—kak stat tsifrovyimi liderami,” international journal of open information technologies 5, no. 1 (2017): 26–33, http://nbuv.gov.ua/ujrn/vchnu_ekon_2014_3%282%29__42 http://elar.khnu.km.ua/jspui/handle/123456789/4259 https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html https://ec.europa.eu/commission/news/digital-economy-and-society-index-2019-jun-11_en https://zakon.rada.gov.ua/laws/show/67-2018-%d1%80 https://zakon.rada.gov.ua/laws/show/856-2019-%d0%bf https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry-zapustyt-kursy-z-tsyfrovoi-osvity-206206.html https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry-zapustyt-kursy-z-tsyfrovoi-osvity-206206.html https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success https://www.thinkwithgoogle.com/_qs/documents/5685/ru_adwords_marketing___sales_891609_mastering_digital_marketing_maturity.pdf https://www.thinkwithgoogle.com/_qs/documents/5685/ru_adwords_marketing___sales_891609_mastering_digital_marketing_maturity.pdf https://www.mckinsey.com/business-functions/organization/our-insights/unlocking-success-in-digital-transformations https://www.mckinsey.com/business-functions/organization/our-insights/unlocking-success-in-digital-transformations https://businessviews.com.ua/ru/business/id/cifrova-transformacija-biznesu-navischo-vona-potribna-i-sche-14-pitan-2046 https://businessviews.com.ua/ru/business/id/cifrova-transformacija-biznesu-navischo-vona-potribna-i-sche-14-pitan-2046 information technology and libraries december 2020 the role of the library in the digital economy | zharinov 15 https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomikekak-stat-tsifrovymi-liderami. 23 nataliia kraus, alexander holoborodko, and kateryna kraus, “tsyfrova ekonomika: trendy ta perspektyvy avanhardnoho kharakteru rozvytku,” efektyvna ekonomika no. 1 (2018): 1–7, http://www.economy.nayka.com.ua/pdf/1_2018/8.pdf. 24 david bawden and ian rowlands, “digital libraries: assumptions and concepts,” international journal of libraries and information studies (libri), no. 49 (1999): 181–91, https://doi.org/10.1515/libr.1999.49.4.181. 25 jack m. maness, “library 2.0: the next generation of web-based library services,” logos 13, no. 3 (2006): 139–45, https://doi.org/10.2959/logo.2006.17.3.139. 26 woody evans, building library 3.0: issues in creating a culture of participation (oxford: chandos publishing, 2009). 27 younghee noh, “imagining library 4.0: creating a model for future libraries,” the journal of academic librarianship 41, no. 6 (november 2015): 786–97, https://doi.org/10.1016/j.acalib.2015.08.020. 28 helle guldberg et al., “library 5.0,” septentrio conference series, uit the arctic university of norway, no. 3 (2020), https://doi.org/10.7557/5.5378. 29 denys solovianenko, “akademichni biblioteky u novomu sotsiotekhnichnomu vymiri. chastyna chetverta. 
suchasnyi riven dyskursu akademichnoho bibliotekoznavstva ta postup e-nauky,” bibliotechnyi visnyk no.1 (2011): 8–24, http://journals.uran.ua/bv/article/view/2011.1.02. 30 olga petrivna stepanenko, “perspektyvni napriamy tsyfrovoi transformatsii v konteksti rozbudovy tsyfrovoi ekonomiky,” in modeliuvannia ta informatsiini systemy v ekonomitsi : zb. nauk. pr., edited by v. k. halitsyn, (kyiv: kneu, 2017), 120–31, https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120131.pdf?sequence=1&isallowed=y. 31 michal indrák and lenka pokorná, “analysis of digital transformation of services in a research library,” global knowledge, memory and communication (2020), https://doi.org/10.1108/gkmc-09-2019-0118. 32 irina sergeevna koroleva, “biblioteka—optimalnaya model vzaimodeystviya s polzovatelyami v usloviyah tsifrovoy ekonomiki,” informatsionno-bibliotechnyie sistemyi, resursyi i tehnologii no. 1 (2020): 57–64, https://doi.org/10.20913/2618-7515-2020-1-57-64. 33 james currall and michael moss, “we are archivists, but are we ok?”, records management journal 18, no. 1 (2008): 69–91, https://doi.org/10.1108/09565690810858532. 34 kirralie houghton, marcus foth and evonne miller, “the local library across the digital and physical city: opportunities for economic development,” commonwealth journal of local governance no. 15 (2014): 39–60, https://doi.org/10.5130/cjlg.v0i0.4062. https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike-kak-stat-tsifrovymi-liderami https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike-kak-stat-tsifrovymi-liderami http://www.economy.nayka.com.ua/pdf/1_2018/8.pdf https://doi.org/10.1515/libr.1999.49.4.181 https://doi.org/10.2959/logo.2006.17.3.139 https://doi.org/10.1016/j.acalib.2015.08.020 https://doi.org/10.7557/5.5378 http://journals.uran.ua/bv/article/view/2011.1.02 https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120-131.pdf?sequence=1&isallowed=y https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120-131.pdf?sequence=1&isallowed=y https://doi.org/10.1108/gkmc-09-2019-0118 https://doi.org/10.20913/2618-7515-2020-1-57-64 https://doi.org/10.1108/09565690810858532 https://doi.org/10.5130/cjlg.v0i0.4062 information technology and libraries december 2020 the role of the library in the digital economy | zharinov 16 35 sharon farnel and ali shiri, “community-driven knowledge organization for cultural heritage digital libraries: the case of the inuvialuit settlement region,” advances in classification research online no. 1 (2019): 9–12, https://doi.org/10.7152/acro.v29i1.15453. 36 elizabeth tait, konstantina martzoukou, and peter reid, “libraries for the future: the role of it utilities in the transformation of academic libraries,” palgrave communications no. 2 (2016): 1–9, https://doi.org/10.1057/palcomms.2016.70. 37 tatiana alexandrovna kolesnykova, “suchasna biblioteka vnz: modeli rozvytku v umovakh informatyzatsii,” bibliotekoznavstvo. dokumentoznavstvo. informolohiia no. 4 (2009): 57–62, http://nbuv.gov.ua/ujrn/bdi_2009_4_10. 38 ekaterina kudrina and karina ivina, “digital environment as a new challenge for the university library,”bulletin of kemerovo state university. series: humanities and social sciences 2, no. 10 (2019): 126–34, https://doi.org/10.21603/2542-1840-2019-3-2-126-134. 39 anna kochetkova, “tsyfrovi biblioteky yak oznaka xxi stolittia,” svitohliad no. 6 (2009): 68–73, https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68kochetkova.pdf. 
40 victoria alexandrovna kopanieva, “naukova biblioteka: vid e-katalohu do e-nauky,” bibliotekoznavstvo. dokumentoznavstvo. informolohiia no. 6 (2016): 4–10, http://nbuv.gov.ua/ujrn/bdi_2016_3_3. 41 christy r. stevens, “reference reviewed and re-envisioned: revamping librarian and deskcentric services with libstars and libanswers,” the journal of academic librarianship 39, no. 2 (march 2013): 202–14, https://doi.org/10.1016/j.acalib.2012.11.006. 42 samuel kai-wah chu and helen s du, “social networking tools for academic libraries,” journal of librarianship and information science 45, no. 1 (february 17, 2012): 64–75, https://doi.org/10.1177/0961000611434361. 43 acrl research planning and review committee, “2018 top trends in academic libraries a review of the trends and issues affecting academic libraries in higher education,” c&rl news 79, no.6 (2018): 286–300. https://doi.org/10.5860/crln.79.6.286. 44 currall and moss, “we are archivists, but are we ok?”, 69–91, https://doi.org/10.1108/09565690810858532. 45 valerii fishchuk et al., “ukraina 2030e— kraina z rozvynutoiu tsyfrovoiu ekonomikoiu,” ukrainskyi instytut maibutnoho, 2018, https://strategy.uifuture.org/kraina-z-rozvinutoyucifrovoyu-ekonomikoyu.html. 46 eurocris, “search the directory of research information system (dris),” https://dspacecris.eurocris.org/cris/explore/dris. 47 mon, “mon zapustylo novyi poshukovyi servis dlia naukovtsiv—vin bezkoshtovnyi ta bazuietsia na vidkrytykh danykh z usoho svituю,” https://mon.gov.ua/ua/news/mon https://doi.org/10.7152/acro.v29i1.15453 https://doi.org/10.1057/palcomms.2016.70 http://nbuv.gov.ua/ujrn/bdi_2009_4_10 https://doi.org/10.21603/2542-1840-2019-3-2-126-134 https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68-kochetkova.pdf https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68-kochetkova.pdf http://nbuv.gov.ua/ujrn/bdi_2016_3_3 https://doi.org/10.1016/j.acalib.2012.11.006 https://doi.org/10.1177/0961000611434361 https://doi.org/10.5860/crln.79.6.286 https://doi.org/10.1108/09565690810858532 https://strategy.uifuture.org/kraina-z-rozvinutoyu-cifrovoyu-ekonomikoyu.html https://strategy.uifuture.org/kraina-z-rozvinutoyu-cifrovoyu-ekonomikoyu.html https://dspacecris.eurocris.org/cris/explore/dris https://mon.gov.ua/ua/news/mon-zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na-vidkritih-danih-z-usogo-svitu information technology and libraries december 2020 the role of the library in the digital economy | zharinov 17 zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-navidkritih-danih-z-usogo-svitu. 48 nancy herther et al., “text and data mining contracts: the issues and needs,” proceedings of the charleston library conference, 2016, https://doi.org/10.5703/1288284316233. 49 karen hogenboom and michele hayslett, “pioneers in the wild west: managing data collections.” portal: libraries and the academy 17, no. 2 (2017): 295–319, https://doi.org/10.1353/pla.2017.0018. 50 philip young et al., “library support for text and data mining,” a report for the university libraries at virginia tech, 2017, http://bit.ly/2fccowu. 51 carol tenopir et al., “data sharing by scientists: practices and perceptions,” plos one 6 (2011), no. 6, https://doi.org/10.1371/journal.pone.0021101. 
president's message: lita now
andrew k. pace
information technology and libraries | march 2009

andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio.

at the time of this writing, my term as lita president is half over; by the time of publication, i will be in the home stretch—a phrase that, to me, always connotes relief and satisfaction that is never truly realized. i hope that this time between ala conferences is a time of reflection for the lita board, committees, interest groups, and the membership at large. various strategic planning sessions are, i hope, leading us down a path of renewal and regeneration of the division. of course, the world around us will have its effect—in particular, a political and economic effect.

first, the politics. i was asked recently to give my opinion about where the new administration should focus its attention regarding library technology. i had very little time to think of a pithy answer to this question, so i answered with my gut that the united states needs to continue its investment in it infrastructure so that we are on par with other industrialized nations while also lending aid to countries that are lagging behind. furthermore, i thought it an apt time to redress issues of data privacy and retention. the latter is often far from our minds in a world more connected, increasingly through wireless technology, and with a user base that, as one privacy expert put it, would happily trade a dna sample for an extra value meal.
i will resist the urge to write at greater length a treatise on the bill of rights and its status in 2008. i will hope, however, that lita's technology and access and legislation and regulation committees will feel reinvigorated post–election and post–inauguration to look carefully at the issues of it policy. our penchant for new tools should always be guided and tempered by the implementation and support of policies that rationalize their use.

as for the economy, it is our new backdrop. one anecdotal view of this is the number of e-mails i've received from committee appointees apologizing that they will not be able to attend ala conferences as planned because of the economic downturn and local cuts to library budgets. libraries themselves are in a paradoxical situation—increasing demand for the free services that libraries offer while simultaneously facing massive budget cuts that support the very collections and programs people are demanding. what can we do? well, i would suggest that we look at library technology through a lens of efficiency and cost savings, not just from a perspective of what is cool or trendy. when it comes to running systems, we need to keep our focus on end-user satisfaction while considering total cost of ownership. and if i may be selfish for a moment, i hope that we will not abandon our professional networks and volunteer activities. while we all make sacrifices of time, money, and talent to support our profession, it is often tempting when economic times are hard to isolate ourselves from the professional networks that sustain us in times of plenty.

politics and economics? though i often enjoy being cynical, i also try to make lemonade from lemons whenever i can. i think there are opportunities for libraries to get their own economic bailout in supporting public works and emphasizing our role in contributing to the public good. we should turn our "woe-are-we" tendencies that decry budget cuts and low salaries into championed stories of "what libraries have done for you lately." and we should go back to the roots of it, no matter how mythical or anachronistic, and think about what we can do technically to improve systemwide efficiencies. i encourage the membership to stay involved and reengage, whether through direct participation in lita activities or through a closer following of the activities in the ala office of information technology policy (oitp, www.ala.org/ala/aboutala/offices/oitp) and the ala washington office itself. there is much to follow in the world that affects our profession, and so many are doing the heavy lifting for us. all we need to do sometimes is pay attention. make fun of me if you want for stealing a campaign phrase from richard nixon, but i kept coming back to it in my head. in short, library information technology—now more than ever.

letter from the editor: reviewers wanted
kenneth j. varnum
information technology and libraries | march 2021
https://doi.org/10.6017/ital.v40i1.13xxx

together with one of the other journals published by ala's core division, information technology and libraries (ital) and library leadership and management (ll&m) invite applications for peer reviewers. serving as a reviewer is a great opportunity for individuals from all types of libraries and with a wide variety of experience to contribute to scholarship within our chosen profession. we are seeking the broadest pool of reviewers possible.
reviewers for both journals are expected to have an interest in or experience with the journal's topics, as described below. reviewers should expect to review 2-4 articles a year and should provide thoughtful and actionable comments to authors and the editor. reviewers will work with the editor, associate editor, and/or editorial board of the corresponding journal. see the job description for ital reviewers for more details about this new role.

we welcome applications from individuals at libraries of all types, levels of experience, locations, perspectives, and voices, especially those from underrepresented groups. reviewers will be selected to maximize the diversity of representation across these areas, so if you're not sure if you should apply, please do!

increasing the pool of reviewers for information technology and libraries is part of the editorial board's desire to provide equitable treatment to submitted articles and will enable us to follow a more typical process for peer-reviewed journals: a two-reviewer double-blind process. that will be a welcome and, frankly, overdue change to ital's current process, in which submitted articles are typically reviewed by one person. expanding the number of reviewers across the breadth of subject areas our journal covers will foster a more rigorous yet more open review process.

should you be more interested in the policy side of this journal, please watch out for a call for volunteers for the ital editorial board. that process will start in april.

* * * * * * *

as this issue of the journal goes online, covid as a global health crisis has just entered its second year. i'm constantly reminded of the duality of our collective ability to show resilience and exhibit fragility as we continue to endure this period. when i wrote the letter from the editor a year ago (https://doi.org/10.6017/ital.v39i1.12137), i focused on the imminent vote to establish a new ala division, core, as the most important question facing me. how quickly things changed! by the time the march 2020 issue was published, everything was different. wherever you are, however you have adapted to the situation, i hope you are well and, like me, are turning from wondering when this period will end, to wondering what "normal" will be in the post-pandemic world.

kenneth j. varnum, editor
varnum@umich.edu
march 2021

leadership and infrastructure and futures…oh my!
letter from the core president: leadership, infrastructure, futures
christopher cronin
information technology and libraries | december 2020
https://doi.org/10.6017/ital.v39i4.13027

christopher cronin (cjc2260@columbia.edu) is core president and associate university librarian for collections, columbia university. © 2020.

i am so pleased to be able to welcome all ital subscribers to core: leadership, infrastructure, futures! this issue marks the first of ital since the election of core's inaugural leadership.
a merger of what was formerly three separate ala divisions—the association of library collections & technical services (alcts), library & information technology association (lita), and the library leadership & management association (llama)—core is an experiment of sorts. it is, in fact, multiple experiments in unification, in collaboration, in compromise, in survival. while initially born out of a sheer fight or flight response to financial imperatives and the need for organizational effectiveness, developing core as a concept and as a model for an enduring professional association very quickly became the real motivation for those of us deeply embedded in its planning. core is very deliberately not an all-caps acronym representing a single subset of practitioners within the library profession. it is instead an assertion of our collective position at the center of our profession. it is a place where all those working in libraries, archives, museums, historical societies—information and cultural heritage broadly—will find reward and value in membership and a professional home. all organizations need effective leaders, strong infrastructure, and a vision for the future. and that is what core strives to build with and for its members.

while i welcome ital's readers into core, i also welcome core's membership into ital. no longer publications of their former divisions, all three journals within core have an opportunity to reconsider their mandates. as with all things, audience matters. ital's readership has now expanded dramatically, and those new readers must be invited into ital's world just as much as ital has been invited into theirs. as we embark on this first year of the new division, we do so with a sense of altogether newness more than of a mere refresh, and a sense of still becoming more than a sense of having always been. and who doesn't want to reinvent themselves every once in a while? start over. move away from the bits that aren't working so well, prop up those other bits that we know deserve more, and venture into some previously uncharted territory. how will being part of this effort, and of an expanded division, reframe ital's mandate? the importance of information technology has never been more apparent.

it is not lost on me that we do this work in core during a year of unprecedented tumult. in 2020, a murderous global pandemic was met with unrelenting political strife, pervasive distribution of misinformation and untruths, devastating weather disasters, record-setting unemployment, heightened attention on an array of omnipresent social justice issues, and a racial reckoning that demands we look both inward and outward for real change. individually and collectively, we grieve so many losses—loss of life, loss of income, loss of savings, loss of homes, loss of dignity, loss of certainty, loss of control, loss of physical contact. and throughout all of these challenges, what have we relied on more this year than technology? technology kept us productive and engaged. it provided a focal point for communication and connection. it provided venues for advocacy, expression, inspiration, and, as a counterpoint to that pervasive distribution of misinformation, it provided mechanisms to amplify the voices of the oppressed and marginalized. for some, but unfortunately not all, technology also kept us employed.
and as the physical doors of our organizations closed, technology provided us with new ways to invite our users in, to continue to meet their information needs, and to exceed all of our expectations for what was possible even with closed physical doors. and yet our reliance on and celebration of technology in this moment has also placed another critical spotlight on the devastating impact of digital poverty on those who continue to lack access, and by extension also a spotlight on our privilege. in her parting words to you in the final issue of ital as a lita journal (https://doi.org/10.6017/ital.v39i3.12687), evviva weinraub lajoie, the last president of lita, wrote:

we may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, "what are you going to do to change this?" balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. i believe that together, we can make core stand up to that challenge.

i believe we will do this, too, and with a spirit of reinvention that is guided by principles and values that don't just inspire membership but also improve our professional lives and experience in tangible ways. it was a privilege to have served as the final president of alcts and such a humbling and daunting responsibility to now transition into serving as core's first. it is a responsibility i do not take lightly, particularly in this moment when so much is demanded of us. as we strive for equity and inclusion, we do so knowing that we are only as strong as every member's ability to bring their whole selves to this work. we must work together to make our professional home everything we need it to be and to help those who need us. it is yours, it is theirs, it is ours.

an evidence-based review of academic web search engines, 2014-2016: implications for librarians' practice and research agenda
jody condit fagan
https://doi.org/10.6017/ital.v36i2.9718

abstract

academic web search engines have become central to scholarly research. while the fitness of google scholar for research purposes has been examined repeatedly, microsoft academic and google books have not received much attention. recent studies have much to tell us about google scholar's coverage of the sciences and its utility for evaluating researcher impact. but other aspects have been understudied, such as coverage of the arts and humanities, books, and non-western, non-english publications. user research has also tapered off. a small number of articles hint at the opportunity for librarians to become expert advisors concerning scholarly communication made possible or enhanced by these platforms. this article seeks to summarize research concerning google scholar, google books, and microsoft academic from the past three years with a mind to informing practice and setting a research agenda. selected literature from earlier time periods is included to illuminate key findings and to help shape the proposed research agenda, especially in understudied areas.
introduction

recent pew internet surveys indicate an overwhelming majority of american adults see themselves as lifelong learners who like to "gather as much information as [they] can" when they encounter something unfamiliar (horrigan 2016). although significant barriers to access remain, the open access movement and search engine giants have made full text more available than ever.1 the general public may not begin with an academic search engine, but google may direct them to google scholar or google books. within academia, students and faculty rely heavily on academic web search engines (especially google scholar) for research; among academic researchers in high-income areas, academic search engines recently surpassed abstracts & indexes as a starting place for research (inger and gardner 2016, 85, fig. 4). given these trends, academic librarians have a professional obligation to understand the role of academic web search engines as part of the research process.

jody condit fagan (faganjc@jmu.edu) is professor and director of technology, james madison university, harrisonburg, va.
1 khabsa and giles estimate "almost 1 in 4 of web accessible scholarly documents are freely and publicly available" (2014, 5).

two recent events also point to the need for a review of research. legal decisions in 2016 confirmed google's right to make copies of books for its index without paying or even obtaining permission from copyright holders, solidifying the company's opportunity to shape the online experience with respect to books. meanwhile, microsoft rebooted their academic web search engine, now called microsoft academic. at the same time, information scientists, librarians, and other academics conducted research into the performance and utility of academic web search engines. this article seeks to review the last three years of research concerning academic web search engines, make recommendations related to the practice of librarianship, and propose a research agenda.

methodology

a literature review was conducted to find articles, conference presentations, and books about the use or utility of google books, google scholar, and microsoft academic for scholarly use, including comparisons with other search tools. because of the pace of technological change, the focus was on recent studies (2014 through 2016, inclusive). a search was conducted on "google books" in ebsco's library and information science and technology abstracts (lista) on december 19, 2016, limited to 2014-2016. of the 46 results found, most were related to legal activity. only four items related to the tool's use for research. these four titles were entered into google scholar to look for citing references, but no additional relevant citations were found. in the relevant articles found, the literature reviews testified to the general lack of studies of google books as a research tool (abrizah and thelwall 2014; weiss 2016), with a few exceptions concerning early reviews of metadata, scanning, and coverage problems (weiss 2016). a search on "google books" in combination with "evaluation or review or comparison" was also submitted to jmu's discovery service,2 limited to 2014-2016. forty-nine items were found and from these, three relevant citations were added; these were also entered into google scholar to look for citing references. however, no additional relevant citations were found.
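the screening workflow described here (database searches, citing-reference chasing in google scholar, and removal of duplicates before tallying) is easy to script once result sets are exported. the sketch below is a minimal illustration of the deduplication step only, assuming exported records with title and year fields; it is not the author's actual procedure, and the sample records and field names are invented for the example.

```python
# illustrative sketch of the deduplication step: collapse citations exported
# from several databases onto a normalized (title, year) key before counting
# unique items. the records below are invented for the example.
import re

def dedup_key(citation):
    # lowercase, strip punctuation, and collapse whitespace in the title
    title = re.sub(r"[^a-z0-9 ]", " ", citation["title"].lower())
    return (" ".join(title.split()), citation["year"])

def merge(*result_sets):
    merged = {}
    for source, citations in result_sets:
        for citation in citations:
            entry = merged.setdefault(dedup_key(citation),
                                      {"citation": citation, "sources": set()})
            entry["sources"].add(source)
    return merged

lista_hits = [{"title": "Google Books as a research tool?", "year": 2016}]
discovery_hits = [{"title": "google books as a research tool", "year": 2016}]

unique = merge(("lista", lista_hits), ("discovery", discovery_hits))
print(len(unique))  # 1: both exports collapse onto the same key
```

normalizing on title and year is crude, but it is usually enough to collapse the same citation exported from two databases; borderline matches still need the kind of manual screening described above.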
thus, a total of seven citations from 2014-2016 were found with relevant information concerning google books. earlier citations from the articles’ bibliographies were also reviewed when research was based on previous work, and to inform the development of a fuller research agenda. a search on “microsoft academic” in lista on february 3, 2017 netted fourteen citations from 2014-2016. only seven seemed to focus on evaluation of the tool for research purposes. a search on “microsoft academic” in combination with terms “evaluation or review or comparison” was also submitted to jmu’s discovery service, limited to 2014-2016. eighteen items were found but no additional citations were added, either because they had already been found or were not relevant. the seven titles found in lista were searched in google scholar for citing references; four additional relevant citations were found, plus a paper relevant to google scholar not 2 jmu’s version of ebsco discovery service contained 453,754,281 items at the time of writing and is carefully vetted to contain items of curricular relevance to the jmu community (fagan and gaines 2016). information technology and libraries | june 2017 9 previously discovered (weideman 2015). thus, a total of eleven citations were found with relevant information for this review concerning microsoft academic. because of this small number, several articles prior to 2014 were included in this review for historical context. an initial search was performed on “google scholar” in lista on november 19, 2016, limited to 2014-2016. this netted 159 results, of which 24 items were relevant. a search on “google scholar” in combination with terms “evaluation or review or comparison” was also submitted to jmu’s discovery tool limited to 2014-2016, and eleven relevant citations were added. items older than 2014 that were repeatedly cited or that formed the basis of recent research were retrieved for historical context. finally, relevant articles were submitted to google scholar, which netted an additional 41 relevant citations. altogether, 70 citations were found to articles with relevant information for this review concerning google scholar in 2014-2016. readers interested in literature reviews covering google scholar studies prior to 2014 are directed to (gray et al. 2012; erb and sica 2015; harzing and alakangas 2016b). findings google books google books (https://books.google.com) contains about 30 million books, approaching the library of congress’s 37 million, but far shy of google’s estimate of 130 million books in existence (wu 2015), which google intends to continue indexing (jackson 2010). content in google books includes publisher-supplied, self-published, and author-supplied content (harper 2016) as well as the results of the famous google books library project. started in december 2004 as the “google print” project,3 the project involved over 40 libraries digitizing works from their collections, with google indexing and performing ocr to make them available in google books (weiss 2016; mays 2015). scholars have noted many errors with google books metadata, including misspellings, inaccurate dates, and inaccurate subject classifications (harper 2016; weiss 2016). google does not release information about the database’s coverage, including which books are indexed or which libraries’ collections are included (abrizah and thelwall 2014). researchers have suggested the database covers mostly u.s. and english-language books (abrizah and thelwall 2014; weiss 2016). 
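because google does not publish coverage lists, the studies cited here probed coverage by searching known titles one at a time. a small script can make the same kind of spot check repeatable; the sketch below uses the public google books volumes api (assumed to remain available in its current form, with the usual quota limits) to ask whether a given title is findable in the index. it illustrates the probing idea and is not a method used by any of the studies above.

```python
# illustrative sketch: spot-check whether a title can be found in google books,
# in the spirit of the manual coverage probes described above. the volumes
# endpoint and its "q" parameter belong to the public google books api;
# responses and quotas may change, so treat this as a starting point only.
import json
import urllib.parse
import urllib.request

def probe_google_books(title):
    query = urllib.parse.quote(f'intitle:"{title}"')
    url = ("https://www.googleapis.com/books/v1/volumes"
           f"?q={query}&maxResults=5")
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    total = data.get("totalItems", 0)
    titles = [item.get("volumeInfo", {}).get("title", "")
              for item in data.get("items", [])]
    return total, titles

if __name__ == "__main__":
    total, titles = probe_google_books("digital libraries")
    print(total, titles[:3])
```

a zero hit count does not prove a book is absent from the index (metadata errors of the kind noted above can hide it), so programmatic probes are best read alongside manual checks.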
the conveniences of google books include limits by the type of book availability (e.g. free ebooks vs. google e-books), document type, and date. the detail view of a book allows magnification, hyperlinked tables of contents, buying and “find in a library” options, “my library,” and user history (whitmer 2015). google books also offers textbook rental (harper 2016) and limited print-on-demand services for out-of-print books (mays 2015; boumenot 2015). in april 2016, the supreme court affirmed google’s right to make copies for its index without paying or even obtaining permission from copyright holders (authors guild 2016; los angeles times 2016). scanning of library books and “snippet view” was deemed fair use: “the purpose of the copying is highly transformative, the public display of text is limited, and the revelations do 3 https://www.google.com/googlebooks/about/history.html an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 10 not provide a significant market substitute for the protected aspects of the originals” (u.s. court of appeals for the second circuit 2015). literature concerning high-level implications of google books suggests the tool is having a profound effect on research and scholarship. the tool has been credited for serving as “a huge laboratory” for indexing, interpretation, working with document image repositories, and other activities (jones 2010). at the same time, the academic community has expressed concerns about google books’s effects on social justice and how its full-text search capability may change the very nature of discovery (hoffmann 2014; hoffmann 2016; szpiech 2014). one study found that books are far more prevalently cited in wikipedia than are research articles (kousha and thelwall 2017). yet investigations of google books’ coverage and utility as a research tool seem to be sorely lacking. as weiss noted, “no critical studies seem to exist on the effect that google books might have on the contemporary reference experience” (weiss 2016, 293). furthermore, no information was found concerning how many users are taking advantage of google books; the tool was noticeably absent from surveys such as (inger and gardner's (2016) and from research centers such as the pew internet research project. in a largely descriptive review, harper (2016) bemoaned google books’ lack of integration with link resolvers and discovery tools, and judged it lacking in relevant material for the health sciences, because so much of the content is older. she also noted the majority of books scanned are in english, which could skew scholarship. the non-english skew of google books was also lamented by weiss, who noted an “underrepresentation of spanish and overestimation of french and german (or even japanese for that matter)” especially as compared to the number of spanish speakers in the united states (weiss 2016, 286-306). whitmer (2015) and mays (2015) provided practical information about how google books can be used as a reference tool. whitmer presented major google books features and challenged librarians to teach google books during library instruction. mays conducted a cursory search on the 1871 chicago fire and described the primary documents she retrieved as “pure gold,” including records of city council meetings, notes from insurance companies, reports from relief societies, church sermons on the fire, and personal memoirs (mays 2015, 22). 
mays also described google books as a godsend to genealogists for finding local records (e.g. police departments, labor unions, public schools). in her experience, the geographic regions surrounding the forty participating google books library project libraries are “better represented than other areas” (mays 2015, 25). mays concludes, “its poor indexing and search capabilities are overshadowed by the ease of its fulltext search capabilities and the wonderful ephemera that enriches its holdings far beyond mere ‘books’” (mays 2015, 26). abrizah and thelwall (2014) investigated whether google books and google scholar provided “good impact data for books published in non-western countries.” they used a comprehensive list of arts, humanities, and social sciences books (n=1,357) from the five main university presses in information technology and libraries | june 2017 11 malaysia 1961-2013. they found only 23% of the books were cited in google books4 and 37% in google scholar (p. 2502). the overlap was small: only 15% were cited in both google scholar and google books. english-language books were more likely to be cited in google books; 40% of english language books were cited versus 16% malay. examining the top 20 books cited in google books, researchers found them to be mostly written in english (95% in google books vs 29% in the sample), and published by university of malaysia press (60% in google books vs 26% in the sample) (2505). the authors concluded that due to the low overlap between google scholar and google books, searching both engines was required to find the most citations to academic books. kousha and thelwall (2015; 2011) compared google books with thomson reuters book citation index (bkci) to examine its suitability for scholarly impact assessment and found google books to have a clear advantage over bkci in the total number of citations found within the arts and humanities, but not for the social sciences or sciences. they advised combining results from bkci with google books when performing research impact assessment for the arts and humanities and social sciences, but not using google books for the sciences, “because of the lower regard for books among scientists and the lower proportion of google books citations compared to bkci citations for science and medicine” (kousha and thelwall 2015, 317). microsoft academic microsoft academic (https://academic.microsoft.com) is an entirely new software product as of 2016. therefore, the studies cited prior to 2016 refer to entirely different search engines than the one currently available. however, a historical account of the tool and reviewers’ opinions was deemed helpful for informing a fuller picture of academic web search engines and pointing to a research agenda. microsoft academic was born as windows live academic in 2006 (carlson 2006), was renamed live search academic after a first year of struggle (jacsó 2008), and was scrapped two years later after the company recognized it did not have sufficient development support in the united states (jacsó 2011). microsoft asia research group launched a beta tool called libra in 2009, which redirected to the “microsoft academic search” service by 2011. early reviews of the 2011 edition of microsoft academic search were promising, although the tool clearly lacked the quantity of data searched by google scholar (jacsó 2011; hands 2012). there were a few studies involving microsoft academic search in 2014. 
ortega and aguillo (2014) compared microsoft academic search and google scholar citations for research evaluation and concluded “microsoft academic search is better for disciplinary studies than for analyses at institutional and individual levels. on the other hand, google scholar citations is a good tool for individual assessment because it draws on a wider variety of documents and citations” (1155). 4 google books does not support citation searching; the researchers searched for the book title to manually find citations to a book. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 12 as part of a comparative investigation of an automatic method for citation snowballing using microsoft academic search, choong et al. (2014) manually searched for a sample of 949 citations to journal or conference articles cited from 20 systematic reviews. they found microsoft academic search contained 78% of the cited articles and noted its utility for testing automated methods due to its free api and no blocks to automated access. the researchers also tested their method against google scholar, but noted “computer-access restrictions prevented a robust comparison” (n.p.). also in 2014, orduna-malea et al. (2014) attempted a longitudinal study of disciplines, journals, and organizations in microsoft academic search only to find the database had not been updated since 2013. furthermore they found the indexing to be incomplete and still in process, meaning microsoft academic search’s presentation of information about any particular publication, organization, or author was distorted. despite this finding, mas was included in two studies of scholar profiles. ortega (2015) compared scholar profiles across google scholar, microsoft academic search, research gate, academia.edu, and mendeley, and found little overlap across the sites. they also found social and usage indicators did not consistently correlate with bibliometric indicators, except on the researchgate platform. social and usage indicators were “influenced by their own social sites,” while bibliometric indicators seemed more stable across all services (13). ward et al. (2015) still included microsoft academic search in their discussion of scholarly profiles as part of the social media network, noting microsoft academic search was painfully time-consuming to work with in terms of consolidating data, correcting items, and adding missing items. in september 2016, hug et al. demonstrated the utility of the new microsoft academic api by conducting a comparative evaluation of normalized data from microsoft academic and scopus (hug, ochsner, and braendle 2016). they noted microsoft academic has “grown massively from 83 million publication records in 2015 to 140 million in 2016” (10). the microsoft academic api offers rich, structured metadata with the exception of document type. they found all attributes containing text were normalized and that identifiers were available for all entities, including references, supporting bibliometricians’ needs for data retrieval, handling, and processing. in addition to the lack of document type, the researchers also found the “fields of study” to be too granular and dynamic, and their hierarchies incoherent. they also desired the ability to use the doi to build api requests. nevertheless, the advantages of microsoft academic’s metadata and api retrieval suggested to hug et al. 
that microsoft academic was superior to google scholar for calculating research impact indicators and bibliometrics in general. in october 2016, harzing and alakangas compared publication and citation coverage of the new microsoft academic with google scholar, scopus, and web of science using a sample of 145 academics at the university of melbourne (harzing and alakangas 2016a) including observations from 20-40 faculty each in the humanities, social sciences, engineering, sciences, and life sciences. they discovered microsoft academic had improved substantially since their previous study (harzing 2016b), increasing 9.6% for a comparison sample in comparison with 1.4%, 2%, and 1.7% growth in google scholar, scopus, and web of science (n.p.). the researchers noted a few information technology and libraries | june 2017 13 problems with data quality, “although the microsoft academic team have indicated they are working on a resolution” (n.p.). on average, the researchers found that microsoft academic found 59% as many citations as google scholar, 97% as many citations as scopus, and 108% as many citations as web of science. google scholar had the top counts for each disciplinary area, followed by scopus except in the social sciences and humanities, where microsoft academic ranked second. the researchers explained that microsoft academic “only includes citation records if it can validate both citing and cited papers as credible,” as established through a machine-learningbased system, and discussed an emerging metric of “estimated citation count” also provided by microsoft academic. the researchers concluded that microsoft academic is promising to be “an excellent alternative for citation analysis” and suggested microsoft should work to improve coverage of books and grey literature. google scholar google scholar was released in beta form in november 2004, and was expanded to include judicial case law in 2009. while google scholar has received much attention in academia, it seems to be regarded by google as a niche product: in 2011 google removed scholar from the list of top services and list of “more” services, relegating it to the “even more” list. in 2014, the scholar team consisted of just nine people (levy 2014). describing google scholar in an introductory manner is not helped by google’s vague documentation, which simply says it “includes scholarly articles from a wide variety of sources in all fields of research, all languages, all countries, and over all time periods.”5 the “wide variety of sources” includes “journal papers, conference papers, technical reports, or their drafts, dissertations, pre-prints, post-prints, or abstracts,” as well as court opinions and patents, but not “news or magazine articles, book reviews, and editorials.” books and dissertations uploaded to google book search are “automatically” included in scholar. google says abstracts are key, noting “sites that show login pages, error pages, or bare bibliographic data without abstracts will not be considered for inclusion and may be removed from google scholar.” studies of google scholar can be divided in to three major categories of focus: investigating the coverage of google scholar; the use and utility of google scholar as part of the research process; and google scholar’s utility for bibliographic measurement, including evaluating the productivity of individual researchers and the impact of journals. 
there is some overlap across these categories, because studies of google scholar seem to involve three questions: 1) what is being searched? 2) how does the search function? and 3) to what extent can the user usefully accomplish her task? the coverage of google scholar scholars want to know what “scholarship” is covered by google scholar, but the documentation merely states that it indexes “papers, not journals”6 and challenges researchers to investigate 5 https://scholar.google.com/intl/en/scholar/inclusion.html 6 https://www.google.com/intl/en/scholar/help.html#coverage an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 14 google scholar’s coverage empirically despite google scholar’s notoriously challenging technical limitations. while some limitations of google scholar have been corrected over the years, longstanding logistical hurdles involved with studying google scholar’s coverage have been well-documented for over a decade (shultz 2007; bonato 2016; haddaway et al. 2015; levay et al. 2016), and include: • search queries are limited to 256 characters • not being able to retrieve more than 1,000 results • not being able to display more than 20 results per page • not being able to download batches of results (e.g. to load into citation management software) • duplicate citations (beyond the multiple article “versions”), requiring manual screening • retrieving different results with advanced and basic searches • no designation of the format of items (e.g. conference papers) • minimal sort options for results • basic boolean operators only7 • illogical interpretation of boolean operators: esophagus or oesophagus and oesophagus or esophagus return different numbers of results (boeker, vach, and motschall 2013) • non-disclosure of the algorithm by which search results are sorted. additionally, one study reported experiencing an automated block to the researcher’s ip address after the export of approximately 180 citations or 180 individual searches (haddaway et al. 2015, 14). furthermore, the research excellence framework was unable to use google scholar to assess the quality of research in uk higher education institutions, because of researchers’ inability to agree with google on a “suitable process for bulk access to their citation information, due to arrangements that google scholar have in place with publishers” (research excellence framework 2013, 1562). such barriers can limit what can be studied and also cost researchers significant time in terms of downloading (prins et al. 2016) and cleaning citations (levay et al. 2016). despite these hurdles, research activity analyzing the coverage of google scholar has continued in the past two years, often building off previous studies. this section will first discuss google scholar’s size and ranking, followed by its coverage of articles and citations, then its coverage of books, grey literature, and open access and institutional repositories. google scholar size and ranking in a 2014 study, khabsa and giles estimated there were at least 114 million english-language scholarly documents on the web, of which google scholar had “nearly 100 million.” another study by orduna-malea, ayllón, martín-martín, and lópez-cózar (2015) estimated that the total number 7 e.g., no nesting of logical subexpressions deeper than one level (boeker, vach, and motschall 2013) and no truncation operators. 
information technology and libraries | june 2017 15 of documents indexed by google scholar, without any language restriction, was between 160 and 165 million. by comparison, in 2016 the author’s discovery tool contained about 168 million items in academic journals, conference materials, dissertations, and reviews.8 google scholar’s presence in the information marketplace has influenced vendors to increase the discoverability of their content, including pushing for the display of abstracts and/or the first page of articles (levy 2014). proquest and gale indexes were added to google scholar in 2015 (quint 2016). martín-martín et al. (2016b) noted that google scholar’s agreements with big publishers come at a price: “the impossibility of offering an api,” which would support bibliometricians’ research (54). google scholar’s results ranking “aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature.”9 martín-martín and his colleagues (2017, 159) conducted a large, longitudinal study of null query results in google scholar and found a strong correlation between result list ranking and times cited. the influence of citations is so strong that when the researchers performed the same search process four months later, 14.7% of documents were missing in the second sample, causing them to conclude even a change of one or two citations could lead to a document being excluded or included from the top 1,000 results (157). using citation counts as a major part of the ranking algorithm has been hypothesized to produce the “matthew effect,” where “work that is already influential becomes even more widely known by virtue of being the first hit from a google scholar search, whereas possibly meritorious but obscure academic work is buried at the bottom” (antell et al. 2013, 281). google scholar has been shown to heavily bias its ranking toward english-language publications even when there are highly cited non-english publications in the result set, although selection of interface language may influence the ranking. martin-martin and his colleagues noted that google scholar seems to use the domain of the document’s hosting web site as a proxy for language, meaning that “some documents written in english but with their primary version hosted in nonanglophone countries’ web domains do appear in lower positions in spite of receiving a large number of citations” (martin-martin et al. 2017, 161). this effect is shown dramatically in figure 3 of their paper. google scholar coverage: articles and citations the coverage of articles, journals, and citations by google scholar has been commonly examined by using brute force methods to retrieve a sample of items from google scholar and possibly one or more of its competitors. (studies discussed in this section are listed in table 1). the goal is usually to determine how well google scholar’s database compares to traditional research databases, usually in a specific field. core methodology involves importing citations into software such as publish or perish (harzing 2016a), cleaning the data, then performing statistical tests, 8 the discovery tool does not contain all available metadata but has been carefully vetted (fagan and gaines 2016). 
9 https://www.google.com/intl/en/scholar/about.html an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 16 expert review, or both. haddaway (2015) and moed et al. (2016) have written articles specifically discussing methodological aspects. recent studies repeatedly find that google scholar’s coverage meets or exceeds that of other search tools, no matter what is identified by target samples, including journals, articles, and citations (karlsson 2014; harzing 2014; harzing 2016b; harzing and alakangas 2016b; moed, barilan, and halevi 2016; prins et al. 2016; wildgaard 2015; ciccone and vickery 2015). in only three studies did google scholar find fewer items, and the meaningful difference was minimal.10 science disciplines were the most studied in google scholar, including agriculture, astronomy, chemistry, computer science, ecology, environmental science, fisheries, geosciences, mathematics, medicine, molecular biology, oceanography, physics, and public health. social sciences studied include education (prins et al. 2016), economics (harzing 2014), geography (ştirbu et al. 2015, 322-329), information science (winter, zadpoor, and dodou 2014; harzing 2016b), and psychology (pitol and de groote 2014). studies related to the arts or humanities 2014-2016 included an analysis of open access journals in music (testa 2016) and a comparison between google scholar and web of science for research evaluation within education, pedagogical sciences, and anthropology11 (prins et al. 2016). wildgaard (2015) and bornmann et al. (2016) included samples of humanities scholars as part of bibliometric studies, but did not discuss disciplinary aspects related to coverage. prior to 2014, the only study found related to the arts and humanities compared google scholar with historical abstracts (kirkwood jr. and kirkwood 2011). google scholar’s coverage has been growing over time (meier and conkling 2008; harzing 2014; winter, zadpoor, and dodou 2014; bartol and mackiewicz-talarczyk 2015, 531; orduña-malea and delgado lópez-cózar 2014) with recent increases in older articles (winter, zadpoor, and dodou 2014; harzing and alakangas 2016b), leading some to question whether this supports the documented trend of increased citation of older literature (martín-martín et al. 2016c; varshney 2012). winter et al. noted that in 2005 web of science yielded more citations than google scholar for about two-thirds of their sample, but for the same sample in 2013, google scholar found more citations than web of science, with only 6.8% of citations not retrieved by google scholar (winter, zadpoor, and dodou 2014, 1560). the unique citations of web of science were “typically documents before the digital age and conference proceedings not available online” (winter, zadpoor, and dodou 2014, 1560). harzing and alakangas’s (2016b) large-scale longitudinal comparison of google scholar, scopus, and web of science suggested that google scholar’s retroactive expansion has stabilized and now all three databases are growing at similar rates. 10 for example, bramer, giustini, and kramer (2016a) found slightly more of their 4,795 references from systematic reviews in embase (97.5%) than in google scholar (97.2%). in testa (2016), the music database rilm indexed two more of the 84 oa journals than google scholar (which indexed at least one article from 93% of the journals). 
finally, in a study using citations to the most-cited article of all time as a sample, web of science found more citations than did google scholar (winter, zadpoor, and dodou 2014).
11 prins et al. classified anthropology as part of the humanities.

google scholar also seems to cover both the oldest and the most recent publications. unlike traditional abstracts and indexes, google scholar is not limited by starting year, so as publishers post tables of contents of their earliest journals online, google scholar discovers those sources (antell et al. 2013, 281). trapp (2016) reported the number of citations to a highly-cited physics paper after the first 11 days of publication to be 67 in web of science, 72 in scopus, and 462 in google scholar (trapp 2016, 4). in a study of 800 citations to nobelists in multiple fields, harzing found that "google scholar could effectively be 9–12 months ahead of web of science in terms of publication and citation coverage" (2013, 1073).

an increasing proportion of journal articles in google scholar are freely available in full text. a large-scale, longitudinal study of highly-cited articles 1950-2013 found 40% of article citations in the sample were freely available in full text (martín-martín et al. 2014). another large-sample study found 61% of articles in their sample from 2004–2014 could be freely accessed (jamali and nabavi 2015). in both studies, nih.gov and researchgate were the top two full-text providers.

google scholar's coverage of major publisher content varies; having some coverage of a publisher does not imply all articles or journals from that publisher are covered. in a sample of 222 citations compared across google scholar, scopus, and web of science, google scholar contained all of the springer titles, as many elsevier titles as scopus, and the most articles by wolters kluwer and john wiley. however, among the three databases, google scholar contained the fewest articles by bmj and nature (rothfus et al. 2016).

study | sample | results
(bartol and mackiewicz-talarczyk 2015) | documents retrieved in response to searches on crops and fibers in article titles, 1994-2013 (samples varied by crop) | google scholar returned more documents for each crop. for example, "hemp" retrieved 644 results in google scholar, 493 in scopus, and 318 in web of science; google scholar demonstrated higher yearly growth of records over time.
(bramer, giustini, and kramer 2016b) | references from a pool of systematic reviewer searches in medicine (n=4,795) | google scholar found 97.2%, embase 97.5%, and medline 92.3% of all references; when using search strategies, embase retrieved 81.6%, medline 72.6%, and google scholar 72.8%.
(ciccone and vickery 2015) | 183 user searches randomly selected from ncsu libraries' 2013 summon search logs (n=137) | no significant difference between the performance of google scholar, summon, and eds for known-item searches; "google scholar outperformed both discovery services for topical searches."
(harzing 2014) | publications and citation metrics for 20 nobelists in chemistry, economics, medicine, and physics, 2012-2013 (samples varied) | google scholar coverage is now "increasing at a stable rate" and provides "comprehensive coverage across a wide set of disciplines for articles published in the last four decades" (575).
(harzing 2016b) | citations from one researcher (n=126) | microsoft academic found all books and journal articles covered by google scholar; google scholar found 35 additional publications including book chapters, white papers, and conference papers.
(harzing and alakangas 2016a) | samples from (harzing and alakangas 2016b, 802) (samples varied by faculty) | google scholar provided higher "true" citation counts than microsoft academic, but microsoft academic "estimated" citation counts were 12% higher than google scholar for life sciences and equivalent for the sciences.
(harzing and alakangas 2016b) | citations of the works of 145 faculty among 37 scholarly disciplines at the university of melbourne (samples varied by faculty) | for the top faculty member, google scholar had 519 total papers (compared with 309 in both web of science and scopus); google scholar had 16,507 citations (compared with 11,287 in web of science and 11,740 in scopus).
(hilbert et al. 2015) | documents published by 76 information scientists in german-speaking countries (n=1,017) | google scholar covered 63%, scopus 31%, bibsonomy 24%, mendeley 19%, web of science 15%, and citeulike 8%.
(jamali and nabavi 2015) | items published between 2004 and 2014 (n=8,310) | 61% of articles were freely available; of these, 81% were publisher versions and 14% were pre-prints; researchgate was the top full-text source, netting 10.5% of full-text sources, followed by ncbi.nlm.nih.gov (6.5%).
(karlsson 2014) | journals from ten different fields (n=30) | google scholar retrieved documents from all the selected journals; summon only retrieved documents from 14 out of 30 journals.
(lee et al. 2015) | journal articles housed in florida state university's institutional repository (n=170) | metadata found in google for 46% of items and in google scholar for 75% of items; google scholar found 78% of available full text and found full text for six items with no full text in the ir.
(martín-martín et al. 2014) | items highly cited by google scholar (n=64,000) | 40% could be freely accessed using google scholar; nih.gov and researchgate were the top two full-text providers.
(moed, bar-ilan, and halevi 2016) | citations to 36 highly cited articles in 12 scientific-scholarly english-language journals (n=about 7,000) | 47% of sources were in both google scholar and scopus; 47% were in google scholar only; 6% were in scopus only; the unique google scholar citations came most often from google books, springer, ssrn, researchgate, acm digital library, arxiv, and aclweb.org.
(prins et al. 2016) | article citations in the field of education and pedagogies, and citations to 328 articles in anthropology (n=774) | google scholar found 22,887 citations in education & pedagogical science compared to web of science's 8,870, and 8,092 in anthropology compared with web of science's 1,097.
(ştirbu et al. 2015) | number of citations resulting from two geographical topic searches (samples varied) | google scholar found 2,732 geographical references for one search, whereas web of science found only 275, georef 97, and francis 45; for sedimentation, google scholar found 1,855 geographical references compared to web of science's 606, georef's 1,265, and francis's 33; google scholar overlapped web of science by 67% and 82% for the two searches, and georef by 57% and 62%.
(testa 2016) | open access journals in music (n=84) | google scholar indexed at least one article from 93% of oa journals; rilm indexed two additional journals.
(wildgaard 2015) | publications from researchers in astronomy, environmental science, philosophy, and public health (n=512) | publication count from web of science was 2-4 times lower for all disciplines than google scholar; citation count was up to 13 times lower in web of science than in google scholar.
(winter, zadpoor, and dodou 2014) | growth of citations to 2 classic articles (1995-2013) and 56 science and social science articles in google scholar, 2005-2013 (samples varied) | total citation counts 21% higher in web of science than google scholar for lowry (1951), but google scholar 17% higher than web of science for garfield (1955) and 102% higher for the 56 research articles; google scholar showed a significant retroactive expansion to all articles compared to negligible retroactive growth in web of science.
table 1. studies investigating google scholar's coverage of journal articles and citations, 2014-2016.

google scholar coverage: books

many studies mentioned that books, including google books, are sometimes included in google scholar results. jamali and nabavi (2015) found 13% of their sample of 8,310 citations from google scholar were books, while martín-martín et al. (2014) had found that 18% of their sample of 64,000 citations from google scholar were books. within the field of anthropology, prins (2016) found books to generate the most citation impact in google scholar (41% of books in their sample were cited in google scholar) compared to articles (21% of articles were cited in google scholar). in education, 31% of articles and 25% of books were cited by google scholar (3). abrizah and thelwall found only 37% of their sample of 1,357 arts, humanities, and social sciences books from the five main university presses in malaysia had been cited in google scholar (23% of the books had been cited in google books) (abrizah and thelwall 2014, 2502). the overlap was small: 15% had impact in both google scholar and google books. the authors concluded that due to the low overlap between google scholar and google books, searching both engines is required to find the most citations to academic books. english books were significantly more likely to be cited in google scholar (48% vs. 32%), as were edited books (53% vs. 36%). they surmised edited books' citation advantage was due to the use of book chapters in social sciences. they found arts and humanities books more likely to be cited in google scholar than social sciences books (40% vs. 34%) (abrizah and thelwall 2014, 2503).

google scholar coverage: grey literature

grey literature refers to documents not published commercially, including theses, reports, conference papers, government information, and poster sessions. haddaway et al. (2015) was the only empirical study found focused on grey literature. they discovered that between 8% and 39% of full-text search results from google scholar were grey literature, with the greatest concentration of citations from grey literature on page 80 of results for full-text searches and page 35 for title searches.
they concluded “the high proportion of grey literature that is missed by google scholar means it is not a viable alternative to hand searching for grey literature as a standalone tool” (2015, 14). for one of the systematic reviews in their sample, none of the 84 grey literature articles cited were found within the exported google scholar search results. the only other investigation of grey literature found was bonato (2016), who after conducting a very limited number of searches on one specific topic and a search for a known item, concluded google scholar to be “deficient.” in conclusion, despite much offhand praise for google scholar’s grey literature coverage (erb and sica 2015; antell et al. 2013), the topic has been little studied and when it has, grey literature results have not been prominent. google scholar coverage: open access and institutional repository content erb and sica touted google scholar’s access to “free content that might not be available through a library’s subscription services,” including open access journals and institutional repository coverage (2015, 48). recent research has dug deeper into both these content areas. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 22 in general, oa articles have been shown to net more citations than non-oa articles, as koler-povh, južnic, and turk (2014) showed within the field of civil engineering. across their sample of 2,026 scholarly articles in 14 journals, all indexed in web of science, scopus, and google scholar, oa articles received an average of 43 citations while non-oa articles were cited 29 times (1039). google scholar did a better job discovering those citations; in google scholar the median of citations of oa articles was always higher than that for non-oa articles, wheras this was true in web of science for only 10 of the 14 journals and in scopus for 11 of the 14 journals (1040). similarly, chen (2014) found google scholar to index far more oa journals than scopus and web of science, especially “gold oa.”12 google scholar’s advantage should not be assumed across all disciplines, however; testa (2016) found both google scholar and rilm to provide good coverage of oa journals in music, with google scholar indexing at least one article from 93% of the 84 oa journals in the sample. but the bibliographic database rilm indexed two more oa journals than google scholar. google scholar indexing of repositories may be critical for success, but results vary by ir platform and whether the ir metadata has been structured according to google’s guidelines. in a random sample from shodhganga, india’s central etd database, weideman (2015) found not one article had been indexed in full text by google scholar, although in many cases the metadata was indexed, leading the author to identify needed changes to the way shodhganga stores etds.13 likewise, chen (2014) found that neither google scholar nor google appears to index baidu wenku, a major full-text archive and social networking site in china similar to researchgate, and orduña-malea and lópez-cózar (2015) found that latin american repositories are not very visible in google or google scholar due to limitations of the description schemas chosen as well as search engine reliability. 
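one concrete, checkable piece of "structuring ir metadata according to google's guidelines" is whether each landing page exposes the highwire press-style citation_* meta tags that google scholar's inclusion documentation describes. the sketch below is only an illustration of generating such tags for a hypothetical repository record: the tag names follow that documentation, while the record values and helper function are invented for the example.

```python
# minimal sketch: emit highwire press-style meta tags of the kind google scholar's
# inclusion guidelines describe for article or thesis landing pages.
# the record below is hypothetical; only the tag names come from those guidelines.
from html import escape

record = {
    "citation_title": "an example electronic thesis",
    "citation_author": ["doe, jane", "roe, richard"],
    "citation_publication_date": "2016/05/01",
    "citation_pdf_url": "https://repository.example.edu/etd/123/fulltext.pdf",
}

def meta_tags(rec):
    """return one <meta> element per value; multi-valued fields (authors) repeat the tag."""
    tags = []
    for name, value in rec.items():
        for v in (value if isinstance(value, list) else [value]):
            tags.append(f'<meta name="{escape(name)}" content="{escape(v)}">')
    return "\n".join(tags)

print(meta_tags(record))
```

tags of this kind are the sort of per-item metadata that the html landing pages weideman recommends would carry.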
in yang’s (2016) study of texas tech’s dspace ir, google was the only search engine that indexed, discovered, or linked to pdf files supplemented with metadata; google scholar did not discover or provide links to the ir’s pdf files, and was less successful at discovering metadata. when google scholar is able to index ir content, it may be responsible for significant traffic. in a study of four major u.s. universities’ institutional repositories (three dspace, one contentdm) involving a dataset of 57,087 unique urls and 413,786 records, researchers found that 48%–66% of referrals came from google scholar (obrien et al. 2016, 870). the importance of google scholar in contrast to google was noted by lee et al. (2015), who conducted title searches on 170 journal articles housed in florida state university’s institutional repository (using bepress’s digital commons platform), 100 of which existed in full text in the ir. links to the ir were found in google results for 45.9% of the 170 items, and in google scholar for 74.7% of the 170 items. furthermore, google scholar linked to the full text for 78% of the 100 cases where full text was available, and even provided links to freely available full text for six items that did not have full 12 oa articles on publisher web sites, whether the journal itself is oa or not (chen 2014) 13 most notably, the need to store thesis documents as one pdf file instead of divided into multiple, separate files, to create html landing pages as per google’s recommendations, and to submit the addresses of these pages to google scholar. information technology and libraries | june 2017 23 text in the ir. however, the researchers also noted “relying on either google or google scholar individually cannot ensure full access to scholarly works housed in oa irs.” in their study, among the 104 fully open access items there was an overlap in results of only 57.5%; google provided links to 20 items not found with google scholar, and google scholar provided links to 25 items not found with google (lee et al. 2015, 15). google scholar results note the number of “versions” available for each item. in a study of 982 science article citations (including both oa and non-oa) in irs, pitol and degroote found 56% of citations had between four and nine google scholar versions (2014, 603) almost 90% of the citations shown were the publisher version, but of these, only 14.3% were freely available in full text on the publisher web site. meanwhile, 70% percent of the items had at least one free full-text version available through a “hidden” google scholar version. the author’s experience in retrieving full text for this review indicates this issue still exists, but research would be needed to formulate reliable recommendations for users. use and utility of google scholar as part of the research process studies were found concerning google scholar’s popularity with users and their reasons for preferring it (or not) over other tools. another group of studies examined issues related to the utility of google scholar for research processes, including issues related to messy metadata. finally, a cluster of articles focused specifically on using google scholar for systematic reviews. popularity and user preferences several studies have shown google scholar to be well-known to scholarly communities. 
a survey of 3,500 scholars from 95 countries found that over 60% of 3,500 scientists and engineers and over 70% of respondents in the social sciences, arts, and humanities were aware of google scholar and used it regularly (van noorden 2014). in a large-scale journal-reader survey, inger and gardner (2016) found that among academic researchers in high-income areas, academic search engines surpassed abstracts and indexes as a starting place for research (2016, 85, figure 4). in low-income areas, google use exceeded google scholar use for academic research. major library link resolver software offers reports of full-text requests broken down by referrer. inger and gardner (2016) showed a large variance across subjects for whether people prefer google or google scholar: “people in the social sciences, education, law, and business use google scholar more to find journal articles. however, people working in the humanities and religion and theology prefer to use google” (88). humanities scholar use of google over google scholar was also found by kemman et al. (2013); google, google images, google scholar, and youtube were used more than jstor or other library databases, even though humanities scholars’ trust in google and google scholar was lower. user research since 2014 concerning google scholar has focused on graduate students. results suggest scholar is used regularly but the tool is only partially sufficient. in their study of 20 engineering masters’ students’ use of abstracts and indexes, johnson and simonsen (2015) found an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 24 that half their sample (n=20) had used google scholar the last time they located an article using specific search terms or criteria. google was the second most-used source at 20%, followed by abstracting and indexing services (15%). graduate students describe google scholar with nuance and refer to it as a specific part of their process. in bøyum and aabø’s (2015) interviews with eight phd business students and wu and chen’s (2014, 381) interviews with 32 graduate students drawn from multiple academic disciplines, the majority described using library databases and google scholar for different purposes depending on the context. graduate students in both studies were well aware of google scholar’s use for citation searching. bøyum and aabø’s (2015) subjects described library resources as more “academically robust” than google or google scholar. wu and chen’s (2014) interviewees praised google scholar for its wider coverage and convenience, but lamented the uncertain quality, sometimes inaccessible full text, too many results, lack of sorting function (document type or date), finding documents from different disciplines, and duplicate citations. google scholar was seen by their subjects as useful during early stages of information seeking. in contrast to general assumptions, more than half the students (wu and chen 2014, 381) interviewed reported browsing more than 3 pages’ worth of google scholar results. about half of interviewees reported looking at cited documents to find more, however students had mixed opinions about whether the citing documents turned out to be relevant. google scholar’s “my library” feature, introduced in 2013, now competes with other bibliographic citation management software. 
in a survey of 344 (mostly graduate) students, conrad, leonard, and somerville found google scholar was the most-used (47%), followed by endnote (37%) and zotero (19%) (2015, 572). follow-up interviews with 13 of the students revealed that a few students used multiple tools; for example, one participant noted he/she used endnote for sharing data with lab partners and others "across the community"; mendeley for her own personal thesis work, where she needs to "build a whole body of literature"; and google scholar citations for "quick reference lists that i may not need for a second or third time."

messy metadata
many studies have suggested google scholar's metadata is "messy." although none in the period of study examined this phenomenon in conjunction with relative user performance, the issues found could affect scholarship. a 2016 study itemized the most common mistakes in google scholar resulting from its extraction process: 1) incorrect title identification; 2) missing or incorrectly assigned authors; 3) book reviews indexed as books; 4) failing to group versions of the same document, which inflates citation counts; 5) grouping different editions of books, which deflates citation counts; 6) attributing citations to documents that did not cite them, or missing citations that did; and 7) duplicate author profiles (martín-martín et al. 2016b). the authors concluded that "in an academic big data environment, these errors (which we deem affect less than 10% of the records in the database) are of no great consequence, and do not affect the core system performance significantly" (54). two of these issues have been studied specifically: duplicate citations and missing publication dates. the rate of duplicate citations in google scholar has ranged upwards of 2.93% (haddaway et al. 2015) and 5% (winter, zadpoor, and dodou 2014, 1562), which can be compared to a .05% duplicate citation rate in web of science (haddaway et al. 2015, 13). haddaway found the main reasons for duplication include "typographical errors, including punctuation and formatting differences; capitalization differences (google scholar only), incomplete titles, and the fact that google scholar scans citations within reference lists and may include those as well as the citing article" (2015, 13); a small, purely illustrative normalization sketch of such near-duplicates appears below. the issue of missing publication dates varies greatly across samples. dates were found to be missing 9% of the time in winter et al.'s study, although the rate varied by publication type: 4% of journals, 15% of theses, and 41% of the unknown document types (winter, zadpoor, and dodou 2014, 1562). however, martín-martín et al. studied a sample of 32,680 highly cited documents and found that web of science and google scholar agreed on publication dates 96.7% of the time, with an idiosyncratically large proportion of those mismatches in 2012 and 2013 (2017, 159).

utility for research processes
prior to 2014, studies such as asher, duke, and wilson's (2012) evaluated google scholar's utility as a general research tool, often in comparison with discovery tools. since 2014, the only such study found was namei and young's comparison of summon, google scholar, and google using 299 known-item queries. they found google scholar and summon returned relevant results 74% of the time; google returned relevant results 91% of the time. for "scholarly formats," they found summon returned relevant results 76% of the time; google scholar, 79%; and google, 91% (2015, 526-527).
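the duplication causes haddaway lists (punctuation and formatting differences, capitalization, incomplete titles) are exactly what researchers normalize away before de-duplicating exported google scholar records. the sketch below is purely illustrative, with invented records and rules; it is not any study's actual cleaning procedure.

```python
# illustrative sketch: normalize exported citation titles so that records differing
# only in punctuation, capitalization, or truncation collapse to one entry.
# the sample records and rules are hypothetical, not a published study's method.
import re
from collections import defaultdict

records = [
    {"title": "Google Scholar, Scopus and the Web of Science: a longitudinal comparison"},
    {"title": "google scholar scopus and the web of science a longitudinal comparison"},
    {"title": "Google Scholar, Scopus and the Web of Science: a longitudinal..."},  # truncated
]

def normalize(title):
    """lowercase, drop punctuation, collapse whitespace, strip a trailing ellipsis."""
    t = title.lower().replace("...", " ")
    t = re.sub(r"[^\w\s]", " ", t)        # punctuation and formatting differences
    return re.sub(r"\s+", " ", t).strip()  # whitespace differences

groups = defaultdict(list)
for rec in records:
    key = normalize(rec["title"])
    # treat a record whose normalized title is a prefix of an existing key as the same work
    match = next((k for k in groups if k.startswith(key) or key.startswith(k)), key)
    groups[match].append(rec)

print(f"{len(records)} exported records -> {len(groups)} unique works")
```

real cleaning typically also compares authors and years; the title step above only illustrates the general idea behind catching such near-duplicates.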
the remainder of studies in this category focused specifically on systematic reviews, perhaps because such reviews are so time-consuming. authors develop search strategies carefully, execute them in multiple databases, and document their search methods and results carefully. some prestigious journals are beginning to require similar rigor for any original research article, not just systematic reviews (cals and kotz 2016). information provided by professional organizations about the use of google scholar for systematic reviews seems inconsistent: the cochrane handbook for systematic reviews of interventions lists google scholar among sources for searching, but none of the five “highlighted reviews” on the cochrane web site at the time of this article’s writing used google scholar in their methodologies. the uk organization national institute for health and care excellence’s manual (national institute for health and care excellence (nice)) only mentions google scholar in an appendix of search sources under “conference abstracts.” a study by gehanno et al. (2013) found google scholar contained 100% of the references from 29 systematic reviews, and suggested google scholar could be the first choice for systematic reviews or meta-analyses. this finding prompted a slew of follow-up studies in the next three years. an an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 26 immediate response by giustini and boulos (2013) pointed out that systematic reviews are not performed by searching for article titles as with gehanno et al.’s method, but through search strategies. when they tried to replicate a systematic review’s topical search strategy in google scholar, the citations were not easily discovered. in addition the authors were not able to find all the papers from a given systematic review even by title searching. haddaway et al. also found imperfect coverage: for one of the seven reviews examined, 31.5% of citations could not be found (2015, 11). haddaway also noted that special characters and fonts (as with chemical symbols) can cause poor matching when such characters are part of article titles. recent literature concurs that it is still necessary to search multiple databases when conducting a systematic review, including abstracts and indexes, no matter how good google scholar’s coverage seems to be. no one database’s coverage is complete, including google scholar (thielen et al. 2016), and practical recall of google scholar is exceptionally low due to the 1,000 result limit, yet at the same time, google scholar’s lack of precision is costly in terms of researchers’ time (bramer, giustini, and kramer 2016b; haddaway et al. 2015). the challenges limiting study of google scholar’s coverage also bedevil those wishing to use it for reviews, especially the 1,000 result retrieval limit, lack of batch export, and lack of exported abstracts (levay et al. 2016). additionally, google scholar’s changing content, unknown algorithm and updating practices, search inconsistencies, limited boolean functions, and 256-character query limit prevent the tool from accommodating the detailed, reproducible search methodologies required by systematic reviews (bonato 2016; haddaway et al. 2015; giustini and boulos 2013). bonato noted google scholar retrieved different results with advanced and basic searches; could not determine the format of items (e.g. 
conference papers); and found other inconsistent results.14 bonato also lamented the lack of any kind of document type limit. despite the limitations and logistical challenges, practitioners and scholars are finding solid reasons for including academic web search engines as part of most systematic review methodologies (cals and kotz 2016). stansfield et al. noted that “relevant literature for lowand middle-income countries, such as working and policy papers, is often not included in databases,” and that google scholar finds additional journal articles and grey literature not indexed in databases (2016, 191). for eight systematic reviews by eppi-center, “over a quarter of relevant citations were found from websites and internet search engines” (stansfield, dickson, and bangpan 2016, 2). specific tools and practices have been recommended when using search engines within the context of systematic reviews. software is available to record search strategies and results (harzing and alakangas 2016b; haddaway 2015). haddaway suggests the use of snapshot tools (haddaway 2015) to record the first 1,000 google scholar records rather than the typical assessment of the first 50 search results as had been done in the past: “this change in practice 14 bonato (2016) found zero hits for conference papers when limiting by year 2015-2016, but found two papers presented at a 2015 meeting. information technology and libraries | june 2017 27 could significantly improve both the transparency and coverage of systematic reviews, especially with respect to their grey literature components.” (haddaway et al. 2015, 15). both haddaway (2015) and cochrane recommend that review authors print or save locally electronic copies of the full text or relevant details rather than bookmarking web sites, “in case the record of the trial is removed or altered at a later stage” (higgins and green 2011). new methods for searching, downloading, and integrating academic search engine results into review procedures using free software to increase transparency, repeatability, and efficiency have been proposed by haddaway and his colleagues (2015). google scholar citations and metrics google scholar citations and metrics are not academic search engines, but this article included them because these products are interwoven into the fabric of the google scholar database. google scholar citations, launched in late 2011 (martín-martín et al. 2016b, 12) groups citations by author, while google metrics (launch date uncertain) provides similar data for articles and journals. readers interested in an in-depth literature review of google scholar citations for earlier years (2005-2012) are directed to (thelwall and kousha 2015b). in his comprehensive review of more recent literature about using google scholar citations for citation analysis, waltman (2016) described several themes. google scholar’s coverage of many fields is significantly broader than web of science and scopus, and this seems to be continuing to improve over time. however studies regularly report google scholar’s inaccuracies, content gaps, phantom data, easily manipulatable citation counts, lack of transparency, and limitations for empirical bibliometric studies. 
as discussed in the coverage section, google scholar’s citation database is competitive with other major databases such as web of science and has been growing dramatically in the last few years (winter, zadpoor, and dodou 2014; harzing and alakangas 2016b; harzing 2014) but has recently stabilized (harzing and alakangas 2016b). more and more studies are concluding that google scholar will report more comprehensive information about citation impact than web of science or scopus. across a sample of articles from many years of one science journal, trapp (2016) found the proportion of articles with zero citations was 37% for web of science, 29% for scopus, and 19% for google scholar. some of google scholar’s superiority for citation analysis in the social sciences and humanities is due to its inclusion of book content, software, and additional journals (prins et al. 2016; bornmann et al. 2016). bornmann et al. (2016) noted citations to all ten of a research institute’s ten books published in 2009 were found in google scholar, whereas web of science found citations for only two books. furthermore they found data in google scholar for 55 of the total of 71 of the institute’s book chapters. for the four conference proceedings they could identify in google scholar, there were 100 citations, of which 65 could be found in google scholar. the comparative success of google scholar for citation impact varies by discipline, however: (levay et al. 2016) found web of science to be more reliable than google scholar, quicker for an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 28 downloading results, and better for retrieving 100% of the most important publications in public health. despite google scholar’s growth, using all three major tools (scopus, web of science, and google scholar) still seems to be necessary for evaluating researcher productivity. rothfus (2016) compared web of science, scopus, and google scholar citation counts for evaluating the impact of the canadian network of observational drug effect studies (cnodes), as represented by a sample of 222 citations from five articles. attempting to determine citation metrics for the cnodes research team yielded different results for every article when using the three tools. they found that “using three tools (web of science, scopus, google scholar) to determine citation metrics as indicators of research performance and impact provided varying results, with poor overall agreement among the three” (237). major academic libraries’ web sites often explain how to find one’s h-index in all three (suiter and moulaison 2015). researchers have also noted the disadvantages of google scholar for citation impact studies. google scholar is costly in terms of researcher time. levay et al. (2016) estimated the cost of “administering results” from web of science to be 4 hours versus 75 hours for google scholar. administering results includes using the search tool to search, download, and add records to bibliographic citation software, and removing duplicate citations. duplicate citations are often mentioned as a problem (prins et al. 
2016), although moed (2016) suggested the double counting by google scholar would occur only if the level of analysis is on target sources, not if it is on target articles.15 downloaded citation samples can still suffer from double counts, however: harzing and alakangas described how cleaning "a fairly extreme case" in their study reduced the number of papers from 244 to 106 (2016b). google scholar also does not identify self-citations, which can dramatically influence the meaning of results (prins et al. 2016). furthermore, researchers have shown it is possible to corrupt google scholar citations by uploading obviously false documents (delgado lópez-cózar, robinson-garcía, and torres-salinas 2014). while the researchers noted traditional citation indexes can also be defrauded, google's products are less transparent and abuses may not be easily detected. google did not respond to the research team when contacted and simply deleted the false documents to which it had been alerted without reporting the situation to the affected authors, and the researchers concluded: "this lack of transparency is the main obstacle when considering google scholar and its by-products for research evaluation purposes" (453). because these disadvantages do not outweigh google scholar's seemingly broader coverage, many articles investigate workarounds for using google scholar more effectively when evaluating research impact.
15 "if a document is, for instance, first published in arxiv, and a next version later in a journal j, citations to the two versions are aggregated. in google scholar metrics, in which arxiv is included as a source, this document (assuming that its citation count exceed the h5 value of arxiv and journal j) is listed both under arxiv and under journal j, with the same, aggregate citation count" (moed 2016, 29).
harzing and alakangas (2016b) recommend the hia index16, which is corrected for career length and co-authorship patterns, as the citation metric of choice for a fair comparison of google scholar with other tools. bornmann et al. (2016) investigated a method to normalize data and reduce errors when using google scholar data to evaluate citations in the social sciences and humanities. researcher profiles can also be used to find other scholars by topic. in a 2014 survey of researchers (n=8,554), dagienė and krapavickaitė found that 22% used a third-party service such as google scholar or microsoft academic to produce lists of their scholarly activities and 63% reported their scholarly record was freely available on the web (2016, 158, 161). google scholar ranked only second to microsoft word as the most frequently used software to maintain academic activity records (160). martín-martín et al. (2016b) examined 814 authors in the field of bibliometrics using google scholar citations, researcherid, researchgate, mendeley, and twitter. google scholar was the most used social research sharing platform, followed by researchgate, with researcherid gaining wider acceptance among authors deemed "core" to the field. only about one-third of the authors created a twitter profile, and many mendeley and researcherid profiles were found empty. the study found google scholar academic profiles' distinctive advantages to be automatic updates and its high growth rate, with disadvantages of scarce quality control, inherited metadata mistakes from google scholar, and its manipulatability.
overall, martín-martín and colleagues concluded that google scholar "should be the preferred source for relational and comparative analyses in which the emphasis is put on author clusters" (57). google scholar metrics provides citation information for articles and journals. in a sample of 1,000 journals, orduña-malea and delgado lópez-cózar found that "despite all the technical and methodological problems," google scholar metrics provides sound and reliable journal rankings (2014, 2365). google scholar metrics seems to be an annual publication; the 2016 edition contains 5,734 publications and 12 language rankings. russian, korean, polish, ukrainian, and indonesian were added in 2016, while italian and dutch were removed for unknown reasons (martín-martín et al. 2016a). researchers also found that many discussion papers and working papers were removed in 2016. english-language publications are broken into subject areas and disciplines. google scholar metrics often, but not always, creates separate entries for each language in which a journal is published. bibliometricians call for google scholar metrics to display the total number of documents published in the publications indexed and the total number of citations received: "these are the two essential parameters that make it possible to assess the reliability and accuracy of any bibliometric indicator" (13). adding country and language of publication and self-citation rates is among the other improvements listed by lópez-cózar and colleagues.
16 harzing and alakangas (2016b) define the hia as the hi norm/academic age. academic age refers to the number of years elapsed since first publication. to calculate hi norm, one divides the number of citations by the number of authors for that paper, and then calculates the h-index of the normalized citation count.

informing practice
the glaring lack of research related to the coverage of arts and humanities scholarship, limited research on book coverage, and relaunch of microsoft academic make it impossible to form a general recommendation regarding the use of academic web search engines for serious research. until the ambiguity of arts and humanities coverage is clarified, and until academic web search engines are transparent and stable, traditional bibliographic databases still seem essential for systematic reviews, citation analysis, and other rigorous literature search purposes. discipline-specific databases also have features such as controlled vocabulary, industry classification codes, and peer review indicators that make scholars more efficient and effective. nevertheless, the increasing relevance of academic search engines and solid coverage of the sciences and social sciences make it essential for librarians to become expert with google scholar, google books, and microsoft academic. for some scholarly tasks, academic search engines may be superior: for example, when looking up doi numbers for this paper's bibliography, the most efficient process seemed to be a google search on the article title plus the term "doi," and the most likely site to display in the results was researchgate.17 librarians and scholars should champion these tools as an important part of an efficient, effective scholarly research process (walsh 2015), while also acknowledging the gaps in coverage, biases, metadata issues, and missing features available in other databases.
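footnote 16's arithmetic is compact enough to show directly. the sketch below computes hi norm and hia for a hypothetical publication list, following the footnote's definitions (divide each paper's citations by its author count, take the h-index of those normalized counts, then divide by academic age); all numbers are invented.

```python
# minimal sketch of the hIa calculation described in footnote 16, using invented data.
papers = [  # (citations, number_of_authors) for one hypothetical researcher
    (40, 2), (25, 5), (18, 1), (9, 3), (6, 2), (2, 1),
]
academic_age = 10  # years since first publication (hypothetical)

def h_index(counts):
    """largest h such that at least h values are >= h."""
    ranked = sorted(counts, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

normalized = [citations / authors for citations, authors in papers]  # correct for co-authorship
hi_norm = h_index(normalized)   # h-index of the normalized citation counts
hia = hi_norm / academic_age    # correct for career length

print(f"hI,norm = {hi_norm}, hIa = {hia:.2f}")
```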
academic web search engines could form the centerpiece for instruction sessions surrounding the scholarly network, as shown by “cited by” features, author profiles, and full-text sources. traditional abstracts and indexes could then be presented on the basis of their strengths. at some point, explaining how to access full text will likely no longer focus on the link resolver but on the many possible document versions a user might encounter (e.g. pre-prints or editions of books) and how to make an informed choice. in the meantime, even though web search engines and repositories may retrieve copious full text outside library subscriptions, college students should still be made aware of the library’s collections and services such as interlibrary loan. when considering google scholar’s weaknesses, it’s important to keep in mind chen’s observation that we may not have a tool available that does any better (antell et al. 2013). while google scholar may be biased toward english-language publications, so are many bibliographic databases. overall, google scholar seems to have increased the visibility of international research (bartol and mackiewicz-talarczyk 2015). while google scholar’s coverage of grey literature has been shown to be somewhat uneven (bonato 2016; haddaway et al. 2015), it seems to include more diversity among relevant document types than many abstracts and indexes (ştirbu et al. 2015; bartol and mackiewicz-talarczyk 2015). although the rigors of systematic reviews may contraindicate the tool’s use as a single source, it adds value to search results from other databases (bramer, giustini, and kramer 2016a). user preferences and priorities should also be taken into account; google 17 because the authority of researchgate is ambiguous, in such cases i then looked up the doi using google to find the publisher’s version. in some cases, the doi was not displayed on the publisher’s result page (e.g., https://muse.jhu.edu/article/197091). information technology and libraries | june 2017 31 scholar results have been said to contain “clutter,” but many researchers have found the noise in google scholar tolerable given its other benefits (ştirbu et al. 2015). google books purportedly contains about 30 million items, focused on u.s.-published and englishlanguage books. but its coverage is hit-or-miss, surprising mays (2015) with an unexpected wealth of primary sources but disappointing harper (2016) with limited coverage of academic health sciences books. recent court decisions have enabled google to continue progressing toward their goal of full-text indexing and making snippet views available for the google-estimated universe of 130 million books, which suggests its utility may increase. google books is not integrated with link resolvers or discovery tools but has been found useful for providing information about scholarly research impact, especially for the arts, humanities, and social sciences. as re-launched in 2016, microsoft academic shows real potential to compete with google scholar in coverage and utility for finding journal articles. as of february 2017 its index contains 120 million citations. in contrast to the mystery of google scholar’s black-box algorithms and restrictive limitations, microsoft academic uses an open-system approach and offers an api. microsoft academic appears to have less coverage of books and grey literature compared with google scholar. research is badly needed about the coverage and utility of both google books and microsoft academic. 
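for librarians who want to look at microsoft academic programmatically, a heavily hedged sketch follows. it assumes the evaluate method, the expr/count/attributes parameters, and the Ti/Y/CC attribute codes described in the academic knowledge api documentation of this period; the endpoint url, query expression, and subscription key shown are placeholders to verify against microsoft's current documentation rather than guaranteed working values.

```python
# hedged sketch: querying the academic knowledge api's evaluate method as documented
# around 2016-2017. endpoint, parameter names, and attribute codes (Ti = title,
# Y = year, CC = citation count) are assumptions to verify; key and query are placeholders.
import requests

ENDPOINT = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"  # assumption
params = {
    "expr": "Composite(AA.AuN=='anne-wil harzing')",  # structured query expression (example)
    "count": 10,
    "attributes": "Ti,Y,CC",
}
headers = {"Ocp-Apim-Subscription-Key": "YOUR-KEY-HERE"}  # placeholder

response = requests.get(ENDPOINT, params=params, headers=headers, timeout=30)
response.raise_for_status()
for entity in response.json().get("entities", []):
    print(entity.get("Y"), entity.get("CC"), entity.get("Ti"))
```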
google scholar continues to evolve, launching a new algorithm for known-item searching in 201618 that appears to work very well. google scholar does not reveal how many items it searches but studies have suggested 160 million documents have been indexed. studies have shown the google scholar relevance algorithm to be heavily influenced by citation counts and language of publication. google scholar has been so heavily researched and is such a “black box” that more attention would seem to have diminishing returns, except in the area of coverage of and utility for arts and humanities research. librarians may find these takeaways useful for working with or teaching google scholar: • little is known about coverage of arts and humanities by google scholar. • recent studies repeatedly find that in the sciences and social sciences google scholar covers as much if not more than library databases, has more recent coverage, and frequently provides access to full text without the need for library subscriptions. • although the number of studies is limited, google scholar seems excellent at retrieving known scholarly items compared with discovery tools. • using proper accent marks in the title when searching for non-english language items appears to be important. 18 google scholar’s blog notes that in january 2016, a change was made so “scholar now automatically identifies queries that are likely to be looking for a specific paper” technically speaking, “it tries hard to find the intended paper and a version that that particular user is able to read” https://scholar.googleblog.com/. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 32 • finding full text for non-english journal articles may require searching google scholar in the original language. • while google scholar may include results from google books, it appears both tools should be used rather than assuming google books will appear in google scholar. • while google scholar does include grey literature, these results do not usually rank highly. • google scholar and google must both be used to effectively search across institutional repository content. • free full text may be buried underneath the “all x versions” links because the publisher’s web site is usually the dominant version presented to the user. the right-hand column links may help ameliorate this situation, but not reliably. • google scholar is well-known in most academic communities and used regularly; however, it is seldom the only tool used, with scholars continuing to use other web search tools, library abstracts and indexes, and published web sites as well. • experts in writing systematic reviews recommend google scholar be included as a search tool along with traditional abstracts and indexes, using software to record the search process and results. • for evaluating research impact, google scholar may be superior to web of science or scopus, but using all three tools still seems necessary. • as with any database, citation metadata should be verified against the publisher’s data; with google scholar, publication dates should receive deliberate attention. • when google scholar covers some of a major publisher’s content, that does not imply it covers all of that publisher’s content. • google scholar metrics appears to provide reliable journal rankings. research agenda this review of the literature also provides direction for future research concerning academic web search engines. 
because this review focused on 2014-2016, researchers may need to review studies from earlier periods for methodological ideas and previous findings, noting that dramatic changes in search engine coverage and behavior can occur within only a few years.19 across the studies, some general best practices were observed. when comparing the coverage of academic web search engines, their utility for establishing research impact, or other bibliometric studies, researchers should strongly consider using software such as publish or perish, and to design their research approach with previous methodologies in mind. information scientists have charted a set of clear disciplinary methods; there is no need to start from scratch. even when 19 for example ştirbu found that google scholar overlapped georef by 57% and 62% (ştirbu et al. 2015, 328), compared with a finding by neuhaus in 2006 where scholar overlapped with georef by 26% (2006, 133). information technology and libraries | june 2017 33 performing a large-scale quantitative assessment such as (kousha and thelwall 2015), manually examining and discussing a subset of the sample seems helpful for checking assumptions and for enhancing the meaning of the findings to the reader. some researchers examined the “top 20” or “top 10” results qualitatively (kousha and thelwall 2015), while others took a random sample from within their large-study sample (kousha, thelwall, and rezaie 2011). academic search engines for arts and humanities research research into the use of academic web search engines within arts and humanities fields is sorely needed. surveys show humanities scholars use both google and google scholar (inger and gardner 2016; kemman, kleppe, and scagliola 2013; van noorden 2014). during interviews of 20 historians by martin and quan-haase (2016) concerning serendipity, five mentioned google books and google scholar as important for recreating serendipity of the physical library online. almost all arts and humanities scholars search the internet for researchers and their activities, and commonly expressed the belief that having a complete list of research activities online improves public awareness (dagienė and krapavickaitė 2016). mays’s (2015) practical advice and the few recent studies on citation impact of google books for these disciplines point to the enormous potential for this tool’s use. articles describing opportunities for new online searching habits of humanities scholars have not always included google scholar (huistra and mellink 2016). wu and chen’s interviews with humanities graduate students suggested their behavior and preferences were different from science and technology students, doing more known-item searching and struggling with “semantically ambiguous keywords” that retrieved irrelevant results (2014, 381). platform preferences seem to have a disciplinary aspect: hammarfelt’s (2014) investigation of altmetrics in the humanities suggests mendeley and twitter should be included along with google scholar when examining citation impact of humanities research, while a 2014 nature survey suggests researchgate is much less popular in the social sciences and humanities than in the sciences (van noorden 2014). in summary, arts and humanities scholars are active users of academic web search engines and related tools, but their preferences and behavior, and the relative success of google scholar as a research tool cannot be inferred from the vast literature focused on the sciences. 
advice from librarians and scholars about the strengths and limitations of academic web search engines in these fields would be incredibly useful. specific examples of needed research, and related studies to reference for methodological ideas: • similar to the studies that have been done in the sciences, how well do academic search engines cover the arts and humanities? an emphasis on formats important to the discipline would be important (prins et al. 2016). • how does the quality of search results compare between academic search engines and traditional library databases for arts and humanities topics? to what extent can the user usefully accomplish her task? (ruppel 2009)? an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 34 • to what extent do academic search engines support the research process for scholarship distinctive to arts and humanities disciplines (e.g. historiographies, review essays)? • in academic search engines, how visible is the arts and humanities literature found in institutional repositories (pitol and de groote 2014)? specific aspects of academic search engine coverage this review suggests that broad studies of academic search engine coverage may have reached a saturation point. however, specific aspects of coverage need additional investigation: • grey literature: although google scholar’s inclusion of grey literature is frequently mentioned as valuable, empirical studies evaluating its coverage are scarce. additional research following the methodology of haddaway (2015) could investigate the bibliographies of literature other than systematic reviews, investigate various disciplines, or use a sample of valuable known items (similar to kousha, thelwall, and rezaie’s (2011) methodology for books). • non-western, non-english language literature: for further investigation of the repeated finding of non-western, non-english language bias (abrizah and thelwall 2014; cavacini 2015), comparisons to library abstracts and indexes would be helpful for providing context. to what extent is this bias present in traditional research tools? hilbert et al. found the coverage of their sample increased for english language in both web of science and scopus, and “to a lesser extent” in google scholar (2015, 260). • books: any investigations of book coverage in microsoft academic and google scholar would be welcome. very few 2014-2016 studies focused on books in google scholar, and even looking in earlier years turned up little research. georgas (2015) compared google with a federated search tool for finding books, so her study may be a useful reference. kousha et al. (2011) found three times as many citations in google scholar than in scopus to a sample of 1,000 academic books. the authors concluded “there are substantial numbers of citations to academic books from google books and google scholar, and it therefore may be possible to use these potential sources to help evaluate research in bookoriented disciplines” (kousha, thelwall, and rezaie 2011, 2157). 
• institutional repositories: yang (2016) recommended that “librarians of digital resources conduct research on their local digital repositories, as the indexing effects and discovery rates on metadata or associated text files may be different case by case,” and the studies found 2014-2016 show that ir platform and metadata schema dramatically affect discovery, with some irs nearly invisible (weideman 2015; chen 2014; orduña-malea and lópez-cózar 2015; yang 2016) and others somewhat findable by google scholar (lee et al. 2015; obrien et al. 2016). askey and arlitsch (2015) have explained how google scholar’s decisions regarding metadata schema can dramatically affect results.20 libraries who 20 for example, google’s rejection of dublin core. information technology and libraries | june 2017 35 would like their institutional repositories to serve as social sharing platforms for research should consider conducting a study similar to (martín-martín et al. 2016b). finally, a study of ir journal article visibility in academic web search engines could be extremely informative. • full-text retrieval: the indexing coverage of academic search engines relates to the retrieval of full text, which is another area ripe for more research studies, especially in light of the impressive quantity of full text that can be retrieved without user authentication. johnson and simonsen (2015) found that more of the engineering students they surveyed obtained scholarly articles from a free download or getting a pdf from a colleague at another institution than used the library’s subscription. meanwhile, libraries continue to pay for costly subscription resources. monitoring this situation is essential for strategic decision-making. quint (2016) and karlsson (2014) have suggested strategies for libraries and vendors to support broader access to subscription full text through creative licensing and per-item fee approaches. institutional repositories have had mixed results in changing scholars’ habits (both contributors and searchers) but are demonstrably contributing to the presence of full text in the academic search engine experience. when will academic users find a good-enough selection of full-text articles that they no longer need the expanded full text paid for by their institutions? google books similarly to microsoft academic, google books as a search tool also needs dedicated research from librarians and information scientists about its coverage, utility, and/or adoption. a purposeful comparison with other large digital repositories such as hathitrust (https://www.hathitrust.org) would be a boon to practitioners and the public. while hathitrust is transparent about its coverage (https://www.hathitrust.org/statistics_visualizations), specific areas of google books’ coverage have been called into question. weiss (2016) suggested a gap in google books exists from about 1915-1965 “because many publishers either have let it fall out of print, or the book is orphaned and no one wants to go through the trouble of tracking down the copyright owners” and found that copies in google books “will likely be locked down and thus unreadable, or visible only as a snippet, at best” (303). has this situation changed since the court rulings concerning the legality of snippet view? longitudinal studies in the growth of google books similar to (harzing 2014) could illuminate this and other questions about google books’s ability to deliver content. uneven coverage of content types, geography, and language should be investigated. 
mays noted a possible geographical imbalance within the united states (mays 2015, 26). others noted significant language and international imbalances, and large disciplinary differences (weiss 2016; abrizah and thelwall 2014; kousha and thelwall 2015). weiss and others suggest google books' coverage imbalance has enormous social implications: "google and other [massive digital libraries] have essentially canonized the books they have scanned and contribute to the marginalization of those left unscanned" (301). therefore, more holistic quantitative investigations of the types of information in google books and possible skewness would be welcome. finally, chen's study (2012) comparing the coverage of google books and worldcat could be repeated to provide longitudinal information. the utility of google books for research purposes also needs further investigation. books are far more prevalently cited in wikipedia than are research articles (thelwall and kousha 2015a). examining samples of wikipedia articles' citation lists for the prevalence of google books could reveal how dominant a force google books has become in that space. on a more philosophical level, investigating the ways google books might transform scholarly processes would be useful. szpiech (2014) considered how the google books version of a medieval manuscript transformed his relationship with texts, causing a rupture "produced by my new power to extract words and information from a text without being subject to its order, scale, or authority" (78). he hypothesized readers approach google books texts as consumers, rather than learners, whereby "the critical sense of the gestalt" is at risk of being forgotten (84). have other researchers experienced what he describes?

microsoft academic
given the stated openness of microsoft's new academic web search engine,21 the closed nature of google scholar, and the promising findings of bibliometricians (harzing 2016b; harzing and alakangas 2016a), librarians and information scientists should embark on a thorough review of microsoft academic with enthusiasm similar to that with which they approached google scholar. the search engine's coverage, utility for research, and suitability for bibliometric analysis22 all need to be examined. microsoft academic's abilities for supporting scholarly social networking would also be of interest, perhaps using ward et al. (2015) as a theoretical groundwork. the tool's coverage and utility for various disciplines and research purposes is a wide-open field for highly useful research.

professional and instructional approaches based on user research
to inform instructional approaches, more study on user behavior is needed, perhaps repeating herrera's (2011) study with google scholar and microsoft academic. in light of the recent focus on graduate students, research concerning the use of academic web search engines by undergraduates, community college students, high school students, and other groups would be welcome. using an interview or focus group generates exploratory findings that could be tested through surveys with a larger, more representative sample of the population of interest. studying searching behaviors has been common; can librarians design creative studies to investigate reading, engagement, and reflection when web search engines are used as part of the process?
is there a way to study whether the “matthew effect” (antell et al. 2013, 281), the aging citation 21 microsoft’s faq says the company is “adopting an open approach in developing the service, and we invite community participation. we like to think what we have developed is a community property. as such, we are opening up our academic knowledge as a downloadable dataset” and offers the academic knowledge api (https://www.microsoft.com/cognitive-services/en-us/academic-knowledge-api). 22 see jacsó (2011) for methodology. information technology and libraries | june 2017 37 phenomenon (verstak et al. 2014; martín-martín et al. 2016a; davis and cochran 2015), or other epistemological hypotheses are influencing scholarship patterns? a bold study could be performed to examine differences in quality outcomes between samples of students using primarily academic search engines versus traditional library search tools. exploratory studies in this area could begin by surveying students about their use of search tools for research methods courses or asking them to record their research process in a journal, and correlating the findings with their grades on the final research product. three specific areas of user research needed are the use of scholarly social network platforms, researcher profiles, and the influence of these on scholarly collaboration and research (ward, bejarano, and dudás 2015, 178); the performance of google’s relatively new known-item search23 (compared with microsoft academic’s known-item search abilities), and searching in non-english languages. regarding the latter, albarillo’s (2016) method which he applied to library databases could be repeated with google scholar, microsoft academic, and google books. finally, to continue their strong track record as experts in navigating the landscape of digital scholarship, librarians need to research assumptions regarding best practices for scholarly logistics. for example, searching google for article titles plus the term “doi,” then scanning the results list for researchgate was found by this study’s author to most efficiently provide doi numbers: but is this a reliable approach? does researchgate have sufficient accuracy to be recommended as the optimal tool for this task? what is the most efficient way for a scholar to locate full text for a citation? are academic search engines’ bibliographic citation management software export tools competitive with third-party commercial tools such as refworks? another area needing investigation is the visibility of links to free full text in google scholar. pitol and degroote found that 70% percent of the items in their study had at least one free full-text version available through a “hidden” google scholar version (2014, 603), and this author’s work on this review article indicates this problem still exists — but to what extent? also, when free full text exists in multiple repositories (e.g. researchgate, digital commons, academic.edu), which are the most trustworthy and practically useful for scholars? librarians should discuss the answers to these questions and be ready to provide expert advice to users. conclusion with so many users opting to use academic web search engines for research, librarians need to investigate the performance of microsoft academic, google books, and of google scholar for the arts and humanities, and to re-think library services and collections in light of these tools’ strengths and limitations. 
the evolution of web indexing and increasing free access to full text should be monitored in conjunction with library collection development. to remain relevant to 23 google scholar’s blog notes that in january 2016, a change was made so “scholar now automatically identifies queries that are likely to be looking for a specific paper” technically speaking, “it tries hard to find the intended paper and a version that that particular user is able to read” https://scholar.googleblog.com/. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 38 modern researchers, librarians should continue to strengthen their knowledge of and expertise with public academic web search engines, full-text repositories, and scholarly networks. bibliography abrizah, a., and mike thelwall. 2014. "can the impact of nonwestern academic books be measured? an investigation of google books and google scholar for malaysia." journal of the association for information science & technology 65 (12): 2498-2508. https://doi.org/10.1002/asi.23145. albarillo, frans. 2016. "evaluating language functionality in library databases." international information & library review 48 (1): 1-10. https://doi.org/10.1080/10572317.2016.1146036. antell, karen, molly strothmann, xiaotian chen, and kevin o’kelly. 2013. "cross-examining google scholar." reference & user services quarterly 52 (4): 279-282. https://doi.org/10.5860/rusq.52n4.279. asher, andrew d., lynda m. duke, and suzanne wilson. 2012. "paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources." college & research libraries 74(5):464-488. https://doi.org/10.5860/crl374. askey, dale, and kenning arlitsch. 2015. "heeding the signals: applying web best practices when google recommends." journal of library administration 55 (1): 49-59. https://doi.org/10.1080/01930826.2014.978685. authors guild. "authors guild v. google." accessed january 1, 2016, https://www.authorsguild.org/where-we-stand/authors-guild-v-google/. bartol, tomaž, and maria mackiewicz-talarczyk. 2015. "bibliometric analysis of publishing trends in fiber crops in google scholar, scopus, and web of science." journal of natural fibers 12 (6): 531. https://doi.org/10.1080/15440478.2014.972000. boeker, martin, werner vach, and edith motschall. 2013. "google scholar as replacement for systematic literature searches: good relative recall and precision are not enough." bmc medical research methodology 13 (1): 1. bonato, sarah. 2016. "google scholar and scopus for finding gray literature publications." journal of the medical library association 104 (3): 252-254. https://doi.org/10.3163/15365050.104.3.021. bornmann, lutz, andreas thor, werner marx, and hermann schier. 2016. "the application of bibliometrics to research evaluation in the humanities and social sciences: an exploratory study using normalized google scholar data for the publications of a research institute." information technology and libraries | june 2017 39 journal of the association for information science & technology 67 (11): 2778-2789. https://doi.org/10.1002/asi.23627. boumenot, diane. "printing a book from google books." one rhode island family. last modified december 3, 2015, accessed january 1, 2017. https://onerhodeislandfamily.com/2015/12/03/printing-a-book-from-google-books/. bøyum, idunn, and svanhild aabø. 2015. "the information practices of business phd students." new library world 116 (3): 187-200. 
https://doi.org/10.1108/nlw-06-2014-0073. bramer, wichor m., dean giustini, and bianca m. r. kramer. 2016. "comparing the coverage, recall, and precision of searches for 120 systematic reviews in embase, medline, and google scholar: a prospective study." systematic reviews 5(39):1-7. https://doi.org/10.1186/s13643-016-0215-7. cals, j. w., and d. kotz. 2016. "literature review in biomedical research: useful search engines beyond pubmed." journal of clinical epidemiology 71: 115-117. https://doi.org/10.1016/j.jclinepi.2015.10.012. carlson, scott. 2006. "challenging google, microsoft unveils a search tool for scholarly articles." chronicle of higher education 52 (33). cavacini, antonio. 2015. "what is the best database for computer science journal articles?" scientometrics 102 (3): 2059-2071. https://doi.org/10.1007/s11192-014-1506-1. chen, xiaotian. 2012. "google books and worldcat: a comparison of their content." online information review 36 (4): 507-516. https://doi.org/10.1108/14684521211254031. ———. 2014. "open access in 2013: reaching the 50% milestone." serials review 40 (1): 21-27. https://doi.org/10.1080/00987913.2014.895556. choong, miew keen, filippo galgani, adam g. dunn, and guy tsafnat. 2014. "automatic evidence retrieval for systematic reviews." journal of medical internet research 16 (10): 1-1. https://doi.org/10.2196/jmir.3369. ciccone, karen, and john vickery. 2015. "summon, ebsco discovery service, and google scholar: a comparison of search performance using user queries." evidence based library & information practice 10 (1): 34-49. https://ejournals.library.ualberta.ca/index.php/eblip/article/view/23845. conrad, lettie y., elisabeth leonard, and mary m. somerville. 2015. "new pathways in scholarly discovery: understanding the next generation of researcher tools." paper presented at the association of college and research libraries annual conference, march 25-27, portland, or. https://pdfs.semanticscholar.org/3cb1/315476ccf9b443c01eb9b1d175ae3b0a5b4e.pdf. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 40 dagienė, eleonora, and danutė krapavickaitė. 2016. "how researchers manage their academic activities." learned publishing 29(3):155-163. https://doi.org/10.1002/leap.1030. davis, philip m., and angela cochran. 2015. "cited half-life of the journal literature." arxiv preprint arxiv:1504.07479. https://arxiv.org/abs/1504.07479. delgado lópez-cózar, emilio, nicolás robinson-garcía, and daniel torres-salinas. 2014. "the google scholar experiment: how to index false papers and manipulate bibliometric indicators." journal of the association for information science & technology 65 (3): 446-454. https://doi.org/10.1002/asi.23056. erb, brian, and rob sica. 2015. "flagship database for literature searching or flelpful auxiliary?" charleston advisor 17 (2): 47-50. https://doi.org/10.5260/chara.17.2.47. fagan, jody condit, and david gaines. 2016. "take charge of eds: vet your content." presentation to the ebsco users' group, boston, ma, may 10-11. gehanno, jean-françois, laetitia rollin, and stefan darmoni. 2013. "is the coverage of google scholar enough to be used alone for systematic reviews." bmc medical informatics and decision making 13 (1): 1. https://doi.org/10.1186/1472-6947-13-7. georgas, helen. 2015. "google vs. the library (part iii): assessing the quality of sources found by undergraduates." portal: libraries and the academy 15 (1): 133-161. https://doi.org/10.1353/pla.2015.0012. 
giustini, dean, and maged n. kamel boulos. 2013. "google scholar is not enough to be used alone for systematic reviews." online journal of public health informatics 5 (2). https://doi.org/10.5210/ojphi.v5i2.4623. gray, jerry e., michelle c. hamilton, alexandra hauser, margaret m. janz, justin p. peters, and fiona taggart. 2012. "scholarish: google scholar and its value to the sciences." issues in science and technology librarianship 70 (summer). https://doi.org/10.1002/asi.21372/full. haddaway, neal r. 2015. "the use of web-scraping software in searching for grey literature." grey journal 11 (3): 186-190. haddaway, neal robert, alexandra mary collins, deborah coughlin, and stuart kirk. 2015. "the role of google scholar in evidence reviews and its applicability to grey literature searching." plos one 10 (9): e0138237. https://doi.org/10.1371/journal.pone.0138237. hammarfelt, björn. 2014. "using altmetrics for assessing research impact in the humanities." scientometrics 101 (2): 1419-1430. https://doi.org/10.1007/s11192-014-1261-3. hands, africa. 2012. "microsoft academic search – http://academic.research.microsoft.com." technical services quarterly 29 (3): 251-252. https://doi.org/10.1080/07317131.2012.682026. information technology and libraries | june 2017 41 harper, sarah fletcher. 2016. "google books review." journal of electronic resources in medical libraries 13 (1): 2-7. https://doi.org/10.1080/15424065.2016.1142835. harzing, anne-wil. 2013. "a preliminary test of google scholar as a source for citation data: a longitudinal study of nobel prize winners." scientometrics 94 (3): 1057-1075. https://doi.org/10.1007/s11192-012-0777-7. ———. 2014. "a longitudinal study of google scholar coverage between 2012 and 2013." scientometrics 98 (1): 565-575. https://doi.org/10.1007/s11192-013-0975-y. ———. 2016a. publish or perish. vol. 5. http://www.harzing.com/resources/publish-or-perish. ———. 2016b. "microsoft academic (search): a phoenix arisen from the ashes?" scientometrics 108 (3): 1637-1647.https://doi.org/10.1007/s11192-016-2026-y. harzing, anne-wil, and satu alakangas. 2016a. "microsoft academic: is the phoenix getting wings?" scientometrics: 1-13. harzing, anne-wil, and satu alakangas. 2016b. "google scholar, scopus and the web of science: a longitudinal and cross-disciplinary comparison." scientometrics 106 (2): 787-804. https://doi.org/10.1007/s11192-015-1798-9. herrera, gail. 2011. "google scholar users and user behaviors: an exploratory study." college & research libraries 72 (4): 316-331. https://doi.org/10.5860/crl-125rl. higgins, julian, and s. green, eds. 2011. cochrane handbook for systematic reviews of interventions. version 5.1.0 ed.: the cochrane collaboration. http://handbook.cochrane.org/. hilbert, fee, julia barth, julia gremm, daniel gros, jessica haiter, maria henkel, wilhelm reinhardt, and wolfgang g. stock. 2015. "coverage of academic citation databases compared with coverage of scientific social media." online information review 39 (2): 255-264. https://doi.org/10.1108/oir-07-2014-0159. hoffmann, anna lauren. 2014. "google books as infrastructure of in/justice: towards a sociotechnical account of rawlsian justice, information, and technology." theses and dissertations. paper 530. http://dc.uwm.edu/etd/530/. ———. 2016. "google books, libraries, and self-respect: information justice beyond distributions." the library 86 (1). https://doi.org/10.1086/684141. horrigan, john b. "lifelong learning and technology." 
pew research center, last modified march 22, 2016, accessed february 7, 2017, http://www.pewinternet.org/2016/03/22/lifelonglearning-and-technology/. hug, sven e., michael ochsner, and martin p. braendle. 2016. "citation analysis with microsoft academic." arxiv preprint arxiv:1609.05354.https://arxiv.org/abs/1609.05354. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 42 huistra, hieke, and bram mellink. 2016. "phrasing history: selecting sources in digital repositories." historical methods: a journal of quantitative and interdisciplinary history 49 (4): 220-229. https://doi.org/10.1093/llc/fqw002. inger, simon, and tracy gardner. 2016. "how readers discover content in scholarly publications." information services & use 36 (1): 81-97. https://doi.org/10.3233/isu-160800. jackson, joab. 2010. "google: 129 million different books have been published." pc world, august 6, 2010. http://www.pcworld.com/article/202803/google_129_million_different_books_have_been_pu blished.html. jacsó, p. 2008. "live search academic." peter’s digital reference shelf, april. jacsó, péter. 2011. "the pros and cons of microsoft academic search from a bibliometric perspective." online information review 35 (6): 983-997. https://doi.org/10.1108/14684521111210788. jamali, hamid r., and majid nabavi. 2015. "open access and sources of full-text articles in google scholar in different subject fields." scientometrics 105 (3): 1635-1651. https://doi.org/10.1007/s11192-015-1642-2. johnson, paula c., and jennifer e. simonsen. 2015. "do engineering master's students know what they don't know?" library review 64 (1): 36-57. https://doi.org/10.1108/lr-05-2014-0052. jones, edgar. 2010. "google books as a general research collection." library resources & technical services 54 (2): 77-89. https://doi.org/10.5860/lrts.54n2.77. karlsson, niklas. 2014. "the crossroads of academic electronic availability: how well does google scholar measure up against a university-based metadata system in 2014?" current science 107 (10): 1661-1665. http://www.currentscience.ac.in/volumes/107/10/1661.pdf. kemman, max, martijn kleppe, and stef scagliola. 2013. "just google it-digital research practices of humanities scholars." arxiv preprint arxiv:1309.2434. https://arxiv.org/abs/1309.2434. khabsa, madian, and c. lee giles. 2014. "the number of scholarly documents on the public web." plos one 9 (5): https://doi.org/10.1371/journal.pone.0093949 kirkwood jr., hal, and monica c. kirkwood. 2011. "historical research." online 35 (4): 28-32. koler-povh, teja, primož južnic, and goran turk. 2014. "impact of open access on citation of scholarly publications in the field of civil engineering." scientometrics 98 (2): 1033-1045. https://doi.org/10.1007/s11192-013-1101-x. kousha, kayvan, mike thelwall, and somayeh rezaie. 2011. "assessing the citation impact of books: the role of google books, google scholar, and scopus." journal of the american society information technology and libraries | june 2017 43 for information science and technology 62 (11): 2147-2164. https://doi.org/10.1002/asi.21608. kousha, kayvan, and mike thelwall. 2017. "are wikipedia citations important evidence of the impact of scholarly articles and books?" journal of the association for information science and technology. 68(3):762-779. https://doi.org/10.1002/asi.23694. kousha, kayvan, and mike thelwall. 2015. "an automatic method for extracting citations from google books." 
journal of the association for information science & technology 66 (2): 309320. https://doi.org/10.1002/asi.23170. lee, jongwook, gary burnett, micah vandegrift, hoon baeg jung, and richard morris. 2015. "availability and accessibility in an open access institutional repository: a case study." information research 20 (1): 334-349. levay, paul, nicola ainsworth, rachel kettle, and antony morgan. 2016. "identifying evidence for public health guidance: a comparison of citation searching with web of science and google scholar." research synthesis methods 7 (1): 34-45. https://doi.org/10.1002/jrsm.1158. levy, steven. "making the world’s problem solvers 10% more efficient." backchannel. last modified october 17, 2014, accessed january 14, 2016, https://medium.com/backchannel/the-gentleman-who-made-scholar-d71289d9a82d. los angeles times. 2016. "google, books and 'fair use'." los angeles times, april 19, 2016. http://www.latimes.com/opinion/editorials/la-ed-google-book-search-20160419-story.html martin, kim, and anabel quan-haase. 2016. "the role of agency in historians’ experiences of serendipity in physical and digital information environments." journal of documentation 72 (6): 1008-1026. https://doi.org/10.1108/jd-11-2015-0144. martín-martín, alberto, juan manuel ayllón, enrique orduña-malea, and emilio delgado lópezcózar. 2016a. "2016 google scholar metrics released: a matter of languages... and something else." arxiv preprint arxiv:1607.06260. https://arxiv.org/abs/1607.06260. martín-martín, alberto, enrique orduña-malea, juan m. ayllón, and emilio delgado lópez-cózar. 2016b. "the counting house: measuring those who count. presence of bibliometrics, scientometrics, informetrics, webometrics and altmetrics in the google scholar citations, researcherid, researchgate, mendeley & twitter." arxiv preprint arxiv:1602.02412. https://arxiv.org/abs/1602.02412. martín-martín, alberto, enrique orduña-malea, juan manuel ayllón, and emilio delgado lópezcózar. 2014. "does google scholar contain all highly cited documents (1950-2013)?" arxiv preprint arxiv:1410.8464. https://arxiv.org/abs/1410.8464. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 44 martín-martín, alberto, enrique orduña-malea, juan ayllón, and emilio delgado lópez-cózar. 2016c. "back to the past: on the shoulders of an academic search engine giant." scientometrics 107 (3): 1477-1487. https://doi.org/10.1007/s11192-016-1917-2. martín-martín, alberto, enrique orduña-malea, anne-wil harzing, and emilio delgado lópezcózar. 2017. "can we use google scholar to identify highly-cited documents?" journal of informetrics 11 (1): 152-163. https://doi.org/10.1016/j.joi.2016.11.008. mays, dorothy a. 2015. "google books: far more than just books." public libraries 54 (5): 23-26. http://publiclibrariesonline.org/2015/10/far-more-than-just-books/ meier, john j., and thomas w. conkling. 2008. "google scholar’s coverage of the engineering literature: an empirical study." the journal of academic librarianship 34 (3): 196-201. https://doi.org/10.1016/j.acalib.2008.03.002. moed, henk f., judit bar-ilan, and gali halevi. 2016. "a new methodology for comparing google scholar and scopus." arxiv preprint arxiv:1512.05741.https://arxiv.org/abs/1512.05741. namei, elizabeth, and christal a. young. 2015. "measuring our relevancy: comparing results in a web-scale discovery tool, google & google scholar." 
paper presented at the association of college and research libraries annual conference, march 25-27, portland, or. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/201 5/namei_young.pdf national institute for health and care excellence (nice). "developing nice guidelines: the manual." last modified april 2016, accessed november 27, 2016. https://www.nice.org.uk/process/pmg20. neuhaus, chris, ellen neuhaus, alan asher, and clint wrede. 2006. "the depth and breadth of google scholar: an empirical study." portal: libraries and the academy 6 (2): 127-141. https://doi.org/10.1353/pla.2006.0026. obrien, patrick, kenning arlitsch, leila sterman, jeff mixter, jonathan wheeler, and susan borda. 2016. "undercounting file downloads from institutional repositories." journal of library administration 56 (7): 854-874. https://doi.org/10.1080/01930826.2016.1216224. orduña-malea, enrique, and emilio delgado lópez-cózar. 2014. "google scholar metrics evolution: an analysis according to languages." scientometrics 98 (3): 2353-2367. https://doi.org/10.1007/s11192-013-1164-8. orduña-malea, enrique, and emilio delgado lópez-cózar. 2015. "the dark side of open access in google and google scholar: the case of latin-american repositories." scientometrics 102 (1): 829-846. https://doi.org/10.1007/s11192-014-1369-5. orduña-malea, enrique, alberto martín-martín, juan m. ayllon, and emilio delgado lópez-cózar. 2014. "the silent fading of an academic search engine: the case of microsoft academic information technology and libraries | june 2017 45 search." online information review 38(7):936-953. https://doi.org/10.1108/oir-07-20140169. ortega, josé luis. 2015. "relationship between altmetric and bibliometric indicators across academic social sites: the case of csic's members." journal of informetrics 9 (1): 39-49. https://doi.org/10.1016/j.joi.2014.11.004. ortega, josé luis, and isidro f. aguillo. 2014. "microsoft academic search and google scholar citations: comparative analysis of author profiles." journal of the association for information science & technology 65 (6): 1149-1156. https://doi.org/10.1002/asi.23036. pitol, scott p., and sandra l. de groote. 2014. "google scholar versions: do more versions of an article mean greater impact?" library hi tech 32 (4): 594-611. https://doi.org/0.1108/lht05-2014-0039. prins, ad a. m., rodrigo costas, thed n. van leeuwen, and paul f. wouters. 2016. "using google scholar in research evaluation of humanities and social science programs: a comparison with web of science data." research evaluation 25 (3): 264-270. https://doi.org/10.1093/reseval/rvv049. quint, barbara. 2016. "find and fetch: completing the course." information today 33 (3): 17-17. rothfus, melissa, ingrid s. sketris, robyn traynor, melissa helwig, and samuel a. stewart. 2016. "measuring knowledge translation uptake using citation metrics: a case study of a pancanadian network of pharmacoepidemiology researchers." science & technology libraries 35 (3): 228-240. https://doi.org/10.1080/0194262x.2016.1192008. ruppel, margie. 2009. "google scholar, social work abstracts (ebsco), and psycinfo (ebsco)." charleston advisor 10 (3): 5-11. shultz, m. 2007. "comparing test searches in pubmed and google scholar." journal of the medical library association : jmla 95 (4): 442-445. https://doi.org/10.3163/1536-5050.95.4.442. stansfield, claire, kelly dickson, and mukdarut bangpan. 2016. 
"exploring issues in the conduct of website searching and other online sources for systematic reviews: how can we be systematic?" systematic reviews 5 (1): 191. https://doi.org/10.1186/s13643-016-0371-9. ştirbu, simona, paul thirion, serge schmitz, gentiane haesbroeck, and ninfa greco. 2015. "the utility of google scholar when searching geographical literature: comparison with three commercial bibliographic databases." the journal of academic librarianship 41 (3): 322-329. https://doi.org/10.1016/j.acalib.2015.02.013. suiter, amy m., and heather lea moulaison. 2015. "supporting scholars: an analysis of academic library websites' documentation on metrics and impact." the journal of academic librarianship 41 (6): 814-820. https://doi.org/10.1016/j.acalib.2015.09.004. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 46 szpiech, ryan. 2014. "cracking the code: reflections on manuscripts in the age of digital books." digital philology: a journal of medieval cultures 3(1): 75-100. https://doi.org/10.1353/dph.2014.0010. testa, matthew. 2016. "availability and discoverability of open-access journals in music." music reference services quarterly 19 (1): 1-17. https://doi.org/10.1080/10588167.2016.1130386. thelwall, mike, and kayvan kousha. 2015b. "web indicators for research evaluation. part 1: citations and links to academic articles from the web." el profesional de la información 24 (5): 587-606.https://doi.org/10.3145/epi.2015.sep.08. thielen, frederick w., ghislaine van mastrigt, l. t. burgers, wichor m. bramer, marian h. j. m. majoie, sylvia m. a. a. evers, and jos kleijnen. 2016. "how to prepare a systematic review of economic evaluations for clinical practice guidelines: database selection and search strategy development (part 2/3)." expert review of pharmacoeconomics & outcomes research: 1-17. https://doi.org/10.1080/14737167.2016.1246962. trapp, jamie. 2016. "web of science, scopus, and google scholar citation rates: a case study of medical physics and biomedical engineering: what gets cited and what doesn't?" australasian physical & engineering sciences in medicine. 39(4): 817-823. https://doi.org/10.1007/s13246-016-0478-2. van noorden, r. 2014. "online collaboration: scientists and the social network." nature 512 (7513): 126-129. https://doi.org/10.1038/512126a. varshney, lav r. 2012. "the google effect in doctoral theses." scientometrics 92 (3): 785-793. https://doi.org/10.1007/s11192-012-0654-4. verstak, alex, anurag acharya, helder suzuki, sean henderson, mikhail iakhiaev, cliff chiung yu lin, and namit shetty. 2014. "on the shoulders of giants: the growing impact of older articles." arxiv preprint arxiv:1411.0275. https://arxiv.org/abs/1411.0275. walsh, andrew. 2015. "beyond "good" and "bad": google as a crucial component of information literacy." in the complete guide to using google in libraries, edited by carol smallwood, 3-12. new york: rowman & littlefield. waltman, ludo. 2016. "a review of the literature on citation impact indicators." journal of informetrics 10 (2): 365-391. https://doi.org/10.1016/j.joi.2016.02.007. ward, judit, william bejarano, and anikó dudás. 2015. "scholarly social media profiles and libraries: a review." liber quarterly 24 (4): 174–204.https://doi.org/10.18352/lq.9958. weideman, melius. 2015. "etd visibility: a study on the exposure of indian etds to the google scholar crawler." 
paper presented at etd 2015: 18th international symposium on electronic theses and dissertations, new delhi, india, november 4-6. http://www.web information technology and libraries | june 2017 47 visibility.co.za/0168-conference-paper-2015-weideman-etd-theses-dissertation-india-googlescholar-crawler.pdf. weiss, andrew. 2016. "examining massive digital libraries (mdls) and their impact on reference services." reference librarian 57 (4): 286-306. https://doi.org/10.1080/02763877.2016.1145614. whitmer, susan. 2015. "google books: shamed by snobs, a resource for the rest of us." in the complete guide to using google in libraries, edited by carol smallwood, 241-250. new york: rowman & littlefield. wildgaard, lorna. 2015. "a comparison of 17 author-level bibliometric indicators for researchers in astronomy, environmental science, philosophy and public health in web of science and google scholar." scientometrics 104 (3): 873-906. https://doi.org/10.1007/s11192-015-1608-4. winter, joost, amir zadpoor, and dimitra dodou. 2014. "the expansion of google scholar versus web of science: a longitudinal study." scientometrics 98 (2): 1547-1565. https://doi.org/10.1007/s11192-013-1089-2. wu, tim. 2015. "whatever happened to google books?" the new yorker, september 11, 2015. wu, ming-der, and shih-chuan chen. 2014. "graduate students appreciate google scholar, but still find use for libraries." electronic library 32 (3): 375-389. https://doi.org/10.1108/el-082012-0102. yang, le. 2016. "making search engines notice: an exploratory study on discoverability of dspace metadata and pdf files." journal of web librarianship 10 (3): 147-160. https://doi.org/10.1080/19322909.2016.1172539. editorial board thoughts: libraries as makerspace? tod colegrove information technology and libraries | march 2013 2 recently there has been tremendous interest in “makerspace” and its potential in libraries: from middle school and public libraries to academic and special libraries, the topic seems very much top of mind. a number of libraries across the country have been actively expanding makerspace within the physical library and exploring its impact; as head of one such library, i can report that reactions to the associated changes have been quite polarized. those from the supported membership of the library have been uniformly positive, with new and established users as well as principal donors immediately recognizing and embracing its potential to enhance learning and catalyze innovation; interestingly, the minority of individuals that recoil at the idea have been either long-term librarians or library staff members. i suspect the polarization may be more a function of confusion over what makerspace actually is. this piece offers a brief overview of the landscape of makerspace—a glimpse into how its practice can dramatically enhance traditional library offerings, revitalizing the library as a center of learning. been happening for thousands of years . . . 
dale dougherty, founder of make magazine and maker faire, at the “maker monday” event of the 2013 american library association midwinter meeting framed the question simply, “whether making belongs in libraries or whether libraries can contribute to making.” more than one audience member may have been surprised when he continued, “it’s already been happening for hundreds of years—maybe thousands.”1 the o’reilly/darpa makerspace playbook describes the overall goals and concept of makerspace (emphasis added): “by helping schools and communities everywhere establish makerspaces, we expect to build your makerspace users' literacy in design, science, technology, engineering, art, and math. . . . we see making as a gateway to deeper engagement in science and engineering but also art and design. makerspaces share some aspects of the shop class, home economics class, the art studio and science lab. in effect, a makerspace is a physical mashup of these different places that allows projects to integrate these different kinds of skills.”2 building users’ literacies across multiple domains and a gateway to deeper engagement? surely these are core values of the library; one might even suspect that to some degree libraries have long been makerspace. a familiar example of maker activity in libraries might include digital media: still/video photography and audio mastering and remixing. youmedia network, funded by the macarthur patrick “tod” colegrove (pcolegrove@unr.edu), a lita member, is head of the delamare science & engineering library at the university of nevada, reno, nevada. mailto:pcolegrove@unr.edu editorial board thoughts: libraries as makerspace? | colegrove 3 institute through the institute of museum and library services, is a recent example of such effort aimed at creating transformative spaces; engaged in exploring, expressing, and creating with digital media, youth are encouraged to “hang out, mess around, and geek out.” a more pedestrian example is found in the support of users with first-time learning or refreshing of computer programming skills. as recently as the 1980s, the singular option the library had was to maintain a collection of print texts. through the 1990s and into the early 2000s, that support improved dramatically as publishers distributed code examples and ancillary documents on accompanying cd or dvd media, saving the reader the effort of manually typing in code examples. the associated collections grew rapidly, even as the overhead associated with the maintenance and weeding of a collection that was more and more rapidly obsoleted grew more. today, e-book versions combined with ready availability of computer workstations within the library, and the rapidly growing availability of web-based tutorials and support communities, render a potent combination that customers of the library can use to quickly acquire the ability to create or “make” custom applications. with the migration of the supporting print collections online, the library can contemplate further support in the physical spaces opened up. open working areas and whiteboard walls can further amplify the collaborative nature of such making; the library might even consider adding popular hardware development platforms to its collection of lendable technology, enabling those interested to check out a development kit rather than purchase on their own. 
after all, in a very real sense that is what libraries do—and have done, for thousands of years: buy sometimes expensive technology tailored to the needs and interest of the local community and make it available on a shared basis. makerspace: a continuum along with outreach opportunities, the exploration of how such examples can be extended to encompass more of the interests supported by the library is the essence of the maker movement in libraries. makerspace encompasses a continuum of activity that includes “co-working,” “hackerspace,” and “fab lab”; the common thread running through each is a focus on making rather than merely consuming. it is important to note that although the terms are often incorrectly used as if they were synonymous, in practice they are very different: for example, a fab lab is about fabrication. realized, it is a workshop designed around personal manufacture of physical items— typically equipped with computer controlled equipment such as laser cutters, multiple axis computer numerical controlled (cnc) milling machines, and 3d printers. in contrast, a “hackerspace” is more focused on computers and technology, attracting computer programmers and web designers, although interests begin to overlap significantly with the fab lab for those interested in robotics. co-working space is a natural evolution for participants of the hackerspace; a shared working environment offering much of the benefit of the social and collaborative aspects of the informal hackerspace, while maintaining a focus on work. as opposed to the hobbyist that might be attracted to a hackerspace, co-working space attracts independent contractors and professionals that may work from home. information technology and libraries | march 2013 4 it is important to note that it is entirely possible for a single makerspace to house all three subtypes and be part hackerspace, fab lab, and co-working space. can it be a library at the same time? to some extent, these activities are likely already ongoing within your library, albeit informally; by recognizing and embracing the passions driving those participating in the activity, the library can become central to the greater community of practice. serving the community’s needs more directly, opportunities for outreach will multiply even as it enables the library to develop a laser-sharp focus on the needs of that community. depending on constraints and the community of support, the library may also be well-served by forming collaborative ties with other local makerspace; having local partners can dramatically improve the options available to the library in day-to-day practice, and better inform the library as it takes well-chosen incremental steps. with hackerspace/co-working/fab lab resources aligned with the traditional resources of the library, engagement with one can lead naturally to the other in an explosion of innovation and creativity. 
renaissance in addition to supporting the work of the solitary reader, “today's libraries are incubators, collaboratories, the modern equivalent of the seventeenth-century coffeehouse: part information market, part knowledge warehouse, with some workshop thrown in for good measure.”3 consider some of the transformative synergies that are already being realized in libraries experimenting with makerspace across the country: • a child reading about robots able to go hands-on with robotics toolkits, even borrowing the kit for an extended period of time along with the book that piqued the interest; surely such access enables the child to develop a powerful sense of agency from early childhood, including a perception of self as being productive and much more than a consumer. • students or researchers trying to understand or make sense of a chemical model or novel protein strand able not only to visualize and manipulate the subject on a two-dimensional screen, but to relatively quickly print a real-world model to be able and tangibly explore the subject from all angles. • individuals synthesizing knowledge across disciplinary boundaries able to interact with members of communities of practice in a non-threatening environment; learning, developing, and testing ideas—developing rapid prototypes in software or physical media, with a librarian at the ready to assist with resources and dispense advice regarding intellectual property opportunities or concerns. the american libraries association estimates that as of this printing there are approximately 121,169 libraries of all kinds in the united states today; if even a small percentage recognize and begin to realize the full impact that makerspace in the library can have, the future looks bright indeed. editorial board thoughts: libraries as makerspace? | colegrove 5 references 1. dale dougherty, “the new stacks: the maker movement comes to libraries” (presentation at the midwinter meeting of the american library association, seattle, washington, january 28, 2013). http://alamw13.ala.org/node/10004. 2. michele hlubinka et al., makerspace playbook, december 2012, accessed february 13, 2012, http://makerspace.com/playbook. 3. alex soojung-kim pang, "if libraries did not exist, it would be necessary to invent them," contemplative computing, february 6, 2012, http://www.contemplativecomputing.org/2012/02/if-libraries-did-not-exist-it-would-benecessary-to-invent-them.html. http://alamw13.ala.org/node/10004 http://makerspace.com/playbook http://www.contemplativecomputing.org/2012/02/if-libraries-did-not-exist-it-would-be-necessary-to-invent-them.html http://www.contemplativecomputing.org/2012/02/if-libraries-did-not-exist-it-would-be-necessary-to-invent-them.html expanding and improving our library’s virtual chat service: discovering best practices when demand increases article expanding and improving our library’s virtual chat service discovering best practices when demand increases parker fruehan and diana hellyar information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13117 parker fruehan (fruehanp1@southernct.edu) is assistant librarian, hilton c. buley library, southern connecticut state university. diana hellyar (hellyard1@southernct.edu) is assistant librarian, hilton c. buley library, southern connecticut state university. © 2021. abstract with the onset of the covid-19 pandemic and the ensuing shutdown of the library building for several months, there was a sudden need to adjust how the hilton c. 
buley library at southern connecticut state university (scsu) delivered its services. overnight, the library’s virtual chat service went from a convenient way to reach a librarian to the primary method by which library patrons contacted the library for help. in this article, the authors will discuss what was learned during this time and how the service has been adjusted to meet user needs. best practices and future improvements will be discussed. background the buley library started using springshare's libchat service in january 2015. the chat service was accessible as a button in the header of all the library webpages, and the wording would change depending on the availability of a librarian. at buley library, the chat service is only staffed by our faculty librarians. there were other chat buttons on various individual libguides for either specific librarians or for the general library chat. chat was monitored at the research & information desk by the librarian on duty. the first librarian of the day would log into the shared chat account on the reference desk computer. while each librarian had their own account, using a shared account meant that the librarians could easily hand off a chat interaction during a shift change. while the reference desk was typically busy, librarians would only receive a small number of chats per day. between 2015 and 2019, the library saw an average of 250 chats per year. due to the low usage, there was little focus on libchat training for librarians. for more complicated questions, librarians would often recommend that chat users call, email, or schedule an in-person appointment. since libchat was only monitored while librarians were at the reference desk, it was easy to let it become a secondary mode of reference interaction, particularly if there was a surge of in-person reference questions at any given time. due to the covid-19 pandemic, the library quickly shifted from mostly in-person to solely online services. suddenly, libchat was the virtual reference desk and the main mode of patron interaction. despite this change in how the library interacted with the campus, there was only a slight increase in chat usage in the first two months of the closure. in april 2020, we started to explore our options with libchat in the hopes of increasing visibility and usage. mailto:fruehanp1@southernct.edu mailto:hellyard1@southernct.edu information technology and libraries september 2021 expanding and improving our library’s virtual chat service | fruehan and hellyar 2 evaluating chat widget options considering technical implementation the publicly accessible chat interface is made available completely within a webpage, requiring no clients, external applications, or plugins to make it functional. springshare calls this component the libchat widget, and provides a prepackaged set of website code necessary to create the chat interface. within the libchat system there are a few options for widget placement and presentation. at the time of writing, springshare offers four widget types in its libchat product: in-page chat, button pop-out, slide-out tab, and floating.1 when the service is offline, the system replaces the chat interface with a link to library faqs and the option to submit a question for follow-up. at buley library, prior to the covid-19 pandemic shutdown, the button pop-out was the main widget type used to enter a chat session (see fig. 1). figure 1. previous library website header with chat pop-out button in upper right-hand corner. 
the pop-out button works by opening a separate pop-up window with the chat interface. this allows the user to navigate to other pages in the previous window without disconnecting from the session. one challenge to the pop-up window method is that many web browsers block pop-up windows by default, requiring a user to recognize and override this setting. another option used mainly on librarian profiles and subject guides is the in-page chat, which embeds the chat interface directly on an existing webpage. many times, these chat widgets are connected to a particular user rather than the queue monitored by all librarians. the user will interact with the chat operator in this dedicated section of the webpage. if a user navigates to a different page in the same window or tab it will disconnect from the chat session. these widget options are easiest when considering web design expertise and time commitment involved in implementation. both the button pop-out and in-page chat can be accomplished with a user having access to a what you see is what you get, or wysiyg, editor on the webpage and the ability to copy and paste a few lines of html code. it does not require any custom then the json objects must be looped through and displayed as desired. alternately, as in the script below, the json objects may be placed into an array for sorting. the following is a simple example of a script that displays all of the available data with each item in its own paragraph. this script also sorts the links alphabetically. while rss returns a maximum of thirty-one entries, json allows a maximum of one hundred. the exact number of items returned may be modified through the count parameter at the end of the url. at the ithaca college library, we chose to use json because at the time, delicious did not offer the convenient tagrolls, and the results returned by rss were displayed in reverse chronological order and truncated at thirty-one items. currently, we have a single php page that can display any delicious result set within our library website template. librarians generate links with parameters that designate a page title, a comma-delimited list of desired tags, and whether or not item descriptions should be displayed. for example, www.ithacalibrary.com/research/delish_feed. php?label=biology%20films&tag=bio logy,biologyi¬es=yes will return a page that looks like figure 2. the advantage of this approach is that librarians can easily generate webpages on the fly and send the url to their faculty members or add it to a subject guide or other webpage. the php script only has to read the “$_get” variables from the url and then query delicious for this content. xml delicious offers an application programming interface (api) that returns xml results from queries passed to delicious through https. for instance, the request https://api.del.icio.us/v1/posts/ recent?&tag=biology returns an xml document listing the fifteen most recent posts tagged as “biology” for a given account. unlike either the rss or the json methods, the xml api offers a means of retrieving all of the posts for a given tag by allowing requests such as https://api.del.icio.us/v1/ posts/all?&tag=biology. this type of request is labor intensive for the delicious server, so it is best to cache the results of such a query for future use. this involves the user writing the results of a request to a file on the server and then checking to see if such an archived file exists before issuing another request. 
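this caching pattern can be sketched in a few lines of php; the cache path, lifetime, and credentials below are illustrative placeholders rather than values from any of the scripts described here:

<?php
// sketch of the file-based caching described above: reuse an archived copy of an
// expensive delicious xml api request when a recent one exists. the cache path,
// lifetime, and credentials are placeholders, not values from the original scripts.
function fetch_delicious($queryurl, $username, $password) {
    // authenticated request against the xml api, using the same curl pattern
    // shown in the following section
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $queryurl);
    curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$queryurl  = "https://api.del.icio.us/v1/posts/all?&tag=biology";
$cachefile = "cache/posts_biology.xml";  // archived result of an earlier request
$lifetime  = 86400;                      // reuse the archive for one day

if (file_exists($cachefile) && (time() - filemtime($cachefile)) < $lifetime) {
    $posts = file_get_contents($cachefile);     // a recent archive exists: use it
} else {
    $posts = fetch_delicious($queryurl, "username", "password");
    file_put_contents($cachefile, $posts);      // write the results to the server
}
?>

with an arrangement like this, the labor-intensive posts/all request is issued at most once per cache lifetime; the deliciousposts utility mentioned next packages the same idea.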
a php utility called deliciousposts, which provides caching functionality, is available for free.6 note that the username is not part of the request and must be supplied separately. unlike the public rss or json feeds, using the xml api requires users to log in to their own account. from a script, this can be accomplished using the php curl function:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $queryurl);
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$posts = curl_exec($ch);
curl_close($ch);

this code logs into a delicious account, passes it a query url, and makes the results of the query available as a string in the variable $posts. the content of $posts can then be processed as desired to create web content. one way of doing this is to use an xslt stylesheet to transform the results into html, which can then be printed to the browser:

/* create a new dom document from your stylesheet */
$xsl = new domdocument;
$xsl->load("mystylesheet.xsl");
/* set up the xslt processor */
$xp = new xsltprocessor;
$xp->importstylesheet($xsl);
/* create another dom document from the contents of the $posts variable */
$doc = new domdocument;
$doc->loadxml($posts);
/* perform the xslt transformation and output the resulting html */
$html = $xp->transformtoxml($doc);
echo $html;

conclusion delicious is a great tool for quickly and easily saving bookmarks. it also offers some very simple tools such as linkrolls and tagrolls to add delicious content to a website. but to exert more control over this data, the user must interact with the delicious api or feeds. we have outlined three different ways to accomplish this: rss is a familiar option and a good choice if the data is to be used in a feed reader, or if only the most recent items need be shown. json is perhaps the fastest method, but requires some basic scripting knowledge and can only display one hundred results. the xml option involves more programming but allows an unlimited number of results to be returned. all of these methods facilitate the use of delicious data within an existing website. references 1. delicious, tools, http://delicious.com/help/tools (accessed nov. 7, 2008). 2. linkrolls may be found from your delicious account by clicking settings > linkrolls, or directly by going to http://delicious.com/help/linkrolls (accessed nov. 7, 2008). 3. tagrolls may be found from your delicious account by clicking settings > tagrolls or directly by going to http://delicious.com/help/tagrolls (accessed nov. 7, 2008). 4. martin jansen and clay loveless, "pear::package::xml_rss," http://pear.php.net/package/xml_rss (accessed november 7, 2008). 5. introducing json, http://json.org (accessed nov. 7, 2008). 6. ron gilmour, "deliciousposts," http://rongilmour.info/software/deliciousposts (accessed nov. 7, 2008). a computer-accessed microfiche library r. g. j. zimmermann: department of engineering-economic systems, stanford university, stanford, california. at the time this article was written, the author was a member of the technical staff, space photography laboratory, california institute of technology, pasadena, california. this paper describes a user-interactive system for the selection and display of pictorial information stored on microfiche cards in a computer-controlled viewer. the system is designed to provide rapid access to photographic and graphical data.
it is intended to provide a library of photographs of planetary bodies and is currently being used to store selected martian and lunar photography. introduction information is often most usefully stored in pictorial form. photography, for example, has become an important means of recording data, especially in the sciences. a major reason for this importance is that photographs can be used to record information collected by instruments and not normally observable by the unaided eye. such photographs, especially in large quantities, may present a barrier to their use because of the inconvenience of reproducing and handling them. it is apparent that a system to compactly store and to speed access to these photographs would be very useful. such a system, utilizing a microfiche viewer directly controlled by a user-interactive computer program, has been developed to support a library of photographs taken from space. in the past fifteen years, the national aeronautics and space administration has conducted many missions to photograph planetary bodies. these missions have provided millions of pictures of the earth, moon, and mars. a large number of additional pictures are expected to be taken in the near future. the space photography laboratory of the california institute of technology is establishing, under nasa auspices, a microfiche library of a selection of these photographs. the library currently contains the photographs of mars taken by the mariner 9 spacecraft as well as lunar photographs taken by the lunar orbiter series. the library is expected to be expanded as time and resources permit. it has been operating, with various versions of the control program, since june 1972. the program is currently being further developed by mr. david neff and miss laura horner of the space photography laboratory at the california institute of technology. hardware the photographs are kept on 105-by-148mm microfiche cards, sixty frames to a card. this format provides the least reduction of any standard microfiche format and was used to retain the highest possible resolution. the cards are displayed by a microfiche viewer (image systems, culver city, california) which can store up to about 700 cards and has the capability of selecting a card and displaying any frame on it within a maximum of about four seconds. (throughout this paper, "viewer" will be used to refer to the microfiche viewing device.) the viewer can be equipped with a computer interface which allows the picture display to be directly computer controlled. an installation consists of the viewer with interface, any standard input/output (i/o) terminal, and the control program, running, in this case, on a time-shared computer. the terminal is used for communication with the control program. the user enters all commands by typing on the terminal keyboard. the viewer is designed to be plugged in between the computer and i/o terminal. the computer transmits all information on the circuit to which normally (without the viewer) only the terminal is attached. this information includes the viewer picture display control codes which are recognized and intercepted by the viewer. all other information is passed on to the terminal. no further special equipment is necessary. the system described has been implemented on a digital equipment corporation system 10 medium-scale computer with a time-sharing operating system. the program is written mainly in fortran with some assembly language subroutines.
it runs in 12k words (36 bits/word) of core memory. the program will not run without conversion on any computer other than the dec system 10. software the control program is user-interactive, that is, it accepts information and commands from the user. these commands allow him to indicate what he desires and to control the action taken by the program. the program permits the user to indicate what characteristics he wishes the pictures to have, selects the pictures that satisfy his criteria, and then allows him to control the display of the selected pictures and to obtain any additional information he may need to interpret the pictures. to guide the user, instructions for use of the system, as well as other information the user may need, are displayed on the viewer as they are required. all user responses are extensively checked for validity. any uninterpretable response is rejected with a message indicating the source of the trouble, and may be reentered in corrected form. it is always possible to return to a previous state, so it is impossible to make a "catastrophic" error. in designing the system, particular attention was paid to integrating the viewer and computer to utilize the unique capabilities of each. for example, most instructions are presented on the viewer where they can be shown quickly and can be scanned easily by the user. only short messages need to be sent and received by the i/o terminal. data base a picture is described by a number of characteristics, called parameters. for every picture stored in the viewer, the value for each of these parameters is stored in a disc file. in this application, parameters are mainly used to describe characteristics that are available without analyzing the picture for content. in science, these are the experimental conditions, such as viewing and lighting conditions for space photography. because space photographs are taken by missions with different objectives and equipment, it was necessary to design a library system to include pictures with widely varying selection characteristics. in order to accommodate sets of pictures with widely differing characteristics, without wasting storage space or requiring the elimination of useful descriptors, the computer storage has been structured to allow pictures to be grouped into picture sets, each of which is described by its own set of parameters. conversely, any group of pictures for which the same selection parameters are used forms a picture set. the characteristics of each such set of pictures are also stored and the program reconfigures itself to these characteristics whenever a new picture set is encountered. such an organization allows the control program to be used on groups of totally different kinds of pictures. operation in selecting a picture set the user is guided along a series of decisions presented on the viewer. at each step the control program directs the viewer to display a frame with a set of possible choices. the user enters his response on the i/o terminal and the control program uses this response to determine which frame the viewer should be commanded to display next. when the user has selected a set, he is shown the available parameters and appropriate values for these parameters. after he has specified acceptable values for the parameters he is interested in, the computer program compares these values with the known values in its records for the picture set.
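the matching just described can be illustrated with a short php sketch; the parameter names and numeric ranges below are invented examples, not values from the mariner or lunar orbiter files:

<?php
// illustrative sketch of the selection idea: a picture's stored parameter values
// are compared against the ranges a user has specified. all names and numbers
// here are invented for illustration.
$record = array(
    "latitude"      => -14.5,
    "longitude"     => 184.0,
    "sun elevation" => 23.0,
);

// user specification: for each parameter of interest, an acceptable range
$specs = array(
    "latitude"      => array(-30.0, 0.0),
    "sun elevation" => array(10.0, 40.0),
);

function satisfies($record, $specs) {
    foreach ($specs as $param => $range) {
        list($low, $high) = $range;
        // a picture is rejected as soon as one specified parameter falls outside its range
        if (!isset($record[$param]) || $record[$param] < $low || $record[$param] > $high) {
            return false;
        }
    }
    return true;  // every specified parameter is acceptable; unspecified ones are ignored
}

var_dump(satisfies($record, $specs));  // bool(true) for the values above
?>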
the pictures selected by the program are then available for display. as will be described, the user may, at any time, select another picture set or change his parameter specifications. he may also indicate which pictures of those selected by the computer during the comparison search he wishes to have remain available after the next comparison search. this allows comparison of pictures in different picture sets. appendix 1 shows an example of a typical search. the action of the control program can be separated into five phases of operation, each with a distinct function. the functions of three of these phases involve user interaction. transfer between phases may also be accomplished by user command. a different group of commands is employed for each of the user-interactive phases. in addition, there is a group of commands which may be used any time a user response is requested; they are listed in appendixes 3 and 4. there are no required commands or sequences of commands. the user proceeds from one phase to another as he desires. in each phase allowing user interaction, the user can enter any valid command at any time. figure 1 shows the phases and possible transfers between phases. a more detailed description of what occurs in each phase will be given after the data organization is described.
fig. 1. phases and control transfers: picture set selection, parameter specification, search optimization, comparison search, and picture display and information access. bold lines enclose user-interactive phases; arrows indicate possible directions of control transfer; bold arrows are control transfers made by user commands.
description of software data base organization as has been stated, the pictures of the library are grouped into picture sets. the data base may contain any number of picture sets. each such set has a picture file associated with it. this picture file is on disc storage and contains all the known information stored for a set of pictures. each picture in the set has an associated picture record in the file. in addition, the first record in a picture file, known as the format record, contains all the file specific information about that file. whenever a new picture file is called for, the format record for that file is read from disc storage into main memory and kept for reference. figure 2 shows the organizational structure of the data base.
fig. 2. picture file organization: picture files (as many as required), each with a format record and picture records.
picture records consist of a fixed- and a variable-length portion. the variable-length portion contains the known values, for the associated picture, of the specification parameters. since the number of parameters can vary from file to file, the length of this portion varies from file to file. (however, all picture records within a particular file have the same length and form.) the maximum number of parameters for a system is determined by array dimensions set when the program is compiled. currently these dimensions are set for a maximum of fifty parameters for any file in the system. the fixed-length portion contains (generally) the same type of information for all files. it includes the information needed to display a picture and to obtain interpretive information.
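one way to picture this organization is as a nested structure. the sketch below uses php arrays purely for illustration; every field name and value in it is invented rather than drawn from an actual picture file, and the fixed-portion fields echo those listed in table 1 below.

<?php
// rough sketch of one picture file as described above: a format record that
// describes the file, followed by picture records that each carry a fixed-length
// portion (display information) and a variable-length portion (parameter values).
$picturefile = array(
    "format record" => array(
        "parameter names" => array("latitude", "longitude", "sun elevation"),
        // one ten-letter description per parameter, as described below
        "descriptions"    => array("latitude  ", "longitude ", "sun elev  "),
    ),
    "picture records" => array(
        array(
            "fixed"    => array("fiche code" => "c0412", "picture number" => 1,
                                "unit number" => 1, "id number" => "4212-37"),
            "variable" => array(-14.5, 184.0, 23.0),  // values in parameter order
        ),
        // ... one record per picture in the set
    ),
);
?>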
when, during the comparison search, a picture is selected on the basis of information in the variable data, the fixed-length portion is copied into a table and kept for use during the picture display phase. each selected picture is represented by an entry in this table. the contents of the fixed-length portion are presented in table 1. as an example, the contents of a picture record for the mariner 9 photographs are given in appendix 5.
table 1. the fixed-length portion of a picture record (field and use).
fiche code: control code output by the control program to the viewer to display the frame associated with this picture record.
file name: the file name of the picture file; this and the picture number uniquely identify the picture record and allow it, and specifically the contents of the variable portion, to be refound.
picture number: a sequence number assigned each picture record in the file in increasing order.
unit number: the viewer that the picture associated with this picture record is stored in.
id number: the identification number referred to by the user. if the picture has been given an id number by which it is commonly known, it will be kept in this field.
auxiliary codes (3 fields): viewer control codes for frames containing different versions of, or auxiliary data for, the picture. the actual contents of these fields vary with the picture file as determined from the contents of the format record of that file.
a picture file's format record describes the file by all characteristics that are allowed to vary from file to file. the format records for all picture files have the same form; each is divided into a number of fields supplying information for a particular function. these fields can be separated into two categories: those which describe the picture records and those which apply to the file as a whole. for fields of the first type, each parameter has an entry in the field. for example, one such field contains the location, in a picture record, of the value for each of the parameters. another field has a ten-letter description of each parameter. see appendix 2 for a description of the format field. operation of the control program the following is a brief technical description of the control program; detailed documentation is available. the control program is modularly constructed. each phase consists of a major subroutine and its subsidiary subroutines. at the completion of a phase, control is transferred to a main program which determines which phase is to be performed next and transfers control to it. the user-interactive (interrogation) subroutines ask for a user response, attempt to interpret the response and perform the desired function, then ask for another response. an important subroutine used by all the interrogation subroutines collects the characters of the user response into groups of similar characters to form alphabetic keywords, numbers, punctuation marks, relational operators, etc. when an interrogation subroutine is ready for a user request, it calls this "scanning" subroutine. the scanning subroutine outputs an asterisk, indicating it is ready, to the user i/o terminal. the scanning subroutine supplies the groups of characters, along with a description of the group, to the interrogation subroutine. the interrogation subroutine then attempts to interpret the character groups by comparing them with acceptable responses. if the response is not in one of the acceptable forms, an error message is given to the user and he can try again.
some commands do not need to be interpreted by the interrogation subroutines; the function they request is the same throughout the program. these are called immediate commands and are listed in appendix 3. these commands are interpreted, and their functions performed, by the scanning subroutine.

picture set selection

in selecting a picture set the user is asked to make a series of decisions. for each decision, a frame listing the possible choices is displayed on the viewer. all possible decisions form an inverted tree structure (see figure 3). the user may also return to a previous decision point. the tree structure is implemented in a table in computer storage. there is an entry in this table corresponding to each decision point in the tree.

[fig. 3. example of a tree. the diagram shows a decision tree whose branches include a (martian): aa orbital, with flyby choices for mariner iv, mariner vi and vii, and mariner 9, and ab surface (viking); b (lunar): ba orbital (apollo hand held, apollo metric, apollo pan, lunar orbiter, ranger) and bb surface (apollo, surveyor); c (venus flyby); and d (mercury flyby).]

when a decision is made, the entry corresponding to the new decision point is obtained. an entry at the bottom of the tree identifies the picture file associated with the picture set selected. in general, an entry contains: (1) the viewer control code of the frame displaying the choices; (2) a pointer to the entry from which this node was reached; (3) the number of possible decisions which can be made at this decision point (to check for valid decisions); and (4) pointers to the entries for the decision points reached.

parameter specification

once the user has made a decision selecting a set of pictures, he is presented with a list of the available parameters and acceptable values for them. for each parameter in which the user is interested, he specifies the parameter number and the values or range of values acceptable to him. this information is stored in two tables which are referred to when the comparison search is made. one table, the parameter table, contains an entry for each parameter specified. this table is cleared whenever a new picture set is called for. an entry in the table includes: (1) the parameter number; (2) a code indicating which of several methods is to be used in processing the parameter; (3) a code providing information on how the user-specified values are to be interpreted; and (4) a pointer to the location in a second table, the values table, where the first of the specified values is stored. all additional values are placed in the values table following the addressed value. the processing code (number (2) above) allows each parameter to be processed by a unique method. a standard method for a given parameter is kept in a field of the format record; the user can also specify a method other than the standard one. if an entry already exists for a just-entered parameter, the old entry is updated rather than a new one created.
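as a rough sketch only, the structures below restate in python the two in-core tables just described: the decision-tree table used during picture set selection, and the parameter table pointing into a shared values table. all class and field names are invented for illustration; the original program's storage layout is not reproduced.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DecisionPoint:
    """one entry in the picture-set selection tree."""
    frame_code: str                      # viewer control code of the frame listing the choices
    parent: Optional["DecisionPoint"]    # pointer to the entry from which this node was reached
    children: list = field(default_factory=list)  # entries for the decision points reached
    picture_file: Optional[str] = None   # set only for entries at the bottom of the tree

    def choose(self, n: int) -> "DecisionPoint":
        # the stored number of possible decisions is used to check for valid decisions
        if not 0 <= n < len(self.children):
            raise ValueError("invalid decision")
        return self.children[n]

@dataclass
class ParameterEntry:
    """one entry in the parameter table."""
    number: int          # parameter number
    method: str          # code selecting the processing method (standard or user-chosen)
    interpretation: str  # how the user-specified values are to be read (single value, range, ...)
    first_value: int     # location in the values table of the first specified value
    value_count: int     # additional values follow the addressed value

values_table: list[float] = []

def specify(parameters: dict[int, ParameterEntry], number: int, values,
            method="standard", interpretation="range"):
    """add or update an entry; an existing entry for the parameter is updated, not duplicated."""
    start = len(values_table)
    values_table.extend(values)
    parameters[number] = ParameterEntry(number, method, interpretation, start, len(values))
```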
search optimization

this phase determines the most efficient way to conduct the comparison search from among a set of alternatives. whenever possible, the search is restricted to only a part of the picture file. for each picture file there is a number of parameters for which additional information is available. specifically, if a list of pictures ordered by increasing value of a parameter is available, the pictures which have a particular value of that parameter can be found more quickly through this list than by searching through the whole file for that value of the parameter. if the position, in this ordered list, of the picture at the low end of a range of values (of the parameter it is ordered on) can be found easily, the search can be started at this point and need only be continued until the picture at the high end has been reached. note that the picture records for the intervening pictures must nonetheless be compared with the user specifications, since the restriction is made on the basis of only one parameter whereas more than one may have been specified.

a binary search is the method used to find the first picture in a range of values in the list. to use this method, of a set of n picture records the n/2th is chosen and its value of the parameter is compared with the desired one. since the list of records is in order of the value of this parameter, it is clear in which half of the list a picture with the desired value of the parameter would have to be. this interval can then be divided and the process continued until the remaining interval consists of only one picture.

the main picture file is itself usually arranged in order of at least one parameter. for other parameters, control lists of picture numbers ordered by value of these parameters can be used for binary searches. however, it is not practical to create these lists for all parameters, as they require a fair amount of storage. an entry in such a list contains two words: the value of the parameter and the picture number of the corresponding picture. the picture number is a sequence number which determines the position of the picture record relative to the beginning of the picture file. each picture file has a table in its format record containing identifiers for the parameters for which the binary search technique can be used. if more than one of these has been specified (as stored in the parameter table), it must be determined which parameter restricts the search the most. to do this the upper and lower limits of the specified values of each such parameter are found (from the values table), and from this the expected number of picture records to be compared is computed. this number is multiplied by a factor indicating the speed of the type of search to be used relative to the speed of the simplest type of search. the parameter with the lowest expected elapsed time of search is selected for the search.

comparison search

for each picture to be compared, the appropriate picture record is found and the specified parameter values are compared with those in the picture record. a control list, selected in the search optimization phase, may be used to determine which picture records are to be compared. for each selected picture an entry containing a portion of the picture record is made in a picture table. the picture table has a limited capacity which is set when the program is compiled; for our application there is currently room for up to 100 entries. if the picture table is filled before the search is finished, the search is suspended and can be continued by a command in the display phase.
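a minimal sketch of the two ideas in the search optimization phase just described, written in python with invented names: a binary search into an ordered control list to find the candidate pictures within a range of values, and the selection of the cheapest search when several restricting parameters were specified (expected records multiplied by a relative speed factor). this is an illustration of the technique, not the original code.

```python
import bisect

def candidate_range(control_list, low, high):
    """control_list is a list of (value, picture_number) pairs ordered by value.
    a binary search finds the first picture at or above the low end of the range;
    the scan then continues only until the high end has been passed."""
    start = bisect.bisect_left(control_list, (low, -1))
    candidates = []
    for value, picture_number in control_list[start:]:
        if value > high:
            break
        candidates.append(picture_number)
    return candidates

def pick_search(options):
    """options: list of (expected_records, relative_speed_factor) per restricting parameter;
    the parameter with the lowest expected elapsed time is selected for the search."""
    return min(range(len(options)), key=lambda i: options[i][0] * options[i][1])

if __name__ == "__main__":
    # a hypothetical control list ordered by orbit number
    orbit_list = [(100, 7), (150, 2), (222, 11), (222, 19), (300, 4)]
    print(candidate_range(orbit_list, 200, 250))   # [11, 19]
    print(pick_search([(1022, 1.0), (120, 2.5)]))  # 1: 120 * 2.5 = 300 beats 1022 * 1.0
```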
picture display, information access

this phase accepts commands to control display of the selected pictures and provide access to interpretive information. the picture table entries provide the information needed, either directly or by referring back to the picture record. any of the selected pictures can be viewed at any time. in addition, the user can "mark" preferred pictures to differentiate them from the others. these marked pictures are set apart in the sense that many viewing and information access commands refer optionally to only these pictures.

the pictures themselves are the primary source of information, but the user will often want information that is not available from the picture in order to interpret the picture. there are commands that request the control program to type out on the i/o terminal the information in a picture record. these commands optionally refer to the picture currently displayed, the marked pictures, or all the selected pictures. other commands call for the display of data frames associated with a picture. these frames can contain large volumes of data that need not be kept in computer storage. the viewer control codes for these frames are kept in the picture table. the keyword commands to display data frames can vary from file to file; the valid commands for a file are kept in the file's format record. there are other commands to transfer control to other phases and to keep desired pictures available for display with those selected by the next comparison search. there is also a provision for adding file-specific commands to perform any other function. the commands and their functions are listed in appendix 3.

performance and costs

a typical simple search consisting of logging in, picture selection, parameter specification, search, and display might take five to ten minutes and cost one to two dollars for compute time. most of this is time spent by the user in entering commands. command execution is usually almost immediate, as it does not involve a major amount of computation. most of the compute time is accumulated during the comparison search phase. to search through the entire mariner 9 picture file of around 7,000 pictures (about 200,000 words) takes about forty seconds elapsed time and costs about two dollars. a more typical search, however, will allow some search optimization and cost about thirty cents with an elapsed time of ten seconds. of course, these figures should only be used as estimates, even for other dec system 10 systems, as elapsed time depends on system load and this, as well as the rates charged, varies considerably. total monthly compute costs for a system depend entirely on use. likewise, storage costs depend on actual storage space used. for the 200,000-word mariner 9 file our cost is about seventy-five dollars per month. only the most-used picture files actually need be kept on disc; the rest can be copied from magnetic tape if they are needed. all files are backed up on magnetic tape in any case. the rates listed in this paper are those charged by our campus time-sharing system; dec system 10 computer time is available from commercial firms at somewhat higher rates.

the cost for a microfiche viewer with computer interface (image systems, culver city, california, model 201) is around $7,000. a thirty-characters-per-second i/o terminal sells for $1,500 and leases for $90 per month. in addition, an installation may require a microfiche camera and other photographic equipment and supplies. photographic services are also available from the viewer manufacturer.
the hardware cost for an independent system implemented on a minicomputer with 12k to 20k of core and five million words of disc memory is estimated at an additional $30,000 (exclusive of development and photographic costs).

implementing a library system

in implementing a library system to use the hardware and software described in this paper, two major areas of effort are required. first, the pictorial information must be converted to microfiche format; that is, it must be photographed, or possibly rephotographed if already in photographic form. in addition, a computer data base must be created. if information about the photographs is already available in computer-readable form, this involves writing a program to convert the data to the structure required by the control program. if this type of information is not available, the pictures may need to be investigated and the information coded, and presumably punched onto computer cards, for further processing. the major difficulties we encountered were coordinating the photographic and data base generation tasks, achieving the high resolution we required to retain the detail of the original photographs, and using early versions of the microfiche viewer (which had a tendency to jam cards).

conclusion

a system for rapid access to pictorial information, the computer accessed microfiche library (caml), has been described. caml has been designed to integrate, in an easy-to-use system, the storage capacity and capability for fast retrieval of a special microfiche viewer with the manipulating ability and speed of a computer. it is believed that this system will help overcome the barriers to the full utilization of photographs in large quantities, as well as have applications in the retrieval of other types of pictorial information.

acknowledgments

the work described in this paper was supported by nasa grant #ngr 05-002-117. the author is grateful to dr. bruce murray and the staff of the space photography laboratory at caltech for their support and advice; he also wishes to acknowledge the efforts of mr. james fuhrman, who assisted in the programming task and contributed many valuable ideas.

appendix 1

the following is an example of a typical search. numbers in the left margin indicate when a new frame is displayed on the viewer; these were added later to clarify the interaction between viewer and terminal. user responses and commands are identified by lines beginning with an asterisk. (the control program types asterisks when it is ready for input.) in this demonstration, most keywords were completely typed out. it is possible, however, to abbreviate any keyword to the shortest form that will be unique among the acceptable keywords. after the user enters a standard "log in" procedure to identify his account number and verify that he is an authorized user of this account, the control program is automatically initiated. the viewer displays a picture (1) of the installation and the user is asked to enter his name. the name, charges, and time of use will later be added to an accounting file.
[the terminal transcript of the demonstration session appears here in the original; only its outline is recoverable. the user logs in and enters his name, selects the file mmix, specifies the parameters orbit 222, camera a, and latitude -45 to 45, and types "done"; the program reports the number of pictures to process and that 2 pictures have been selected. display-phase commands such as mark, type parameters, respecify, restart, charges, and help then appear, together with the parameter values typed out for selected pictures and an example of the error message given for an unrecognized keyword; the session ends with instructions to turn off the viewer, terminal, and coupler, and log-off.]

the user now enters the picture set selection phase. in the current system, only two files (picture sets) are stored and the user is simply presented with a frame (2) listing the file names and giving a short description of what is contained in each. the user types the desired file name (mmix, the mariner 9 mars photographs) and thus enters the parameter specification phase. the available selection parameters and acceptable values are now shown (3). the user specifies some parameters [the remainder of this walkthrough falls on pages not present in this extract].

if not used, file name is assumed to refer to the file last searched. if the parameters are not enumerated, those specified for the picture selection are typed out. the parameters to be typed out can be enumerated or the specification parameters called for. if neither of these is done, the values of all parameters are typed out. parameters typed out are identified by column headings.

phase transfer commands
command: function
respecify: allows respecification of selection parameters. only those parameters which are reentered are changed; previously specified parameters retain their values.
search: similar to respecify, except only those pictures in the present list are candidates for selection. this is more efficient than again searching through all the pictures.
continue: if the search was terminated before all pictures had been processed, the search is continued from where it had been suspended.
restart: to view another set of pictures (all specified parameter values are deleted).

appendix 5
mariner 9 picture records
field number: field
fixed-length portion
1: fiche code
2: data code
3: file name
4: id number (das)
5: unit #
6: picture number
7: footprint code
8: unused
variable portion
9: das time
10: orbit
11: latitude
12: longitude
13: solar lighting angle
14: phase angle
15: viewing angle
16: slant range
17: camera
18: resolution
19: local time
20: filter
21: exposure time
22: roll and file of filter version on roll film
23-28: comments (content descriptors)

techniques for special processing of data within bibliographic text

paula goossens: royal library albert i, brussels, belgium.

an analysis of the codification practices of bibliographic descriptions reveals a multiplicity of ways to solve the problem of the special processing of certain characters within a bibliographic element. to obtain a clear insight into this subject, a review of the techniques used in different systems is given. the basic principles of each technique are stated, examples are given, and advantages and disadvantages are weighed. simple local applications as well as more ambitious shared cataloging projects are considered.

introduction

effective library automation should be based on a one-time manual input of the bibliographic descriptions, with multiple output functions. these objectives may be met by introducing a logical coding technique. the higher the requirements of the output, the more sophisticated the storage coding has to be. in most cases a simple identification of the bibliographic elements is not sufficient. the requirement of a minimum of flexibility in filing and printing operations necessitates the ability to locate certain groups of characters within these elements. it is our aim, in this article, to give a review of the techniques solving this last problem.

as an introduction, the basic bibliographic element coding methods are roughly schematized in the first section. according to the precision in the element identification, a distinction is made between two groups, called respectively field level and subfield level systems. the second section contains discussions on the techniques for special processing of data within bibliographic text. three basic groups are treated: the duplication method, the internal coding techniques, and the automatic handling techniques. the different studies are illustrated with examples of existing systems. for the field level projects we confined ourselves to some important german and belgian applications. in the choice of the subfield level systems, which are marc ii based, we tried to be more complete. most of the cited applications, for practical reasons, only concern the treatment of monographs. this cannot be seen as a limitation because the methods discussed are very general by nature and may be used for other material. each system which has recourse to different special processing techniques is discussed in terms of each of these techniques, enabling one to get a realistic overview of the problem. in the last section, a table of the systems versus the techniques used is given. the material studied in this paper provided us with the necessary background for building an internal coding technique in our internal processing format.
bibliographic element codification methods

field level systems

the most rudimentary projects of catalog automation are limited to a coarse division of the bibliographic description into broad fields. these are marked by specially supplied codes and cover the basic elements of author, title, imprint, collation, etc. in some of the field level systems, a bibliographic element may be further differentiated according to a more specific content designation, or according to a function identification. for instance, the author element can be split up into personal name and corporate name, or a distinction can be made between a main entry, an added entry, a reference, etc. this approach supports only the treatment of each identified bibliographic element as a whole for all necessary processing operations, filing and printing included. this explains why, in certain applications, some of the bibliographic elements are duplicated, under a variant form, according to the subsequent treatments reflected in the output functions. details on this will be discussed later. here we only mention as examples the deutsche bibliographie and the project developed at the university of bochum.1-4

it is evident that these procedures are limited in their possibilities and are not economical if applied to very voluminous bibliographic files. for this reason, at the same time, more sophisticated systems, using internal coding techniques, came into existence. these allow one to perform separate operations within a bibliographic element, based on a special indication of certain character strings within the text. as there is an overlap in the types of internal coding techniques used in the field level systems and in the subfield level systems, this problem will be studied later as a whole. we limit ourselves to citing some projects falling under this heading. as german applications we have the deutsche bibliographie and the bikas system.5 in belgium the programs of the quetelet fonds may be mentioned.6,7

subfield level systems

in a subfield level system the basic bibliographic elements, separated into fields, are further subdivided into smaller logical units called subfields. for instance, a personal name is broken into a surname, a forename, a numeration, a title, etc. such a working method provides access to smaller logical units and will greatly facilitate the functions of extraction, suppression, and transposition. thus, more flexibility in the processing of the bibliographic records is obtained. as is well known, the library of congress accomplished the pioneering work in developing the marc ii format: the communications format and the internal processing format.8-11 these will be called marc lc, and a distinction between the two will only be made if necessary. the marc lc project originated in the context of a shared cataloging program and immediately served as a model in different national bibliographies and in public and university libraries. in this paper we will discuss bnb marc of the british national bibliography, the nypl automated bibliographic system of the new york public library, monocle of the library of the university of grenoble, canadian marc, and fbr (forma bibliothecae regiae), the internal processing format of the royal library of belgium.12-21
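purely as an illustration of the contrast just drawn, the sketch below shows a field-level record next to a loosely marc-like subfield-level record with tags, two indicator characters, and coded subfields. the tags and the use of $a for the surname and the short title follow the schematic given later in this article; everything else (the python representation, the indicator values) is invented.

```python
# field-level: each broad element is handled only as a whole.
field_level_record = {
    "author": "mc kelvy",
    "title": "ibm 360 assembler language",
}

# subfield-level: (tag, two indicator characters, {subfield code: value}).
subfield_level_record = [
    ("100", "  ", {"a": "mc kelvy"}),                    # personal name field, $a = surname
    ("245", "  ", {"a": "ibm 360 assembler language"}),  # title field, $a = short title
]

def subfield(record, tag, code):
    """a subfield-level record gives access to smaller logical units, e.g. just the surname."""
    for t, _indicators, subfields in record:
        if t == tag and code in subfields:
            return subfields[code]
    return None

print(subfield(subfield_level_record, "100", "a"))   # mc kelvy
```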
in order to further optimize the coding of a bibliographic description, the library of congress also provided for each field two special codes, called indicators. the function of these indicators differs from field to field. for example, in a personal name one of the indicators describes the type of name, to wit: forename, single surname, multiple surname, and name of family. some of the indicators may act as an internal code.

in spite of the well-considered structuring of the bibliographic data in the subfield level systems, not all library objectives may yet be satisfied. to reduce the remaining limitations, some approaches similar to those elaborated in field level systems are supplied. some (nypl, marc lc internal format, and canadian marc) have, or will have, in a very limited way, recourse to a procedure of duplication of subfields or fields. all cited systems, except nypl, use to a greater or lesser degree internal coding techniques. finally, some subfield level systems automatically solve certain filing problems by computer algorithms; this option was taken by nypl, marc lc, and bnb marc. each of these methods will be discussed in detail in the next section.

techniques for special processing of data

methods for special treatment of words or characters within bibliographic text were for the most part introduced to support exact file arrangement procedures and printing operations. in order to give concrete form to the following explanation, we will illustrate some complex cases. each example contains the printing form and the filing form according to specific cataloging practices for some bibliographic elements. consider the titles in examples 1, 2, and 3, and the surnames in examples 4, 5, and 6.

example 1. printing form: l'automation des bibliotheques; filing form: automation bibliotheques
example 2. printing form: bulletino della r. accademia medica di roma; filing form: bolletino accademia medica roma
example 3. printing form: ibm 360 assembler language; filing form: i b m three hundred sixty assembler language
example 4. printing form: mc kelvy; filing form: mackelvy
example 5. printing form: van de castele; filing form: vandecastele
example 6. printing form: martin du gard; filing form: martin dugard

we do not intend, in this paper, to review the well-known basic rules for building a sort key (the translation of lowercase characters to uppercase, the completion of numerics, etc.). our attention is directed to the character strings that file differently than they are spelled in the printing form. the methods developed to meet these problems are of a very different nature. for reasons of space, not all the examples will be reconsidered in every case; only those most meaningful for the specific application will be chosen.

duplication methods

we briefly repeat that this method consists of the duplication of certain bibliographic elements in variant forms, each of them exactly corresponding to a certain type of treatment. in bochum, the title data are handled in this way. one field, called "sachtitel," contains the filing form of the title followed by the year of edition. another field, named "titelbeschreibung," includes the printing form of the title and the other elements necessary for the identification of a work (statements of authorship, edition statement, imprint, series statement, etc.). to apply this procedure to examples 1, 2, and 3, the different forms of each title respectively have to be stored in a printing field and in a sorting field. analogous procedures are, in a more limited way, employed in the deutsche bibliographie. for instance, in addition to the imprint, the name of the publisher is stored in a separate field to facilitate the creation of publisher indexes.
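a minimal sketch of the duplication approach just described, assuming nothing beyond what the text states: the record simply carries a printing field and, where needed, a separately supplied filing field of the same element (compare bochum's "titelbeschreibung" and "sachtitel" pair); no codes are embedded in the text itself. the field names and the trivial normalization are illustrative only.

```python
# example 1 stored under the duplication method: two fields, no internal codes.
record = {
    "title_print": "l'automation des bibliotheques",   # printing form
    "title_file":  "automation bibliotheques",         # filing form, supplied separately
}

def sort_key(record: dict) -> str:
    # only the basic normalization (here just uppercasing) is applied to the filing form;
    # if no filing form was supplied, the printing form is used unchanged.
    return record.get("title_file", record["title_print"]).upper()

print(sort_key(record))   # AUTOMATION BIBLIOTHEQUES
```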
the technique of the duplication of bibliographic elements has also been considered in subfield level systems. the nypl format furnishes a filing subfield in those fields needed for the creation of the sort key. this special subfield is generally created by program, although in exceptional cases manual input may be necessary. in the filing subfield the text is preceded by a special character indicating whether or not the subfield has been introduced manually. marc lc (internal format) and canadian marc opt for a more flexible approach in which the filing information is specified with the same precision as the other information. the sorting data are stored in complete fields containing, among others, the same subfields as the corresponding original field. because in most subfield level systems the number of different fields is much higher than in field level systems, the duplication method becomes more intricate. provision of a separately coded field for each normal field which may need filing information is excluded. only one filing field is supplied, which is repeatable and stored after the other fields. in order to link the sorting fields with the original fields, specific procedures have been devised. marc lc, for instance, reserves one byte per field, the sorting field code, to announce the presence or the absence of a related sorting field. the link between the fields themselves is placed in a special subfield of the filing field.22 in the supposition that examples 3 and 4 originate from the same bibliographical description, this method may be illustrated schematically as follows:

tag | sorting field code | sequence number | data
100 | x | 1 | $a$mc kelvy
245 | x | 1 | $a$ibm 360 assembler language
880 |   | 1 | $ja$1001$mackelvy
880 |   | 2 | $ja$2451$i b m three hundred sixty assembler language

as is well known, the personal author and title fields are coded respectively as tag 100 and tag 245. tag 880 defines a filing field. in the second column, the letter x identifies the presence of a related sorting field. the third column contains a tag sequence number needed for the unequivocal identification of a field. in the last column the sign $ is a delimiter; the first $ is followed by the different subfield codes, and the other delimiters initiate the subsequent subfields. in tag 100 and 245, the first subfields contain the surname and the short title respectively. in tag 880 the first subfield gives the identification number of the related original field. the further subfield subdivision is exactly the same as in the original fields. in canadian marc a slightly different approach has been worked out. note that in neither of the last two projects has this technique been implemented yet.

for an evaluation of the duplication method, different means of application must be considered. if not systematically used for several bibliographic elements, the method is very easy at input: the cataloger can fill in the data exactly as they are, and no special codes must be embedded in the text. but it is easy to understand that a more frequent need of duplicated data renders the cataloging work very cumbersome. in regard to information processing, this method consumes much storage space. first, a certain percentage of the data is repeated; second, in the most complete approach of the subfield level systems, space is needed for identifying and linking information. for instance, in marc lc, one byte per field is provided containing the sorting field code, even if no filing information at all is present. finally, programming efforts are also burdened by the need for special linking procedures.
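the sketch below restates the linkage scheme of the schematic above in python: the filing (tag 880) field carries, in its first subfield, the tag and sequence number of the original field it sorts for. the list representation and the lookup function are assumptions for illustration; the exact byte layout of marc lc is not reproduced.

```python
# (tag, has_sorting_field, sequence_number, subfields) for the schematic above
fields = [
    ("100", True,  1, ["mc kelvy"]),
    ("245", True,  1, ["ibm 360 assembler language"]),
    ("880", False, 1, ["1001", "mackelvy"]),                  # links to tag 100, sequence 1
    ("880", False, 2, ["2451", "i b m three hundred sixty assembler language"]),  # tag 245, seq 1
]

def sorting_form(tag: str, seq: int):
    """return the filing text linked to (tag, sequence number), or None if no 880 field refers to it."""
    link = f"{tag}{seq}"
    for t, _has_sort, _seq, subfields in fields:
        if t == "880" and subfields[0] == link:
            return subfields[1]
    return None

print(sorting_form("100", 1))   # mackelvy
print(sorting_form("245", 1))   # i b m three hundred sixty assembler language
```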
in order to minimize the use of the duplication technique, the cited systems reduce its application in different ways. bochum simplified its cataloging rules in order to limit its use to title information. as will be explained further, the deutsche bibliographie also has recourse to internal coding techniques. nypl, marc lc, and canadian marc only call on it if other, more efficient methods (see later) fail. they also make an attempt to adapt existing cataloging practices to an unmodified machine handling of nonduplicated and minimally coded data.

internal coding techniques

separators

separators are special codes introduced within the text, identifying the characters to be treated in a special way. a distinction can be made among four procedures.

1. simple separators. with this method, each special action to be performed on a limited character string is indicated by a group of two identical separators, each represented as a single special sign. illustration on examples 2, 3, 4, and 6 gives:

example 2: £bolletino£ ¢bulletino della r. ¢accademia medica ¢di ¢roma
example 3: £i b m three hundred sixty £¢ibm 360 ¢assembler language
example 4: m£a£c¢ ¢kelvy
example 6: martin du¢ ¢gard

the characters enclosed between each group of two corresponding codes £ must be omitted for printing operations. in the same way the characters enclosed between two corresponding codes ¢ are to be ignored in the process of filing. in the case that only the starting position of a special action has to be indicated, one separator is sufficient. for instance, if in example 1 we limit ourselves to coding the first character to be taken into account for filing operations, we have:

example 1: l'/automation des bibliotheques

where a slash is used as the sorting instruction code.

the simple separator method has tempting positive aspects. occupying a minimum of storage space (at most two bytes for each instruction), the technique gives a large range of processing possibilities. indeed, excluding the limitation on the number of special signs available as separators, no other restrictions are imposed. this argument will be rated at its true worth only after evaluation of the multiple function separators method and of the indicator techniques. the major disadvantage of the simple separator method lies in its slowness of exploitation. in fact, for every treatment to be performed, each data element which may contain special codes has to be scanned, character by character, to localize the separators within the text and to enable the execution of the appropriate instructions. for example, in the case of a printing operation, the program has to identify the parts of the text to be considered and to remove all separators. the sluggishness of execution was for some, as for canadian marc, a reason to disapprove of this method.23 as already mentioned, another handicap in cataloging applications is the loss of a number of characters caused by their use as special codes: each character needed as a separator cannot be used as an ordinary character in the text. for bochum this was a motive to reject this method.

many of the field level systems with internal codes have recourse to simple separators. we mention the deutsche bibliographie, in which some separators indicate the keywords serving for automatic creation of indexes and others give the necessary commands for font changes in photocomposition applications. in order to reduce the number of special signs, the deutsche bibliographie also duplicates certain bibliographic data. bikas uses simple separators for filing purposes. the technique is also employed in subfield level systems: in monocle each title field contains a slash, indicating the first character to be taken into account for filing.
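a sketch of the simple-separator scanning just described, assuming the coding of example 4 given above: one pass over the stored text drops whatever lies between a matching pair of codes, with £...£ marking non-printing text and ¢...¢ marking non-filing text. the function names are invented; a real implementation would also apply the remaining sort-key rules.

```python
import re

def strip_spans(text: str, code: str) -> str:
    # remove the spans enclosed by the given code, then any remaining separator characters
    text = re.sub(re.escape(code) + ".*?" + re.escape(code), "", text)
    return text.replace("£", "").replace("¢", "")

def printing_form(stored: str) -> str:
    return strip_spans(stored, "£")      # text between £...£ is omitted for printing

def filing_form(stored: str) -> str:
    return strip_spans(stored, "¢")      # text between ¢...¢ is ignored for filing

stored = "m£a£c¢ ¢kelvy"                 # example 4 as coded above
print(printing_form(stored))             # mc kelvy
print(filing_form(stored))               # mackelvy
```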
2. multiple function separators. designed by the british, the technique of the multiple function separators was adopted in monocle. the basic idea consists of the use of a single separator character for instructing multiple actions. in the case of monocle these actions are printing only, filing only, and both printing and filing. in order to give concrete form to this method we apply it to examples 3, 4, and 6, using a vertical bar as the special code.

example 3: |ibm 360 |i b m three hundred sixty |assembler language
example 4: m|c |ac|kelvy
example 6: martin du| ||gard

the so-called three-bar filing system divides a data element into the following parts:

data to be filed and printed | data to be printed only | data to be filed only | data to be filed and printed

in comparison with the simple separator technique, this method has the advantage of needing fewer special characters. a gain of storage space cannot be assumed directly: as is the case in example 6, if only one special instruction is needed, the set of three separators must still be used. on the other hand, one must note that a repetition of identical groups of multiple function separators within one data element must be avoided. subsequent use of these codes leads to very unclear representations of the text and may cause faulty data storage. this can well be proved if the necessary groups of three bars are inserted in examples 1 and 2. of the studied systems, monocle is the only one to use this method.

3. separators with indicators. as mentioned in the description of subfield level systems, two indicators are added for each field present. in order to speed up the processing time in separator applications, indicators may be exploited. in monocle the presence or the absence of three bars in a subfield is signalled by an indicator at the beginning of the corresponding field. this avoids the systematic search for separators within all the subfields that may contain special codes. the number of indicators being limited, it is self-evident that in certain fields they may already be used for other purposes. as a result, some of the separators will be identified at the beginning of the field and others not. this leads to a certain heterogeneity in the general system concept which complicates the programming efforts. under this heading, we have mentioned the use of indicators only in connection with multiple function separators. note that this procedure could be applied as well in simple separator methods; nevertheless, none of the subfield level systems performs in this fashion because it is not necessary for the particular applications. this method is not followed in the field level systems as no indicators are provided.
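the following sketch illustrates the three-bar convention described under multiple function separators above: the stored text is split on the vertical bar into the four parts (filed and printed | printed only | filed only | filed and printed) and the printing and filing forms are rebuilt from them. it is an illustration only, not monocle's implementation.

```python
def split_three_bar(stored: str):
    """derive (printing form, filing form) from a three-bar coded data element."""
    parts = stored.split("|")
    if len(parts) != 4:                     # no bars present: the whole text files and prints
        return stored, stored
    both_head, print_only, file_only, both_tail = parts
    printing = both_head + print_only + both_tail
    filing = both_head + file_only + both_tail
    return printing, filing

print(split_three_bar("|ibm 360 |i b m three hundred sixty |assembler language"))
# ('ibm 360 assembler language', 'i b m three hundred sixty assembler language')
print(split_three_bar("m|c |ac|kelvy"))
# ('mc kelvy', 'mackelvy')
```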
4. compound separators. a means of avoiding the second disadvantage of the simple separator technique is to represent each separator by a two-character code: the first one, a delimiter, identifies the presence of the separator and is common to each of them; the second one, a normal character, identifies the separator's characteristic. taking the sign £ as delimiter and indicating the functions of nonprinting and nonfiling respectively by the characters a and b, examples 2 and 4 give in this case:

example 2: £abolletino£a £bbulletino della r. £baccademia medica £bdi £broma
example 4: m£aa£ac£b £bkelvy

thus the number of reserved special characters is reduced to one, independent of the number of different types of separators needed. in none of the considered projects is this technique used, probably because of the amount of storage space wasted.

indicators

as the concept of adding indicators in a bibliographic record format is an innovation of marc lc, the methods described under this heading concern only subfield level systems. although at the moment of the creation of marc lc one did not anticipate the systematic use of indicators for filing, its adherents made good use of them for this purpose.

1. personal name type indicator. as mentioned earlier, in marc lc one of the indicators, in the field of a personal name, provides information on the name type. this enables one to realize special file arrangements. for example, in the case of homonyms, the names consisting only of a forename can be filed before identical surnames. using the same indicator, an exact sort sequence can be obtained for single surnames, including prefixes. knowing that the printing form of example 5 is a single surname, the program for building the sort key can ignore the two spaces. the systems derived from marc lc developed analogous indicator codifications adapted to their own requirements. this seems to be an elegant method for solving particular filing problems in personal names. nevertheless, its possibilities are not large enough to give full satisfaction. for instance, example 6 gives a multiple surname with a prefix in the second part of the name; the statement of multiple surname in the indicator does not give enough information to create the exact sort form. because of this shortcoming, monocle had recourse to the technique called "separators with indicators."

2. indicators identifying the beginning of filing text. bnb marc reserves one indicator in the title field for identification of the first character of the title to be considered for filing. this indicator is a digit between zero and nine, giving the number of characters to be skipped at the beginning of the text. applying this technique to example 1, the corresponding filing indicator must have the value three. without having recourse to other working methods, this title sorts as:

example 1: automation des bibliotheques

notice that the article des still remains in the filing form. this procedure has the advantage of being very economical in storage space and in processing time; moreover, the text is not cluttered with extraneous characters. on the other hand, the technique is limited to the indication of nonfiling words at the beginning of a field: the possibility of identifying certain character strings within the text is not provided for. taking examples 2 and 3 we observe that the stated conditions cannot be fulfilled. another negative side is the number of characters to be ignored, which may not exceed nine. also, one indicator must be available for this filing indication. after bnb marc, marc lc and canadian marc also introduced this technique.

3. separators with indicators. the use of indicators in combination with separators has been treated above.
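a minimal sketch of the beginning-of-filing-text indicator just described: a single digit gives the number of leading characters to drop when the sort key is built. the skip count in the usage example is simply computed from the example string as printed in this article; the function name and the trivial normalization are assumptions.

```python
def sort_key(text: str, nonfiling_indicator: int) -> str:
    if not 0 <= nonfiling_indicator <= 9:     # the indicator is a single digit, zero to nine
        raise ValueError("indicator must be a single digit")
    return text[nonfiling_indicator:].upper()

title = "l'automation des bibliotheques"
skip = title.index("automation")              # leading characters not to be filed on
print(sort_key(title, skip))                  # AUTOMATION DES BIBLIOTHEQUES
```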
pointers

a final internal coding technique which seems worth studying is the one developed at the royal library of belgium for the creation of the catalogs of the library of the quetelet fonds, a field level system. the pointer technique is rather intricate at input but has many advantages at output. because there is inadequate documentation of this working method, we will try to give an insight into it by schematizing the procedures to be followed to create the final storage structure. at input, the cataloger inserts the necessary internal codes as simple separators within the text. these codes are extracted by program from the text and placed before it, at the beginning of each field. each separator, now called a pointer characteristic, is supplemented with the absolute beginning address and the length of its action area within the text. in the quetelet fonds the pointer characteristic is represented by one character; the address and the length occupy two bytes each. the complete set of pointers (pointer characteristics, lengths, and addresses) is named the pointer field. this field is incorporated in a sort of directory, starting with the sign "&" identifying the beginning of the field, followed by the length of the directory, the length of the text, and the pointer field itself. this is illustrated in figure 1. note that each field contains the first five bytes, even if no pointers are present. in the quetelet fonds, pointers are used for the following purposes: nonfiling, nonprinting, kwic index, indication of a corporate name in the title of a periodical, etc. examples 2, 3, and 4 should be stored in this system as represented in figure 2.

[fig. 1. structure of directory with pointer technique. the field is laid out as a directory followed by the text: & | ld | lt | pointer field (x ax lx, y ay ly, ...) | text. the codes respectively represent: &: field delimiter; ld: length of directory; lt: length of text; x, y, ...: pointer characteristics; ax, ay, ...: addresses of the beginning of the related action area inside the text; lx, ly, ...: lengths of these action areas.]

the advantages of the pointer technique are numerous. first, we must mention the relative rapidity of the processing of the records: in order to detect a specific pointer, only the directory has to be consulted, and all subsequent instructions can be executed immediately. in contrast with most of the other methods discussed, there is no objection to using pointers for all internal coding purposes needed. this enables one to pursue homogeneity in the storage format, facilitating the development of programs. further, the physical separation of the internal codes and the text allows, in most cases, a direct clean text representation without any reformatting. finally, unpredictable expansions of internal coding processes can easily be added without adaptation of the existing software.

a great disadvantage of the pointer technique lies in the creation of the directory. the storage space occupied by the pointers is also great in comparison with the place occupied by internal codes in other methods. a further handicap is the limitation imposed at input due to the use of simple separators.
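the sketch below illustrates, under stated assumptions, the conversion step described above: separator-coded input is turned into clean text plus a set of pointers, each holding a one-character characteristic, the starting address of its action area in the text, and its length. the python representation (a list of tuples instead of packed bytes) is an assumption; the quetelet fonds byte layout is not reproduced.

```python
def build_directory(coded: str, separators: dict[str, str]):
    """separators maps a separator sign to its pointer characteristic,
    e.g. {"£": "a", "¢": "b"} for non-printing and non-filing spans."""
    text, pointers, open_spans = [], [], {}
    for ch in coded:
        if ch in separators:
            if ch in open_spans:                        # closing separator: emit a pointer
                start = open_spans.pop(ch)
                pointers.append((separators[ch], start, len(text) - start))
            else:                                       # opening separator: remember the address
                open_spans[ch] = len(text)
        else:
            text.append(ch)
    return pointers, "".join(text)

pointers, text = build_directory("m£a£c¢ ¢kelvy", {"£": "a", "¢": "b"})
print(text)       # mac kelvy -- clean text, with no codes embedded
print(pointers)   # [('a', 1, 1), ('b', 3, 1)]: a non-printing "a" and a non-filing space
```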
[fig. 2. pointer technique as applied to bibliographic data: a byte-by-byte representation of examples 2, 3, and 4 in the quetelet fonds format. a represents the pointer characteristic for nonprinting data; b is the pointer characteristic for nonfiling data.]

in spite of these negative arguments, we see a great interest in this method, and wish to give some suggestions in order to relieve or to eliminate some of them. initially we must realize that the creation of a record takes place only once, while the applications are innumerable. the possibility of automatically adding some of the codes may also be considered: data needing special treatment that can be expressed in a consistent set of logical rules can be coded by program, and only exceptions have to be treated manually. in considering the space occupied by the directory, some profit could be gained by trying to reduce the storage space occupied by the addresses and the lengths. there is also a solution to be found in not systematically having to provide pointer field information; one must realize that only a small percentage of the fields may contain such codes. finally, the restrictions at input may be removed by using compound separators. such a change does not have any repercussion on the directory. as far as we know, the pointer technique has not been used in a subfield level system. at our library an internal processing format of the subfield level type, called fbr, is under development, in which a pointer technique based on the foregoing is incorporated.

automatic handling techniques

in order to give a complete review of the methods of handling data within bibliographic text, we must also treat the methods in which both the identification and the special treatment of these data are done during the execution of the output programs. the working method can easily be demonstrated with example 1. only the printing form must be recorded. the program for building the sort key processes a look-up table of nonfiling words including the articles l' and des. the program checks every word of the printing form for a match with one of the words of the nonfiling list. the sort key is built up with all the words which are not present in this table. to treat example 4, an analogous procedure can be worked through. an equivalence list of words for which the filing form differs from the printing form is needed. if, during the construction of the sort key, a match is found with a word in the equivalence list, the correct filing form, stored in this list, is placed in the sort key; the other words are taken in their printing form. in our case, using the equivalence list, mc should be replaced by mac. in order to speed up the look-up procedures, different methods of organization of the look-up tables can be devised.
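a simplified sketch of the automatic handling just described: nothing special is stored with the record, and the sort-key program carries a look-up table of nonfiling words plus an equivalence list giving filing forms that differ from the printing form. the word lists, the splitting of elided articles such as "l'", and the retained space in "mac kelvy" (example 4's published filing form "mackelvy" would need a further rule) are all simplifying assumptions.

```python
import re

NONFILING_WORDS = {"l'", "des"}        # illustrative entries only, not a real table
EQUIVALENCE_LIST = {"mc": "mac"}       # printing form -> filing form

def words_of(text: str):
    # split off elided articles such as "l'" so they can be looked up on their own
    return re.findall(r"[a-z]+'|[a-z0-9.]+", text.lower())

def sort_key(printing_form: str) -> str:
    kept = []
    for word in words_of(printing_form):
        if word in NONFILING_WORDS:
            continue                                   # nonfiling words are dropped
        kept.append(EQUIVALENCE_LIST.get(word, word))  # filing form substituted when listed
    return " ".join(kept).upper()

print(sort_key("l'automation des bibliotheques"))   # AUTOMATION BIBLIOTHEQUES
print(sort_key("mc kelvy"))                         # MAC KELVY
```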
other types of automatic processing techniques can be illustrated by the special filing algorithms constructed for a correct sort of dates. for instance, in order to be able to sort b.c. and a.d. dates in chronological order, the year 0 is replaced by the year 5000; b.c. and a.d. dates are respectively subtracted from or added to this number. thus dates back to 5000 b.c. can be correctly treated. this technique, introduced by nypl, is also used at lc.

the advantages of automatic handling techniques are many. no special arrangements must be made at input: only the bibliographic elements must be introduced under the printing form, and no special codes have to be added. there is no storage space wasted for storing internal codes. as negative aspects we note that not all cataloging rules may be expressed in rigid systematic process steps; examples 2 and 3 illustrate this point. one must also recognize that the special automatic handling programs must be executed repeatedly when a sort key is built up, increasing the processing time. this procedure may give some help for filing purposes, but we can hardly imagine that it really may solve all internal coding problems; think of the instructions to be given for the choice of character type while working with a typesetting machine. the automatic handling technique is very extensively applied in the nypl programs; marc lc has recourse to it for treating dates, and bnb marc for personal names.24 none of the field level systems considered here uses this method.

summary and conclusions

table 1 presents, for the discussed systems, a summary of the methods used for treating data in a bibliographic text. the duplication and indicator techniques have the most adherents. however, we must keep in mind that in most of the systems the duplication of data only represents an extreme solution. on the other hand, indicators are very limited in their possibilities. as far as flexibility and application possibilities are concerned, the simple separators and the pointers present the most interesting prospects. automatic handling techniques may produce good results for use in well-defined fields or subfields. from the evaluations given for the different methods, we conclude that for a special application the choice of a method depends greatly on the objectives, namely the sort of special processing facilities needed, the volume of data to be treated, and the frequency of execution.

table 1. review of the techniques for special processing of data within bibliographic text used or planned in the discussed systems
deutsche bibliographie: duplication; simple separators.
bochum: duplication.
bikas: simple separators.
quetelet fonds: pointers.
marc lc: duplication; personal name type indicator; indicator for beginning of filing text; automatic handling.
bnb marc: personal name type indicator; indicator for beginning of filing text; automatic handling.
nypl: duplication; automatic handling.
monocle: simple separators; multiple function separators; separators with indicators; personal name type indicator.
canadian marc: duplication; personal name type indicator; indicator for beginning of filing text.
fbr: pointers.

references

1. rudolf blum, "die maschinelle herstellung der deutschen bibliographie in bibliothekarischer sicht," zeitschrift für bibliothekswesen und bibliographie 13:303-21 (1966).
2. die zmd in frankfurt am main; herausgegeben von klaus schneider (berlin: beuth-vertrieb gmbh, 1969), p.133-37, 162-67.
3. magnetbanddienst deutsche bibliographie, beschreibung für 7-spur-magnetbänder (frankfurt am main: zentralstelle für maschinelle dokumentation, 1972).
4. ingeborg sobottke, "rationalisierung der alphabetischen katalogisierung," in elektronische datenverarbeitung in der universitätsbibliothek bochum; herausgegeben in verbindung mit der pressestelle der ruhr-universität bochum von günther pflug und bernhard adams (bochum: druck- und verlagshaus schürmann & klagges, 1968), p.24-32.
5. datenerfassung und datenverarbeitung in der universitätsbibliothek bielefeld: eine materialsammlung; hrsg. von elke bonness und harro heim (munich: pullach, 1972).
6. michel bartholomeus, l'aspect informatique de la catalographie automatique (brussels: bibliotheque royale albert ier, 1970).
7. m. bartholomeus and m. hansart, lecture des entrees bibliographiques sous format 80 colonnes et creation de l'enregistrement standard; publication interne: mecono b015a (brussels: bibliotheque royale albert ier, 1969).
8. henriette d. avram, john f. knapp, and lucia j. rather, the marc ii format: a communications format for bibliographic data (washington, d.c.: library of congress, 1968).
9. books, a marc format: specifications for magnetic tapes containing catalog records for books (5th ed.; washington, d.c.: library of congress, 1972).
10. "automation activities in the processing department of the library of congress," library resources & technical services 16:195-239 (spring 1972).
11. l. e. leonard and l. j. rather, internal marc format specifications for books (3d ed.; washington, d.c.: library of congress, 1972).
12. marc record service proposals (bnb documentation service publications no. 1 [london: council of the british national bibliography, ltd., 1968]).
13. marc ii specifications (bnb documentation service publications no. 2 [london: council of the british national bibliography, ltd., 1969]).
14. michael gorman and john e. linford, description of the bnb marc record: a manual of practice (london: council of the british national bibliography, ltd., 1971).
15. edward duncan, "computer filing at the new york public library," in larc reports vol. 3, no. 3 (1970), p.66-72.
16. nypl automated bibliographic system overview, internal report (new york: new york public library, 1972).
17. marc chauveinc, monocle: projet de mise en ordinateur d'une notice catalographique de livre, deuxieme edition (grenoble: bibliotheque universitaire, 1972).
18. marc chauveinc, "monocle," journal of library automation 4:113-28 (sept. 1971).
19. canadian marc (ottawa: national library of canada, 1972).
20. format de communication du marc canadien: monographies (ottawa: bibliotheque nationale du canada, 1973).
21. to be published.
22. private communications (1973).
23. private communications (1972).
24. private communications (1973).

editorial board thoughts: arts into science, technology, engineering, and mathematics: steam, creative abrasion, and the opportunity in libraries today

tod colegrove

information technologies and libraries | march 2017

over the millennia, man's attempt to understand the universe has been an evolution from the broad to the sharply focused. a wide range of distinctly separate disciplines evolved from the overarching natural philosophy, the study of nature, of greco-roman antiquity: anatomy and astronomy through botany, mathematics, and zoology, among many others.
similarly, the arts, humanities, and engineering developed from broad, over-arching interest into tightly focused disciplines that today are distinctly separate. as these legitimate divisions formed, grew, and developed into ever-deepening specialty, they enabled correspondingly deeper study and discovery;1 in response, the supporting collections of the library divided and grew to reflect that increasing complexity.

libraries have long been about the organization of, and access to, information resources. subject classification systems in use today, such as the dewey decimal system, are designed to group like items with like, albeit under broad overarching topics. a perhaps inevitable result for print collections housed under such a classification system is the physical isolation of items and, by extension, of the individuals researching those topics from one another. under the library of congress system, for example, items categorized as "geography" are physically removed from those in "science," and further still from "technology." end-users benefit from the possibility of serendipitous discovery while browsing shelves nearby, even as they are effectively shielded from exposure to distracting topics outside of their immediate focus.

recent years have witnessed a rediscovery of, and renewed interest in, the fundamental role the library can have in the creation of knowledge, learning, and innovation among its members. as collections shift from print to electronic, libraries are increasingly less bound to the physical constraints imposed by their print collections. rather than a continued focus on hyperspecialization and separation, we have the opportunity to rethink the library: exploring novel configurations and services that might better support its community, and embracing emerging roles of trans-disciplinary collaboration and innovation.

tod colegrove (pcolegrove@unr.edu), a member of the ital editorial board, is head of the delamare science & engineering library, university of nevada, reno.

the library as intersection

libraries reflect the institutional and organizational structures of their communities, even as the physical organization of the structures built to house print collections mirrors the classification system in use. academic libraries are perhaps most entrenched in the structural division: rather than intrinsically promoting collaboration and discovery across disciplines, the organization of print collections, and typically the spaces around them, is designed to foster increased focus and specialization. in branch libraries of a college or university this division can reach a pinnacle, specialized almost to the exclusion of other areas of study altogether; libraries and collections devoted exclusively to engineering, science, music, and other topics exist on campuses across the country. amplified by the separation and clustering of faculty and researchers, typically by department and discipline, it becomes entirely possible for individuals to "spend a lifetime working in a particular narrow field and never come into contact with the wider context of his or her study."2

the library is also one of the few places in any community where individuals from a variety of backgrounds and specialties can naturally cross paths with one another. at a college or university, students and faculty from one discipline might otherwise rarely encounter those from other disciplines.
whether public, school, or academic library, outside of the library individuals and groups are typically isolated from one another physically, with little opportunity to interact organically. without active intervention and deliberate effort on the part of the library, opportunities for creative abrasion3 and trans-disciplinary collaboration become virtually nonexistent; its potential to “unleash the creative potential that is latent in a collection of unlikeminded individuals,”4 untapped. leveraged properly, however, the intersection of interests and expertise that occurs naturally within the neutral spaces of the library can become a powerful tool that supports not only research, but creativity and innovation a place where ideas and viewpoints can collide, building on one another: “for most of us, the best chance to innovate lies at the intersection. not only do we have a greater chance of finding remarkable idea combinations there, we will also find many more of them.... the explosion of remarkable ideas is what happened in florence during the renaissance, and it suggests something very important. if we can just reach an intersection of disciplines or cultures, we will have a greater chance of innovating, simply because there are so many unusual ideas to go around.”5 difficult and scary the problem? “stimulating creative abrasion is difficult and scary because we are far more comfortable being with folks like us.”6 and yet a quick review of the literature reveals that knowledge creation, innovation, and success are inextricably linked7, with the fundamental understanding of their connection having undergone a dramatic shift: “knowledge is in fact essential to innovate, and while this might sound obvious today, putting knowledge and innovation and not physical assets at the centre of competitive advantage was a tremendous change.”8 as our libraries move toward embracing an even more active role within our communities, our organizational priorities are undergoing similarly dramatic shifts: support for knowledge creation information technologies and libraries | march 2017 6 and innovation becomes more central, even as physical assets shift toward a supporting, even peripheral, role. libraries, as fundamentally neutral hubs of diverse communities, are uniquely positioned to be able to cultivate creative abrasion within and among their communities, fostering not only knowledge creation, but innovation and success. indeed, the combination of physical, electronic, and staff assets can be the raw stuff by which trans-disciplinary engagement is encouraged. the active cultivation and support of creative abrasion, with direct linkage to desired outcomes, becomes arguably one of the most vital services the library can provide its community. rather than deepening the cycle of hyper-specialization, the emergence of makerspace in our libraries is one example of a trend toward enabling libraries to broaden and embrace that support. building on the intellectual diversity within the spaces of the library, staff members, volunteers, and fellow community members can serve as catalyst, triggering groups to “do something with that variety”9 by engaging across traditional boundaries. 
indeed, “by deliberately creating diverse organizations and explicitly helping team members appreciate thinking-styles different than their own, creative abrasion can result in successful innovation.”10 strategic placement and staff support of makerspace activity can dramatically increase the opportunity for creative abrasion and, by extension, the resulting knowledge creation, creativity and innovation. arts bring a fundamental literacy and resource to stem in recent years, greater emphasis on students acquiring stem (science, technology, engineering, and math) skills has raised the topic to be one of the most central issues in education. considered a key solution to improving the competitiveness of american students on the global stage, the approach of stem education shares the common goal of breaking down the artificial barriers that exist even within the separate disciplines of sciences, technology, engineering, and math in short, increasing the diversity of the learning environment. proponents of steam go further by suggesting that adding art into the mix can bring new energy and language to the table, “sparking curiosity, experimentation, and the desire to discover the unknown in students.” 11 federal agencies such as the u.s. department of education and the national science foundation have funded and underwritten a number of grants, conferences, and workshops in the field, including the seminal forum hosted by the rhode island school of design (risd), “bridging stem to steam: developing new frameworks for art-science-design pedagogy.”12 john maeda, the president of the risd, identifies a direct connection between the approach and the creativity and success of late apple co-founder steve jobs, with steam support “a pathway to enhance u.s. economic competitiveness.”13 proponents go further, arguing the arts bring both a fundamental literacy and resource to the stem disciplines, providing “innovations through analogies, models, skills, structures, techniques, methods, and knowledge.”14 consider the findings of a study of nobel prize winners in the sciences, members of the royal society, and the u.s. national academy of sciences; nobel laureates were: editorial board thoughts | colegrove https://doi.org/10.6017/ital.v36i1.9733 7 twenty-five times as likely as an average scientist to sing, dance, or act; seventeen times as likely to be an artist; twelve times more likely to write poetry and literature; eight times more likely to do woodworking or some other craft; four times as likely to be a musician; and twice as likely to be a photographer.15 from the standpoint of creative abrasion, welcoming the “a” of art into the library support of stem disciplines increases the diversity of the library, and by default the opportunity for creative abrasion. from aristotle and pythagoras through galileo galilei and leonardo da vinci to benjamin franklin, richard feynman, and noam chomsky, a long list of individuals of wideranging genius hints at a potential left largely untapped by our traditional approach. connections between stem disciplines, art, and the innovation arising directly out of their creative abrasion surround us: the electronic screens used on a wide range of technology, including computers, televisions, and cell phones, are the result of a collaboration between a series of painter-scientists and post-impressionist artists such as seurat a combination of red, green, and blue dots generate full-spectrum images in a way not unlike that of the artistic technique of pointillism. 
the electricity to drive that technology is understood, in part, due to early work by franklin even as he lay the foundations of the free public library with the opening of america’s first lending library, and pursued a broad range of parallel interests. the stitches used in medical surgery are the result of nobel laureate alexis carrel taking his knowledge of lace making from a traditional arena into the operating room. prominent american inventors “samuel morse (telegraph) and robert fulton (steam ship) were among the most prominent american artists before they turned to inventing.”16 in short, “increasing success in science is accompanied by developed ability in other fields such as the fine arts.”17 rather than isolated in monastic study, “almost all nobel laureates in the sciences are actively engaged in arts as adults.”18 perhaps surprisingly, rather than being rewarded by an ever-increasing focus and hyper-specialization, genius in the sciences seems tied to individuals’ activity in the arts and crafts. the study’s authors cite three different nobel prize winners, including j. h. van’t hoff’s 1878 speculation that scientific imagination is correlated with creative activities outside of science19; going on to detail similar findings from general studies dating back over a century. of even more seminal interest, the authors point to a similar connection for adolescents/young adults where milgram and colleagues20 found “having at least one persistent and intellectually stimulating hobby is a better predictor of career success in any discipline than iq, standardized test scores, or grades.”21 discussion the connection between individuals holding a multiplicity of interests, trans-disciplinary activity, and success is clear; what is less clear is to what extent we are fostering that connection in our libraries today. the potential is nevertheless tantalizing: a random group of people, thrown together, is not likely to be very creative. by going beyond specialization and wading into the information technologies and libraries | march 2017 8 deeper waters of supporting and cultivating creative abrasion and avocation among the membership of our libraries, we are fostering success and innovation beyond what might otherwise occur. the decision to catalyze and foster the cross-curricular collaboration that is steam22 is squarely in the hands of the library: in the design of its spaces, and in the interactions of the staff of the library with the communities served. we can choose to actively connect and catalyze across traditional boundaries. as the head of a science and engineering library, one of the early adopters of makerspace and actively exploring the possibilities of steam engagement for several years, i have time and again witnessed the leaps of insight and creativity brought about by creative abrasion. from across disciplines members are engaging with the resources of the library and, with our encouragement, one another in an ever-increasing cycle of knowledge creation, innovation, and success. the impact is particularly dramatic among individuals from strongly differing backgrounds and disciplines: for example, when an engineering student, who considers themselves to be expert with a particular technology, witnesses and interacts with an art student using that same technology and accomplishing something truly unexpected, even seemingly magical. 
or when a science student approaching a problem from one perspective realizes a practitioner from a different discipline sees the problem from an entirely different, and yet equally valid, point of view. in each case, it’s as if the worldview of each suddenly melts: shifting and expanding, never to return to its original shape. transformative experiences become the order of the day, even as the informal environment offers a wealth of opportunity to engage with and connect end-users to the more traditional resources of library. by actively seeking out opportunities to bring art into traditionally stem-focused activity, and vice-versa, we are deliberately increasing the diversity of the environment. makerspace services and activities, to the extent they are open and visibly accessible to all, are a natural for the spontaneous development of trans-disciplinary collaboration. within the spaces of the library, opportunities to connect individuals around shared avocational interest might range from music and spontaneous performance areas to spaces salted with lego bricks and jigsaw puzzles; the potential connections between our resources and the members of our communities are as diverse as their interests. indeed, when a practitioner from one discipline can interact and engage with others from across the steam spectrum, the world becomes a richer place – and maybe, just maybe, we can fan the flames of curiosity along the way. references 1. bohm, d., and f. d. peat. 1987. science, order, and creativity: a dramatic new look at the creative roots of science and life. london: bantam. 2. ibid., 18-19. 3. hirshberg, jerry. 1998. the creative priority: driving innovative business in the real world. london: penguin. editorial board thoughts | colegrove https://doi.org/10.6017/ital.v36i1.9733 9 4. leonard-barton, dorothy, and walter c. swap. 1999. when sparks fly: harnessing the power of group creativity. boston, massachusetts: harvard business school press books. 5. johansson, frans. 2004. the medici effect: breakthrough insights at the intersection of ideas, concepts, and cultures. boston, massachusetts: harvard business school press, 20. 6. leonard-barton, dorothy, and walter c. swap. 1999. when sparks fly: harnessing the power of group creativity. boston, massachusetts: harvard business school press books, 25. 7. nonaka, ikujiro. 1994. “a dynamic theory of organizational knowledge creation.” organization science 5 (1): 14–37. 8. correia de sousa, milton. 2006. “the sustainable innovation engine.” vine 36 (4): 398–405, accessed february 14, 2017. https://doi.org/10.1108/03055720610716656. 9. leonard-barton, dorothy, and walter c. swap. 1999. when sparks fly: harnessing the power of group creativity. boston, massachusetts: harvard business school press books, 20. 10. adams, karlyn. 2005. the sources of innovation and creativity. education, september, 2005, 33. https://doi.org/10.1007/978-3-8349-9320-5. 11. jolly, anne. 2014. “stem vs. steam: do the arts belong?” education week teacher. http://www.edweek.org/tm/articles/2014/11/18/ctq-jolly-stem-vssteam.html?qs=stem+vs.+steam. 12. rose, christopher, and brian k. smith. 2011. “bridging stem to steam: developing new frameworks for art-science-design pedagogy.” rhode island school district press release. 13. robelen, erik w. 2011. “steam: experts make case for adding arts to stem.” education week. http://www.bmfenterprises.com/aep-arts/wp-content/uploads/2012/02/ed-week-stemto-steam.pdf. 14. root-bernstein, robert. 2011. 
“the art of scientific and technological innovations – art of science learning.” http://scienceblogs.com/art_of_science_learning/2011/04/11/the-art-ofscientific-and-tech-1/. 15. ibid. 16. ibid. 17. root-bernstein, robert, lindsay allen, leighanna beach, ragini bhadula, justin fast, chelsea hosey, benjamin kremkow, et al. 2008. “arts foster scientific success: avocations of nobel, national academy, royal society, and sigma xi members.” journal of psychology of science and technology. https://doi.org/10.1891/1939-7054.1.2.51. information technologies and libraries | march 2017 10 18. ibid. 19. van’t hoff, jacobus henricus. 1967. “imagination in science,” molecular biology, biochemistry and biophysics, translated by g. f. springer, 1, springer-verlag, pp. 1-18 20. milgram, roberta m., and eunsook hong. 1997. "out-of-school activities in gifted adolescents as a predictor of vocational choice and work." journal of secondary gifted education 8, no. 3: 111. education research complete, ebscohost (accessed february 26, 2017). 21. root-bernstein, robert, lindsay allen, leighanna beach, ragini bhadula, justin fast, chelsea hosey, benjamin kremkow, et al. 2008. “arts foster scientific success: avocations of nobel, national academy, royal society, and sigma xi members.” journal of psychology of science and technology. https://doi.org/10.1891/1939-7054.1.2.51. 22. land, michelle h. 2013. “full steam ahead: the benefits of integrating the arts into stem.” procedia computer science 20. elsevier masson sas: 547–52. https://doi.org/10.1016/j.procs.2013.09.317. content management systems: trends in academic libraries ruth sara connell information technology and libraries | june 2013 42 abstract academic libraries, and their parent institutions, are increasingly using content management systems (cmss) for website management. in this study, the author surveyed academic library web managers from four-year institutions to discover whether they had adopted cmss, which tools they were using, and their satisfaction with their website management system. other issues, such as institutional control over library website management, were raised. the survey results showed that cms satisfaction levels vary by tool and that many libraries do not have input into the selection of their cms because the determination is made at an institutional level. these findings will be helpful for decision makers involved in the selection of cmss for academic libraries. introduction as library websites have evolved over the years, so has their role and complexity. in the beginning, the purpose of most library websites was to convey basic information, such as hours and policies, to library users. as time passed, more and more library products and services became available online, increasing the size and complexity of library websites. many academic library web designers found that their web authoring tools were no longer adequate for their needs and turned to cmss to help them manage and maintain their sites. for other web designers, the choice was not theirs to make. their institution transitioned to a cms and required the academic library to follow suit, regardless of whether the library staff had a say in the selection of the cms or its suitability for the library environment. the purpose of this study was to examine cms usage within the academic library market and to provide librarians quantitative and qualitative knowledge to help make decisions when considering switching to, or between, cmss. 
in particular, the objectives of this study were to determine (1) the level of saturation of cmss in the academic library community; (2) the most popular cmss within academic libraries, the reasons for the selection of those systems, and satisfaction with those cmss; (3) if there is a relationship between libraries with their own dedicated information technology (it) staff and those with open source (os) systems; and (4) if there is a relationship between institutional characteristics and issues surrounding cms selection. ruth sara connell (ruth.connell@valpo.edu) is associate professor of library services and electronic services librarian, christopher center library services, valparaiso university, valparaiso, in. mailto:ruth.connell@valpo.edu content management systems: trends in academic libraries | connell 43 although this study largely focuses on cms adoption and related issues, the library web designers who responded to the survey were asked to identify what method of web management they use if they do not use a cms and asked about satisfaction with their current system. thus, information regarding cms alternatives (such as adobe’s dreamweaver web content editing software) is also included in the results. as will be discussed in the literature review, cmss have been broadly defined in the past. therefore, for this study participants were informed that only cmss used to manage their primary public website were of interest. specifically, cmss were defined as website management tools through which the appearance and formatting is managed separately from content, so that authors can easily add content regardless of web authoring skills. literature review most of the library literature regarding cms adoption consists of individual case studies describing selection and implementation at specific institutions. there are very few comprehensive surveys of library websites or the personnel in charge of academic library websites to determine trends in cms usage. the published studies including cms usage within academic libraries do not definitively answer whether overall adoption has increased. in 2005 several georgia state university librarians surveyed web librarians at sixty-three of their peer institutions, and of the sixteen responses, six (or 38 percent) reported use of “cms technology to run parts of their web site.” 1 a 2006 study of web managers from wide range of institutions (associates to research) indicated a 26 percent (twenty-four of ninety-four) cms adoption rate.2 a more recent 2008 study of institutions of varying sizes resulted in a little more than half of respondents indicating use of cmss, although the authors note that “people defined cmss very broadly,” 3 including tools like moodle and contentdm, and some of those libraries indicated they did not use the cms to manage their website. a 2012 study by comeaux and schmetzke differs from the others mentioned here in that they reviewed academic library websites of the fifty-six campuses offering ala-accredited graduate degrees (generally larger universities) and used tools and examined page code to try to determine on their own if the libraries used cmss, as opposed to polling librarians at those institutions to ask them to self-identify if they used cmss. they identified nineteen out of fifty-six (34 percent) sites using cmss. the authors offer this caveat, “it is very possible that more sites use cmss than could be readily identified. 
this is particularly true for ‘home-grown’ systems, which are unlikely to leave any readily discernible source code.”4 because of the different methodologies and population groups in these studies, it is not possible to draw conclusions regarding cms adoption rates within academic libraries over time using these results. as mentioned previously, some people define cmss more broadly than others. one example of a product that can be used as a cms, but is not necessarily a cms, is springshare’s libguides. many libraries use libguides as a component of their website to create guides. however, some libraries have utilized the product to develop their whole site, in effect using it as a cms. a case study by two librarians at york college describes why they chose libguides as their cms instead of as a more limited guide creation tool.5 several themes recurred throughout many of the case study articles. one common theme was the issue of lack of control and problems of collaboration between academic libraries and the campus entities controlling website management. amy york, the web services librarian at middle tennessee state university, described the decision to transition to a cms in this way, “and while it was feasible for us to remain outside of the campus cms and yet conform to the campus template, the head of the it web unit was quite adamant that we move into the cms.”6 in a study by bundza et al., several participants who indicated dissatisfaction with website maintenance mentioned “authority and decision-making issues” as well as “turf struggles.”7 other articles expressed more positive collaborative experiences. morehead state university librarians kmetz and bailey noted, “when attending conferences and hearing the stories of other libraries, it became apparent that a typical relationship between librarians and a campus it staff is often much less communicative and much less positive than [ours]. because of the relatively smooth collaborative spirit, a librarian was invited in 2003 to participate in the selection of a cms system.”8 kimberley stephenson also emphasized the advantageous relationships that can develop when a positive approach is used, “rather than simply complaining that staff from other departments do not understand library needs, librarians should respectfully acknowledge that campus web developers want to create a site that attracts users and consider how an attractive site that reflects the university’s brand can be beneficial in promoting library resources and services.”9 however, earlier in the article she does acknowledge that the iterative and collaborative process between the library and their university relations (ur) department was occasionally contentious and that the web services librarian notifies ur staff before making changes to the library homepage.10 another common theme in the literature was the reasoning behind transitioning to a cms. one commonly cited criterion was access control or workflow management, which allows site administrators to assign contributors editorial control over different sections of the site or approve changes before publishing.11 however, although this feature is considered a requirement by many libraries, it has its detractors.
kmetz and bailey indicated that at morehead state university, “approval chains have been viewed as somewhat stifling and potentially draconian, so they have not been activated.”12 these studies greatly informed the questions used and the development of the survey instrument for this study. method in designing the survey instrument, questions were considered based on how they informed the objectives of the study. to simplify analysis, it was important to compile as comprehensive a list of cmss as possible. this list was created by pulling cms names from the literature review, the web4lib discussion list, and the cmsmatrix website (www.cmsmatrix.org). in order to select institutions for distribution, the 2010 carnegie classification of institutions of higher education basic classification lists were used.13 the author chose to focus on three broad classifications: 1. research institutions, consisting of the following carnegie basic classifications: research universities (very high research activity), research universities (high research activity), and dru: doctoral/research universities. 2. master’s institutions, consisting of the following carnegie basic classifications: master’s colleges and universities (larger programs), master’s colleges and universities (medium programs), and master’s colleges and universities (smaller programs). 3. baccalaureate institutions, consisting of the following carnegie basic classifications: baccalaureate colleges—arts & sciences and baccalaureate colleges—diverse fields. the basic classification lists were downloaded into excel with each of the three categories in a different worksheet, and then each institution was assigned a number using the random number generator feature within excel. the institutions were then sorted by those numbers, creating a randomly ordered list within each classification. to determine sample size for a stratified random sampling, ronald powell’s “table for determining sample size from a given population”14 (with a .05 degree of accuracy) was used. each classification’s population was considered separately, and the appropriate sample size chosen from the table. the population size of each of the groups (the total number of institutions within that carnegie classification) and the corresponding sample sizes were • research: population = 297, sample size = 165; • master’s: population = 727, sample size = 248; • baccalaureate: population = 662, sample size = 242. the total number of institutions included in the sample was 655. the author then went through the list of selected institutions and searched online to find their library webpages and the person most likely responsible for the library’s website. during this process, there were some institutions, mostly for-profits, for which a library website could not be found. when this occurred, that institution was eliminated and the next institution on the list used in its place. in some cases, the person responsible for web content was not easily identifiable; in these cases an educated guess was made when possible, or else the director or a general library email address was used. the survey was made available online and distributed via e-mail to the 655 recipients on october 1, 2012. reminders were sent on october 10 and october 18, and the survey was closed on october 26, 2012. out of 655 recipients, 286 responses were received.
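the stratified sampling step just described — randomly ordering institutions within each carnegie stratum and then taking the sample size given by powell's table — can be reproduced compactly in code. the sketch below is only an illustration under stated assumptions: the study performed this step in excel, and the institution lists here are placeholder strings rather than the real carnegie data.

```python
import random

# placeholder populations; in the study these came from the 2010 carnegie basic
# classification lists (research = 297, master's = 727, baccalaureate = 662 institutions)
strata = {
    "research":      [f"research institution {i}" for i in range(297)],
    "masters":       [f"master's institution {i}" for i in range(727)],
    "baccalaureate": [f"baccalaureate institution {i}" for i in range(662)],
}

# sample sizes read from powell's table at a .05 degree of accuracy
sample_sizes = {"research": 165, "masters": 248, "baccalaureate": 242}

random.seed(1)  # fixed seed only so this illustrative draw is repeatable

# draw a simple random sample independently within each stratum
sample = {name: random.sample(pop, sample_sizes[name]) for name, pop in strata.items()}

print(sum(len(chosen) for chosen in sample.values()))  # 655 institutions in total
```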
some of those responses had to be eliminated for various reasons. if two responses were received from one institution, the more complete response was used while the other response was discarded. some responses included only an answer to the first question (the name of the institution, or a declination of that question in favor of the demographic questions) and no other responses; these were also eliminated. once the invalid responses were removed, 265 remained, for a 40 percent response rate. before conducting an analysis of the data, some cleanup and standardization of results was required. for example, a handful of respondents indicated they used a cms and then indicated that their cms was dreamweaver or adobe contribute. these responses were recoded as non-cms responses. likewise, one respondent self-identified as a non-cms user but then listed drupal as his/her web management tool, and this was recoded as a cms response. demographic profile of respondents for the purposes of gathering demographic data, respondents were offered two options. they could provide their institution’s name, which would be used solely to pair their responses with the appropriate carnegie demographic categories (not to identify them or their institution), or they could choose to answer a separate set of questions regarding their size, public/private affiliation, and basic carnegie classification. the basic carnegie classification of the largest response group was master’s with 102 responses (38 percent), then baccalaureate institutions (94 responses or 35 percent), and then research institutions (69 responses or 26 percent). this correlates closely with the distribution percentages, which were 38 percent master’s (248 out of 655), 37 percent baccalaureate (242 out of 655), and 25 percent research (165 out of 655). of the 265 responses, 95 (36 percent) came from academic librarians representing public institutions and 170 (64 percent) from private. of the private institutions, the vast majority (166 responses or 98 percent) were not-for-profit, while 4 (2 percent) were for-profits. to define size, the carnegie size and setting classification was used. very small institutions are defined as less than 1,000 full-time equivalent (fte) enrollment, small is 1,000–2,999 fte, medium is 3,000–9,999 fte, and large is at least 10,000 fte. the largest group of responses came from small institutions (105 responses or 40 percent), then medium (67 responses or 25 percent), large (60 responses or 23 percent), and very small (33 responses or 12 percent). results the first question, asking for institutional identification (or alternative routing to the carnegie classification questions), was the only question for which an answer was required. in addition, because of question logic, some people saw questions that others did not based on how they answered previous questions. thus, the number of responses varies for each question. one of the objectives of this study was to identify whether there were relationships between institutional characteristics and cms selection and management. the results that follow include both descriptive statistics and statistically significant inferential statistics discovered using chi-square and fisher’s exact tests. statistically significant results are labeled as such.
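as a concrete illustration of how those inferential tests can be run, the sketch below applies a chi-square test of independence to the counts reported later in table 2 (non-cms users considering a move to a cms, by carnegie classification) and reproduces the published chi-square of 6.526 with df = 2 and p = .038. the study does not say which statistics package was used, so the scipy calls here are simply one convenient way to check the published figures, not the author's procedure.

```python
from scipy.stats import chi2_contingency, fisher_exact

# counts from table 2: rows = considering a move to a cms (no, yes),
# columns = baccalaureate, master's, research
observed = [
    [11, 11, 4],    # no
    [9, 11, 17],    # yes
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")
# prints chi-square = 6.526, df = 2, p = 0.038 -- the values reported with table 2

# the 2x2 tables in the article (e.g., tables 5 and 7) were tested with
# fisher's exact test, which the same module provides
odds_ratio, p_fisher = fisher_exact([[9, 10], [63, 17]])  # table 5 counts; article reports p = .010
print(f"fisher's exact p = {p_fisher:.3f}")
```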
the responses to this survey show that most academic libraries are using a cms to manage their main library website (169 out of 265 responses, or 64 percent). overall, cms users expressed similar (although slightly greater) satisfaction levels with their method of web management (see table 1).
table 1. satisfaction by cms use
                                        uses a cms     does not use a cms
highly satisfied or satisfied — yes     79 (54%)       41 (47%)
highly satisfied or satisfied — no      68 (46%)       46 (53%)
total                                   147 (100%)     87 (100%)
non-cms users non-cms users were asked what software or system they use to govern their site. by far, the most popular system mentioned among the 82 responses was adobe dreamweaver, with 24 (29 percent) users listing it as their only or primary system. some people listed dreamweaver as part of a list of tools used, for example “php / mysql, integrated development environments (php storm, coda), dreamweaver, etc.,” and if all mentions of dreamweaver are included, the number of users rises to 31 (38 percent). some version of “hand coded” was the second most popular answer with 9 responses (11 percent), followed by adobe contribute with 7 (9 percent). many of the “other” responses were hard to classify and were excluded from analysis. some examples include: • ftp to the web • voyager public web browser ezproxy • excel, e-mail, file folders on shared drives among the top three non-cms web management systems, dreamweaver users were most satisfied, selecting highly satisfied or satisfied in 15 out of 24 (63 percent) cases. hand coders were highly satisfied or satisfied in 5 out of 9 cases (56 percent), and adobe contribute users were only highly satisfied or satisfied in 3 out of 7 (43 percent) cases. respondents not using a cms were asked whether they were considering a move to a cms within the next two years. most (59 percent) said yes. research libraries were much more likely to be planning such a move (81 percent) than master’s (50 percent) or baccalaureate (45 percent) libraries (see table 2). a chi-square test rejects the null hypothesis that the consideration of a move to a cms is independent of basic carnegie classification; this difference was significant at the p = 0.038 level.
table 2. non-cms users considering a move to a cms within the next two years, by carnegie classification*
          baccalaureate   master’s     research     total
no        11 (55%)        11 (50%)     4 (19%)      26 (41%)
yes       9 (45%)         11 (50%)     17 (81%)     37 (59%)
total     20 (100%)       22 (100%)    21 (100%)    63 (100%)
chi-square = 6.526, df = 2, p = .038
*excludes “not sure” responses
non-cms users were asked to provide comments related to topics covered in the survey, and here is a sampling of responses received: • cmss cost money that our college cannot count on being available on a yearly basis. • the library doesn't have overall responsibility for the website. university web services manages the entire site; i submit changes to them for inclusion and updates. • we are so small that the time to learn and implement a cms hardly seems worth it. so far this low-tech method has worked for us. • the main university site was moved to a cms in 2008. the library was not included in that move because of the number of pages. i hear rumors that we will be forced into the cms that is under consideration for adoption now.
the library has had zero input in the selection of the new cms. cms users when respondents indicated their library used a cms, they were routed to a series of cms-related questions. the first question asked which cms their library was using. of the 153 responses, the most popular cmss were drupal (40); wordpress (15); libguides (14), which was defined within the survey as a cms “for main library website, not just for guides”; cascade server (12); ektron (6); and modx and plone (5 each). these users were also asked about their overall satisfaction with their systems. among the top four cmss, libguides users were the most satisfied, selecting highly satisfied or satisfied in 12 out of 12 (100 percent) cases. the remaining three systems’ satisfaction ratings (highly satisfied or satisfied) were as follows: wordpress (12 out of 15 cases or 80 percent), drupal (26 out of 38 cases or 68 percent), and cascade server (3 out of 11 cases or 27 percent). when asked whether they would switch systems if given the opportunity, most (61 out of 109 cases or 56 percent) said no. looking at the responses for the top four cmss, responses echo the satisfaction responses. libguides users were least likely to want to switch (0 out of 7 cases or 0 percent), followed by wordpress (1 out of 5 cases or 17 percent), drupal (8 out of 23 cases or 26 percent), and cascade server (3 out of 7 or 43 percent) users. respondents were asked whether their library uses the same cms as their parent institution. most (106 out of 169 cases or 63 percent) said yes. libraries at large institutions (over 10,000 fte) were much less likely (34 percent) than their smaller counterparts to share a cms with their parent institution (see table 3). a chi-square test rejects the null hypothesis that sharing a cms with a parent institution is independent of size: at a significance level of p = 0.001, libraries at smaller institutions are more likely to share a cms with their parent.
table 3. cms users whose libraries use the same cms as their parent institution, by size
          large        medium       small        very small   total
no        23 (66%)     15 (33%)     19 (27%)     6 (35%)      63 (37%)
yes       12 (34%)     31 (67%)     52 (73%)     11 (65%)     106 (63%)
total     35 (100%)    46 (100%)    71 (100%)    17 (100%)    169 (100%)
chi-square = 15.921, df = 3, p = .001
not surprisingly, a similar correlation holds true when comparing shared cmss and simplified basic carnegie classification. baccalaureate and master’s libraries were more likely to share cmss with their institutions (69 percent and 71 percent, respectively) than research libraries (42 percent) (see table 4). at a significance level of p = 0.004, a chi-square test rejects the null hypothesis that sharing a cms with a parent institution is independent of basic carnegie classification.
table 4. cms users whose libraries use the same cms as their parent institution, by carnegie classification
          baccalaureate   master’s     research     total
no        19 (31%)        18 (29%)     26 (58%)     63 (37%)
yes       43 (69%)        44 (71%)     19 (42%)     106 (63%)
total     62 (100%)       62 (100%)    45 (100%)    169 (100%)
chi-square = 11.057, df = 2, p = .004
when participants responded that their library shared a cms with the parent institution, they were asked a follow-up question about whether the library made the transition with the parent institution. most (80 out of 99 cases or 81 percent) said yes, the transition was made together. however, private institutions were more likely to have made the switch together (88 percent) than public (63 percent) (see table 5). a fisher’s exact test rejects the null hypothesis that transition to a cms is independent of institutional control: at a significance level of p = 0.010, private institutions are more likely than public to move to a cms in concert.
table 5. users whose libraries and parent institutions use the same cms: transition by public/private control*
                          private       public       total
switched independently    9 (13%)       10 (37%)     19 (19%)
switched together         63 (88%)      17 (63%)     80 (81%)
total                     72 (101%)**   27 (100%)    99 (100%)
fisher’s exact test: p = .010
*excludes responses where people indicated “other”
**due to rounding, total is greater than 100%
similarly, a relationship existed between transition to a cms and basic carnegie classification. baccalaureate institutions (93 percent) were more likely than master’s (80 percent), which were more likely than research institutions (53 percent), to make the transition together (see table 6). a chi-square test rejects the null hypothesis that the transition to a cms is independent of basic carnegie classification: at a significance level of p = 0.002, higher-degree-granting institutions are less likely to make the transition together.
table 6. users whose libraries and parent institutions use the same cms: transition by carnegie classification*
                          baccalaureate   master’s      research     total
switched independently    3 (7%)          8 (21%)       8 (47%)      19 (19%)
switched together         40 (93%)        31 (80%)      9 (53%)      80 (81%)
total                     43 (100%)       39 (101%)**   17 (100%)    99 (100%)
chi-square = 12.693, df = 2, p = .002
*excludes responses where people indicated “other”
**due to rounding, total is greater than 100%
this study indicates that for libraries that transitioned to a cms with their parent institution, the transition was usually forced. out of the 88 libraries that transitioned together and indicated whether they were given a choice, only 8 libraries (9 percent) had a say in whether to make that transition. and even though academic libraries were usually forced to transition with their institution, they did not usually have representation on campus-wide cms selection committees. only 25 percent (22 out of 87) of respondents indicated that their library had a seat at the table during cms selection.
when comparing cms satisfaction ratings among libraries that were represented on cms selection committees versus those that had no representation, it is not surprising that those with representation were more satisfied (13 out of 22 cases or 59 percent) than those without (21 out of 59 cases or 36 percent). the same holds true for those libraries given a choice whether to transition. those given a choice were satisfied more often (6 out of 8 cases or 75 percent) than those forced to transition (21 out of 71 cases or 30 percent). respondents who said that they were not on the same cms as their institution were asked why they chose a different system. many of the responses indicated a desire for freedom from the controlling influence of the it or marketing arms of the institution: • we felt drupal offered more flexibility for our needs than cascade, which is what the university at large was using. i've heard more recently that the university may be considering switching to drupal. • university pr controls all aspects of the university cms. we want more freedom. • we are a service-oriented organization, as opposed to a marketing arm. we by necessity need to be different. cms users were asked to provide a list of the three factors most important in their selection of their cms and to rank their list in order of importance. the author standardized the responses, e.g., “price” was recorded as “cost.” the factors listed first, in order of frequency, were ease of use (15), flexibility (10), and cost (6). ignoring the ranking, 38 respondents listed ease of use somewhere in their “top three,” while 23 listed cost and 16 listed flexibility. another objective of this study was to determine if there was a positive correlation between libraries with their own dedicated it staff and those who chose open source cmss. therefore cms users were asked if their library had its own dedicated it staff, and 66 out of 143 libraries (46 percent) said yes. then the cmss used by respondents were translated into two categories, open source or proprietary systems (when a cms listed was unknown, it was coded as a missing value), and a fisher’s exact test was run against all cases that had values for both variables to see if a correlation existed. although those with library it had open source systems more frequently than those without, the difference was not significant (see table 7).
table 7. libraries with their own it personnel, by open source cms
                            has own it    no own it    total
cms is open source — yes    37 (73%)      32 (57%)     69 (65%)
cms is open source — no     14 (28%)      24 (43%)     38 (36%)
total                       51 (101%)*    56 (100%)    107 (101%)*
fisher’s exact test: p = .109
*due to rounding, total is greater than 100%
in another question, people were asked to self-identify if their organization uses an open source cms, and if so, whether they have outsourced any of its implementation or design to an outside vendor. most (61 out of 77 cases or 79 percent) said they had not outsourced implementation or design. one person commented, “no, i don't recommend doing this. the cost is great, you lose the expertise once the consultant leaves, and the maintenance cost goes through the roof. hire someone fulltime or move a current position to be the keeper of the system.” one of the advantages of having a cms is the ability to give multiple people, regardless of their web authoring skills, the opportunity to edit webpages.
therefore, cms users were asked how many web content creators they have within their library. out of 152 responses, the most frequent range cited was 2–5 authors (72 responses or 47 percent), followed by (33 responses or 22 percent) with only one author, 6–10 authors (20 responses or 13 percent), 21–50 authors (16 responses or 11 percent), 11–20 authors (6 responses or 4 percent), and over 50 authors (5 responses or 3 percent). because this question was an open ended response and responses varied greatly, including “over 100 (over 20 are regular contributors)” and “1–3”, standardization was required. when a range or multiple numbers were provided, the largest number was used. respondents were asked whether their library uses a workflow management process requiring page authors to receive approval before publishing content. of the 131 people who responded yes or no, most (88 responses or 67 percent) said no. cms users were asked to provide comments related to topics covered in the survey. many comments mentioned issues of control (or lack thereof), while another common theme was concerns with specific cmss. here is a sampling of responses received: • having dedicated staff is a necessity. there was a time when these tools could be installed and used by a techie generalist. those days are over. a professional content person and a professional cms person are a must if you want your site to look like a professional site... content management systems: trends in academic libraries | connell 53 i'm shocked at how many libraries switched to a cms yet still have a site that looks and feels like it was created 10 years ago. • since the cms was bred in-house by another university department, we do not have control over changing the design or layout. the last time i requested a change, they wanted to charge us. • our university marketing department, which includes the web team, is currently in the process of switching [cmss]. we were not invited to be involved in the selection process for a new cms, although they did receive my unsolicited advice. • we compared costs for open source and licensed systems, and we found the costs to be approximately equivalent based on the development work we would have needed in an open source environment. • the library was not part of the original selection process for the campus' first cms because my position didn't exist at that time. now that we have a dedicated web services position, the library is considered a "power user" in the cms and we are often part of the campus wide discussions about the new cms and strategic planning involving the campus website. • we currently do not have the preferred level of control over our library website; we fought for customization rights for our front page, and won on that front. however, departments on campus do not have permission to install or configure modules, which we hope will change in the future. • there’s a huge disconnect between it /administration and the library regarding unique needs of the library in the context of web-based delivery of information. discussion comparing the results of this study to previous studies indicates that cms usage within academic libraries is rising. the 64 percent cms adoption rate found in this survey, which used a more narrow definition of cms than some previous studies cited in the literature review, is higher than adoption rates in any of said studies. as more libraries make the transition, it is important to know how different cmss have been received among their peers. 
although cms users are slightly more satisfied than non-cms users (54 percent vs. 47 percent), the tools used matter. so if a library using dreamweaver to manage their site is given an option of moving with their institution to a cms and that cms is cascade server, they should strongly consider sticking with their current non-cms method based on the respective satisfaction levels reported in this study (63 percent vs. 27 percent). satisfaction levels are important, but should not be considered in a vacuum. for example, although libguides users reported very high satisfaction levels (100 percent were satisfied or very satisfied), users were mostly (11 out of 14 users or 79 percent) small or very small schools, while the remaining three (21percent) were medium schools. no large schools reported using libguides as their cms. libguides may be wonderful for a smaller school without need of much information technology and libraries | june 2013 54 customization or, in some cases, access to technical expertise but may not be a good cms solution for larger institutions. one of the largest issues raised by survey respondents was libraries’ control, or lack thereof, when moving to a campus-selected cms. given the complexity of academic libraries websites, library representation on campus-wide cms selection committees is warranted. not only are libraries more satisfied with the results when given a say in the selection, but libraries have special needs when it comes to website design that other campus units do not. including library representation ensures those needs are met. some of the respondents’ comments regarding lack of control over their sites are disturbing to libraries being forced or considering a move to a campus cms. clearly, having to pay another campus department to make changes to the library site is not an attractive option for most libraries. nor should libraries have to fight for the right or ability to customize their home pages. developing good working relationships with the decision makers may help prevent some of these problems, but likely not all. this study indicates that it is not uncommon for academic libraries to be forced into cmss, regardless of the cmss acceptability to the library environment. conclusion the adoption of cmss to manage academic libraries’ websites is increasing, but not all cmss are created equal. when given input into switching website management tools, library staff have many factors to take into consideration. these include, but are not limited to, in-house technical expertise, desirability of open source solutions, satisfaction of peer libraries with considered systems, and library specific needs, such as workflow management and customization requirements. ideally, libraries would always be partners at the table when campus-wide cms decisions are being made, but this study shows that this does not happen in most cases. if a library suspects that it is likely to be required to move to a campus-selected system, its staff should be alert for news of impending changes so that they can work to be involved at the beginning of the process to be able to provide input. a transition to a bad cms can have long-term negative effects on the library, its users, and staff. a library’s website is its virtual “branch” and vitally important to the functioning of the library. the management of such an important component of the library should not be left to chance. references 1. doug goans, guy leach, and teri m. 
vogel, “beyond html: developing and re-imagining library web guides in a content management system,” library hi tech 24, no. 1 (2006): 29–53, doi:10.1108/07378830610652095. 2. ruth sara connell, “survey of web developers in academic libraries,” the journal of academic librarianship 34, no. 2 (march 2008): 121–129, doi:10.1016/j.acalib.2007.12.005. http://dx.doi.org/10.1016/j.acalib.2007.12.005 content management systems: trends in academic libraries | connell 55 3. maira bundza, patricia fravel vander meer, and maria a. perez-stable, “work of the web weavers: web development in academic libraries,” journal of web librarianship 3, no. 3 (july 2009): 239–62. 4. david comeaux and axel schmetzke, “accessibility of academic library web sites in north america—current status and trends (2002–2012).” library hi tech 31, no. 1 (january 28, 2013): 2. 5. daniel verbit and vickie l. kline, “libguides: a cms for busy librarians,” computers in libraries 31, no. 6 (july 2011): 21–25. 6. amy york, holly hebert, and j. michael lindsay, “transforming the library website: you and the it crowd,” tennessee libraries 62, no. 3 (2012). 7. bundza, vender meer, and perez-stable, “work of the web weavers: web development in academic libraries.” 8. tom kmetz and ray bailey, “migrating a library’s web site to a commercial cms within a campus-wide implementation,” library hi tech 24, no. 1 (2006): 102–14, doi:10.1108/07378830610652130. 9. kimberley stephenson, “sharing control, embracing collaboration: cross-campus partnerships for library website design and management,” journal of electronic resources librarianship 24, no. 2 (april 2012): 91–100. 10. ibid. 11. elizabeth l. black, “selecting a web content management system for an academic library website,” information technology & libraries 30, no. 4 (december 2011): 185–89; andy austin and christopher harris, “welcome to a new paradigm,” library technology reports 44, no. 4 (june 2008): 5–7; holly yu , “chapter 1: library web content management: needs and challenges,” in content and workflow management for library web sites: case studies, ed. holly yu (hersey, pa: information science publishing, 2005), 1–21; wayne powel and chris gill, “web content management systems in higher education,” educause quarterly 26, no. 2 (2003): 43– 50; goans, leach, and vogel, “beyond html.” 12. kmetz and bailey, “migrating a library’s web site.” 13. carnegie foundation for the advancement of teaching, 2010 classification of institutions of higher education, accessed february 4, 2013, http://classifications.carnegiefoundation.org/descriptions/basic.php. 14. ronald r. powell , basic research methods for librarians (greenwood, 1997). http://classifications.carnegiefoundation.org/descriptions/basic.php identifying emerging relationships in healthcare domain journals via citation network analysis kuo-chung chu, hsin-ke lu, and wen-i liu information technology and libraries | march 2018 39 kuo-chung chu (kcchu@ntunhs.edu.tw) is professor, department of information management, and dean, college of health technology, national taipei university of nursing and health sciences; hsin-ke lu (sklu@sce.pccu.edu.tw) is associate professor, department of information management, and dean, school of continuing education, chinese culture university; wen-i liu (wenyi@ntunhs.edu.tw, corresponding author) is professor, department of nursing, and dean, college of nursing, national taipei university of nursing and health sciences. 
abstract online e-journal databases enable scholars to search the literature in a research domain or to cross-search an interdisciplinary field. the key literature can thereby be efficiently mapped. this study builds a web-based citation analysis system consisting of four modules: (1) literature search; (2) statistics; (3) article analysis; and (4) co-citation analysis. the system focuses on the pubmed central dataset and facilitates specific keyword searches in each research domain for authors, journals, and core issues. in addition, we use data mining techniques for co-citation analysis. the results could help researchers develop an in-depth understanding of the research domain. an automated system for co-citation analysis promises to facilitate understanding of the changing trends that affect the journal structure of research domains. the proposed system has the potential to become a value-added database of the healthcare domain, which will benefit researchers. introduction healthcare is a multidisciplinary research domain of medical services provided both inside and outside a hospital or clinical setting. article retrieval for systematic reviews in the domain is much more elusive than retrieval for reviews in clinical medicine because of the interdisciplinary nature of the field and the lack of a significant body of evaluative literature. other connecting research fields consist of the respective research fields of the application domain (i.e., the health sciences, including medicine and nursing).1 in addition, valuable knowledge and methods can be taken from the fields of psychology, the social sciences, economics, ethics, and law. further, the integration of those disciplines is attracting increasing interest.2 researchers may use bibliometrics to evaluate the influence of a paper or describe the relationship between citing and cited papers. citation analysis, one of several possible bibliometric approaches, is more popular than others because of the advent of information technologies.3 citation analysis counts the frequency of cited papers from a set of citing papers to determine the most influential scholars, publications, or universities in a discipline. it can be classified into two basic types: the first type counts only the citations in a paper that are authored by an individual, while the second type analyzes co-citations to identify intellectual links among authors in different articles. this paper focuses on the second type of citation analysis. small defined co-citation analysis as “the frequency with which two items of earlier literature are cited together by the later literature.”4 it is not only the most important type of bibliometric analysis, but also the most sophisticated and popular method. many other methods originate from citation analysis, including document co-citation analysis, bibliographic coupling,5 author co-citation analysis,6 and co-word analysis.7 there are levels of co-citation analysis: document, author, and journal. co-citation could be used to establish a cluster or “core” of earlier literature.8 the pattern of links between documents can establish a structure to highlight the relationship of research areas. citation patterns change when previously less-cited papers are cited more frequently, or old papers are no longer cited.
changing citation patterns imply the possibility of new developments in research areas; furthermore, we can investigate changing patterns to understand the scientific trend within a research domain.9 co-citation analysis can help obtain a global overview of research domains.10 the aim of this paper is to detect emerging issues in the healthcare research domain via citation network analysis. our results can provide a basis for knowledge that researchers can use to construct a search strategy. structural knowledge is intrinsic to problem solving. because of the interdisciplinary nature of the healthcare domain and the broadness of the term, research is performed in several research fields, such as nursing, nursing informatics, long-term care, medical informatics, geriatrics, information technology, telecommunications, and so forth. although electronic journals enable searching by author, article, and journal title using keywords or full text, the results are limited to article content and references and therefore do not provide an in-depth understanding of the knowledge structure in a specific domain. the knowledge structure includes the core journals, core issues, the analysis of research trends, and the changes in focus of researchers. for a novice researcher, however, the literature survey remains a troublesome process in terms of precisely identifying the key articles that highlight the overview concept in a specific domain. the process is complicated and time-consuming, and it limits the number of articles collected for retrospective research. the objective of this paper is to provide information about the challenges and methodology of relevant literature retrieval by systematically reviewing the effectiveness of healthcare strategies. to this end, we build a platform for automatically gathering the full text of ejournals offered by the pubmed central (pmc) database.11 we then analyze the co-citation results to understand the research theme of the domain. methods this paper tries to build a value-added literature database system for co-citation analysis of healthcare research. the results of the analysis will be visually presented to provide the structure of the domain knowledge to increase the productivity of researchers. information technology and libraries | march 2018 41 dataset for co-citation analysis, a data source of related articles on healthcare is required. for this paper, the articles were retrieved from the pmc database using search terms related to the healthcare domain. to build the article analysis system, we used bibliometrics to locate the relevant references while analysis techniques were implemented by the association rule algorithm of data mining. the pmc database, which is produced by the us national institutes of health and is implemented and maintained by the us national center for biotechnology information of the us national library of medicine, provides electronic articles from more than one thousand full-text journals for free. we could understand the publication status from the open access subset (oas) and access to the oai (open archives initiative) protocol for metadata harvesting, which includes the full text in xml and pdf. regarding access permission, pmc offers a dataset of many open access journal articles. this paper used a dedicated xml-formatted dataset (https://www.ncbi.nlm.nih.gov/pmc/tools/oai/). the xml-formatted dataset followed the specification of dtd (document type definition) files, which are sorted by journal title. 
each article has a pmcid (pmc identification), which is useful for data analysis. in addition to the dataset, the pmc also provides several web services to help widely disseminate articles to researchers. pubmed central (pmc) citation database searching module citation module web view users data sourcemiddle-end pre-processeingback-end front-end xml files web serverdb server keyword co-citation module statistical module figure 1. the system architecture of citation analysis with four subsystems. https://www.ncbi.nlm.nih.gov/pmc/tools/oai/ identifying emerging issues in the healthcare domain | chu, lu, and liu 42 https://doi.org/10.6017/ital.v37i1.9595 system architecture our development environment consisted of the following four subsystems: front-end, middle-end, back-end, and pre-processing. the front-end creates a “web view,” a visualization of the results for our web-based co-citation analysis system. the system architecture is shown in figure 1. front-end development subsystem we used adobe dreamweaver cs5 as a visual development tool for the design of web templates. the php programming language was chosen to build the co-citation system that would be used to access and analyze the full-text articles. in terms of the data mining technique, we implemented the apriori algorithm with the php language.12 the results were exported as xml to a charting process, where we used amcharts (https://www.amcharts.com/), to create stock charts, column charts, pie charts, scatter charts, line charts, and so forth. middle-end server subsystem the system architecture was a microsoft windows-based environment with a xampp 2.5 web server platform (https://www.apachefriends.org/download.html). xampp is a cross-platform web development kit that consists of apache, mysql, php, and perl. it works across several operating systems, such as linux, windows, apache, macos, and oracle solaris, and provides ssl encryption, a phpmyadmin database management system, webalizer traffic management and control suite, a mail server (mercury mail transport system), and filezilla ftp server. back-end database subsystem to speed up co-citation analysis, the back-end database system used mysql 5.0.51b with interface phpmyadmin 2.11.7 for easy management of the database. mysql includes the following features: • using c and c++ to code programs, users can develop an application programming interface (api) through visual basic, c, c + +, eiffel, java, perl, php, python, ruby, and tcl languages with the multithreading capability that can be used in multi-cpu systems and easily linked to other databases. • performance of querying articles is quick because sql commands are optimally implemented, providing many additional commands and functions for a user-friendly and flexible operating database. an encryption mechanism is also offered to improve data confidentiality. • mysql can handle a large-scale dataset. the storage capacity is up to 2tb for win32 nts systems and up to 4tb for linux ext3 systems. • it provides the software myodbc as an odbc driver for connecting many programming languages, and it several languages and character sets to achieve localization and internationalization. pre-processing subsystem the pmc provides access to the article via oas, oai services, e-utilities, and ftp. we used ftp to download a compressed (zip) file packaged with a filename following the pattern “articles?-?.xml.tar.gz” on october 28, 2012 (ftp://ftp.ncbi.nlm.nih.gov/pub/pmc), where “?-?” is “0-9” or “a-z”. 
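a minimal python sketch of this pre-processing step is given below: it downloads one bulk archive, unpacks it, and pulls the citation-related fields out of a single .nxml article. the archive name is a hypothetical instance of the "articles?-?.xml.tar.gz" pattern quoted above, and the element names (article-id, journal-title, source, ref) assume a jats-style tag set that may differ from the 2012 snapshot the authors actually processed.

import ftplib
import tarfile
import xml.etree.ElementTree as ET

HOST = "ftp.ncbi.nlm.nih.gov"
REMOTE_DIR = "/pub/pmc"                  # path quoted in the text
ARCHIVE = "articles.A-B.xml.tar.gz"      # hypothetical instance of "articles?-?.xml.tar.gz"

# 1. download one bulk archive of .nxml article files (anonymous ftp)
with ftplib.FTP(HOST) as ftp:
    ftp.login()
    ftp.cwd(REMOTE_DIR)
    with open(ARCHIVE, "wb") as fh:
        ftp.retrbinary("RETR " + ARCHIVE, fh.write)

# 2. unpack; articles are grouped in folders named by abbreviated journal title
with tarfile.open(ARCHIVE, "r:gz") as tar:
    tar.extractall("pmc_articles")

# 3. pull the fields needed for citation analysis out of one article file
def parse_article(path):
    root = ET.parse(path).getroot()
    pmcid = next((el.text for el in root.iter("article-id")
                  if el.get("pub-id-type") == "pmc"), None)
    journal = root.findtext(".//journal-title")
    # cited journal (<source>) and year for each entry in the reference list
    refs = [(ref.findtext(".//source"), ref.findtext(".//year"))
            for ref in root.iter("ref")]
    return pmcid, journal, refs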
the size of the zip file was approximately 6.17gb. after extraction, the size of the articles was approximately 10gb. the 571,890 articles from 3,046 journals were grouped and https://www.amcharts.com/ https://www.apachefriends.org/download.html ftp://ftp.ncbi.nlm.nih.gov/pub/pmc information technology and libraries | march 2018 43 sorted by journal title in a folder labeled with an abbreviated title. an xml file would, for example, be named “aapsj-10-1-2751445.nxml,” where “aapsj” was the abbreviated title of the journal american association of pharmaceutical scientists journal, “10” was the volume of the journal, “1” was number of the issue, and “2751445” was the pmcid. we used related technologies for developing systems that include php language, array usage, and the apriori algorithm to analyze the articles and build the co-citation system.13 finally, several analysis modules were created to build an integrated co-citation system. research procedure the following is our seven-step research procedure to fulfill the integrated co-citation system: 1. parse xml file: select tags for construction of database; choose fields for co-citation analysis (for example, , , and ). 2. present web-based article: design webpage and css style; present web-based xml file by indexing variable . 3. build an abstract database: the database consists of several fields: , , , , , , and . 4. develop searching module: pass the keyword to the method “post” in sql query language and present the search result in the webpage. 5. develop statistical module: the statistical results include number of article and cited articles, the journals and authors cited in all articles, and the number of cited articles. 6. develop citation module: visually present the statistical results in several formats; rank searched journals; rank searched and cited journals in all the articles. 7. develop co-citation module: analyze the association between articles with the apriori algorithm. association rule algorithms the association rule (ar), usually represented by ab, means that the transaction containing item a also contains item b. there are many such rules in most of the dataset, but some were useless. to validate the rules, two indicators, support and confidence, can be applied. support, which means usefulness, is the number of times the rules feature in the transactions, whereas confidence means certainty, which is the probability that b occurs whenever the a occurs. we chose the rules for which the values of both support and confidence were greater than a predefined threshold. for example, a rule stipulating “toastjam” has support of 1.2 percent and confidence of 65 percent, implying that 1.2 percent of the transactions contain “toast” and “jam” and that 65 percent of the transactions containing “toast” also contained “jam.” the principle for generating the ar is based on two features of the documents: (1) find the highfrequency items that set their supports greater than the threshold; (2) for each dataset x and its subnet y, check the rule xy if the support is greater than the threshold, in which the rule xy means that the occurrence in the rule containing x also contains y. 
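for example, both indicators can be computed directly from a list of transactions. the short python sketch below does so for a single candidate rule using made-up market-basket data; it illustrates the definitions only and is not the authors' php implementation.

def support_and_confidence(transactions, a, b):
    """support of {a, b} and confidence of the rule a -> b."""
    n = len(transactions)
    both = sum(1 for t in transactions if a in t and b in t)
    only_a = sum(1 for t in transactions if a in t)
    support = both / n
    confidence = both / only_a if only_a else 0.0
    return support, confidence

# toy transactions echoing the toast/jam example in the text
transactions = [
    {"toast", "jam", "milk"},
    {"toast", "butter"},
    {"toast", "jam"},
    {"milk", "jam"},
]
print(support_and_confidence(transactions, "toast", "jam"))  # (0.5, 0.666...)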
most studies focus on searching high-frequency item sets.14 the most popular approach for identifying the item sets is the apriori algorithm, as shown in figure 2.15 the algorithm rationale is that if the support of item set i is less than or equal to the threshold, i is not a high-frequency item set, and a new item set formed by inserting any item a into i cannot be a high-frequency item set either. according to this rationale, the apriori algorithm is an iteration-based approach. first, it generates candidate item set C1 by calculating the number of occurrences of each attribute and finding the high-frequency item set L1 whose support is greater than the threshold. second, it generates item set C2 by joining L1 to C1, iteratively finding L2 and generating C3, and so on.

1: L1 = {large 1-item sets};
2: for (k = 2; Lk-1 ≠ ∅; k++) do begin
3:   Ck = candidate_gen(Lk-1);
4:   for all transactions t ∈ D do begin  /* generate candidate k-item sets */
5:     Ct = subset(Ck, t);
6:     for all candidates c ∈ Ct do
7:       c_count = c_count + 1;
8:   end
9:   Lk = {c ∈ Ck | c_count ≥ minsupport}
10: end
11: return L = ∪k Lk;

figure 2. the apriori algorithm.

the apriori algorithm is one of the most commonly used methods for ar induction. the candidate_gen algorithm, as shown in figure 3, includes join and prune operations for generating candidate sets.16 steps 1 to 4 generate all possible candidate item sets c from Lk-1. steps 5 to 8 prune candidates whose (k-1)-subsets are not frequent item sets. step 9 returns candidate set Ck to the main algorithm.

1: for each item set x1 ∈ Lk-1
2:   for each item set x2 ∈ Lk-1
3:     c = join(x1[1], x1[2], ..., x1[k-2], x1[k-1], x2[k-1])
4:       where x1[1] = x2[1], ..., x1[k-2] = x2[k-2], x1[k-1] < x2[k-1];
5: for item sets c ∈ Ck do
6:   for all (k-1)-subsets s of c do
7:     if (s ∈ Lk-1) then add c to Ck;
8:     else delete c from Ck;
9: return Ck;

figure 3. the candidate_gen algorithm.

results
we searched the pmc database with keywords "healthcare," "telecare," "ecare," "ehealthcare," and "telemedicine" and located 681 articles with a combined 14,368 references. values were missing from the year field for 4 of the references; this was also the case for 635 of a total of 52,902 authors. figure 4 shows a pie chart of the journal citation analysis for the healthcare keyword search; the top-ranked journal in terms of citations was the british medical journal (bmj), cited approximately 439 times, 18.89 percent of the total, followed by the journal of the american medical association (jama), cited approximately 344 times, 14.80 percent of the total. the trend of healthcare citations from 1852 to 2009 peaked in 2006 at approximately 1,419 citations, with more than half of the total occurring in this year.

figure 4. top-cited journals in the healthcare domain by percentage of total citations (n = 2324).

figure 5 shows a pie chart of the author citations for the same keyword search. the most-cited author was j. w. varni, professor of pediatric cardiology at the university of michigan mott children's hospital in ann arbor, cited approximately 149 times, equivalent to 23.24 percent of the total, followed by d. n. herndon, professor at the department of plastic and hand surgery, friedrich-alexander university of erlangen in germany, cited approximately 73 times, 11.39 percent of the total.
by identifying the affiliations of the topranked authors, researchers can access related information in their field of interest. the co-citation analysis was conducted using the apriori algorithm. the relationship of co-citation journals with a supporting degree greater than 38 from 1852 to 2009 is shown in figure 6. each identifying emerging issues in the healthcare domain | chu, lu, and liu 46 https://doi.org/10.6017/ital.v37i1.9595 journal was denoted by a node, where the node with double circle meant the journal is co-cited with the other in a citing article. bmj, which covers the fields of evidence-based nursing care, obstetrics, healthcare, nursing knowledge and practices, and others, is the core journal of the healthcare domain. figure 5. top-cited authors in journals of the healthcare domain by percentage of total citations (n = 641) figure 6. the relationship of co-citation journals with bmj. information technology and libraries | march 2018 47 to identify the focus of the journal, we analyze the co-citation in three periods. in 1852–1907, journals are not in co-citation relationships; in 1908–61, five candidates had a supporting degree greater than 1 (see table 1); and in 1962–2009, twenty-eight candidates had a supporting degree greater than 14 (see table 2 (for example, bmj and lancet had sixty-eight co-citations). table 1. candidates in co-citation analysis with a supporting degree greater than 1 (1908–61). no journals no. of journals co-cited support 1 publ math inst hung acad sci, publ math 2 3 2 jaoa, j osteopath 2 1 3 antioch rev, j abnorm soc psychol 2 1 4 n engl j med, am surg 2 1 5 arch neurol psychiatry, j neurol psychopathol, z ges neurol psychiat 3 1 table 2. candidates in co-citation analysis with a supporting degree greater than 14 (1962–2009). no journals no. of journals co-cited support 1 bmj, lancet 2 68 2 bmj, jama 2 65 3 jama, med care 2 64 4 bmj, arch intern med 2 61 5 lancet, jama 2 52 6 soc sci med, bmj 2 52 7 jama, arch intern med 2 51 8 lancet, med care 2 50 9 crit care med, prehospital disaster med 2 49 10 n engl j med, bmj 2 49 11 n engl j med, lancet 2 49 12 n engl j med, jama 2 47 13 n engl j med, med care 2 47 14 qual saf health care, bmj 2 47 15 bmj, crit care med 2 42 16 med care, bmj 2 38 17 n engl j med, j bone miner res 2 33 identifying emerging issues in the healthcare domain | chu, lu, and liu 48 https://doi.org/10.6017/ital.v37i1.9595 18 n engl j med, j pediatr surg 2 26 19 lancet, j pediatr surg 2 25 20 jama, nature 2 25 21 lancet, jama, bmj 3 24 22 n engl j med, lancet, bmj 3 21 23 intensive care med, bmj 2 21 24 bmj, n engl j med, jama 3 20 25 n engl j med, jama, lancet 3 20 26 jama, med care, lancet 3 14 27 jama, med care, n engl j med 3 14 28 bmj, jama, lancet, n engl j med 4 14 the link of co-citation journals in three periods from 1852 to 2009 can be summarized as follows: (1) three journals were highly cited but were not in a co-citation relationship in 1852–1907 (see figure 7); (2) five clusters of the healthcare journals in co-citation relationships were found for the years 1908–61 (see figure 8); and (3) 1962–2009 had a distinct cluster of four journals within the healthcare domain (see figure 9). figure 7. the relationship of co-citation journals for the healthcare domain in 1852–1907. information technology and libraries | march 2018 49 figure 8. the relationship of co-citation journals for the healthcare domain in 1908–61. journals with double circles are co-cited with the other in a citing article. 
journals with triple circles are cocited with the other two in a citing article. figure 9. the relationship of co-citation journals for the healthcare domain in 1962–2009. the thick line and circle indicates the journals are co-cited in a citing article. conclusions identifying emerging issues in the healthcare domain | chu, lu, and liu 50 https://doi.org/10.6017/ital.v37i1.9595 this paper presented an automated literature system for co-citation analysis to facilitate understanding of the sequence structure of journal articles cited in the healthcare domain. the system visually presents the results of its analysis to help researchers quickly identify the key articles that provide an overview of the healthcare domain. this paper used the keywords related to healthcare for its analysis and found that bmj is a core journal in the domain. the co-citation analysis found a single cluster within the healthcare domain comprising four journals: bmj, jama, lancet, and the new england journal of medicine. this paper focused on a co-citation analysis of journals. authors, articles, and issues featured in the co-citation analysis can be further studied in an automated way. a period analysis of publication years is also important. further analyses can facilitate understanding of the changes in a research domain and the trend of research issues. in addition, the automatic generation of a map would be a worthwhile topic for the future study. acknowledgements this article was funded by the ministry of science and technology of taiwan (most), formerly known as national science council (nsc), with grant no: nsc 100-2410-h-227-003. for the remaining authors none were declared. all the authors have made significant contributions to the article and agree with its content. there is no known conflict of interest in this study. references 1 a. kitson et al., “what are the core elements of patient-centered care? a narrative review and synthesis of the literature from health policy, medicine and nursing,” journal of advanced nursing 69 (2013): 4–8, https://doi.org/10.1111/j.1365-2648.2012.06064.x. 2 s. j. brownsell et al., “future systems for remote health care,” journal of telemedicine and telecare 5 (1999): 145–48, https://doi.org/10.1258/1357633991933503; b. g. celler, n. h. lovell, and d. k. chan, “the potential impact of home telecare on clinical practice,” medical journal of australia 171 (1999): 518–20; r. walker et al., “what it will take to create new internet initiatives in health care,” journal of medical systems 27 (2003): 95–98, https://doi.org/10.1023/a:1021065330652. 3 i. marshakova-shaikevich, the standard impact factor as an evaluation tool of science fields and scientific journals,” scientometrics 35 (1996): 283–85, https://doi.org/10.1007/bf02018487; i. marshakova-shaikevich, “bibliometric maps of field of science,” information processing & management 41(2005):1536–45, https://doi.org/10.1016/j.ipm.2005.03.027; a. r. ramosrodrí guez and j. ruí z-navarro, “changes in the intellectual structure of strategic management research: a bibliometric study of the strategic management journal, 1980–2000,” strategic management journal 25, no. 10 (2004): 982–1000, https://doi.org/10.1002/smj.397. 4 h. small, “co-citation in the scientific literature: a new measure of the relationship between two documents,” journal of american society for information science 24 (1973): 266–68. 
https://doi.org/10.1111/j.1365-2648.2012.06064.x https://doi.org/10.1258/1357633991933503 https://doi.org/10.1023/a:1021065330652 https://doi.org/10.1007/bf02018487 https://doi.org/10.1016/j.ipm.2005.03.027 https://doi.org/10.1002/smj.397 information technology and libraries | march 2018 51 5 m. m. kessler, “bibliographic coupling between scientific papers,” american documentation 14 (1963): 10–25, https://doi.org/10.1002/asi.5090140103; b. h. weinberg, “bibliographic coupling: a review,” information storage and retrieval 10 (1974): 190–95. 6 h. d. white and b. c. griffith, “author cocitation: a literature measure of intellectual structure,” journal of the american society for information science 32 (1981): 164–70, https://doi.org/10.1002/asi.4630320302. 7 y. ding, g. g. chowdhury, and s. foo, “bibliometric cartography of information retrieval research by using co-word analysis,” information processing & management 37 no. 6 (november 2001): 818–20, https://doi.org/10.1016/s0306-4573(00)00051-0. 8 small, “co-citation,” 266. 9 d. sullivan et al., “understanding rapid theoretical change in particle physics: a month-bymonth co-citation analysis,” scientometrics 2 (1980): 312–16, https://doi.org/10.1007/bf02016351. 10 n. shibata et al., “detecting emerging research fronts based on topological measures in citation networks of scientific publications,” technovation 28 (2008): 762–70, https://doi.org/10.1016/j.technovation.2008.03.009. 11 weinberg, “bibliographic coupling.” 12 white and griffith, “author cocitation.” 13 r. agrawal and r. srikant. “fast algorithm for mining association rules in large databases” (paper, international conference on very large databases [vldb], september 12–15, 1994, santiago de chile). 14 r. agrawal, t. imielinski, and a. swami, “mining association rules between sets of items in large databases” (paper, acm sigmod international conference on management of data, washington, dc, may 25–28, 1993. 15 agrawal and srikant, “fast algorithm,” 3. 16 ibid., 4. https://doi.org/10.1002/asi.5090140103 https://doi.org/10.1002/asi.4630320302 https://doi.org/10.1016/s0306-4573(00)00051-0 https://doi.org/10.1007/bf02016351 https://doi.org/10.1016/j.technovation.2008.03.009 abstract introduction methods dataset system architecture front-end development subsystem middle-end server subsystem back-end database subsystem pre-processing subsystem research procedure association rule algorithms results conclusions acknowledgements references web services and widgets for library information systems | han 87on the clouds: a new way of computing | han 87 shape cloud computing. for example, sun’s well-known slogan “the network is the computer” was established in late 1980s. salesforce.com has been providing on-demand software as a service (saas) for customers since 1999. ibm and microsoft started to deliver web services in the early 2000s. microsoft’s azure service provides an operating system and a set of developer tools and services. google’s popular google docs software provides web-based word-processing, spreadsheet, and presentation applications. google app engine allows system developers to run their python/java applications on google’s infrastructure. sun provides $1 per cpu hour. amazon is well-known for providing web services such as ec2 and s3. yahoo! announced that it would use the apache hadoop framework to allow users to work with thousands of nodes and petabytes (1 million gigabytes) of data. 
these examples demonstrate that cloud computing providers are offering services on every level, from hardware (e.g., amazon and sun), to operating systems (e.g., google and microsoft), to software and service (e.g., google, microsoft, and yahoo!). cloud-computing providers target a variety of end users, from software developers to the general public. for additional information regarding cloud computing models, the university of california (uc) berkeley’s report provides a good comparison of these models by amazon, microsoft, and google.4 as cloud computing providers lower prices and it advancements remove technology barriers—such as virtualization and network bandwidth—cloud computing has moved into the mainstream.5 gartner stated, “organizations are switching from factors related to cloud computing: infinite computing resources available on demand, removing the need to plan ahead; the removal of an up-front costly investment, allowing companies to start small and increase resources when needed; and a system that is pay-for-use on a short-term basis and releases customers when needed (e.g., cpu by hour, storage by day).2 national institute of standards and technology (nist) currently defines cloud computing as “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. network, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”3 as there are several definitions for “utility computing” and “cloud computing,” the author does not intend to suggest a better definition, but rather to list the characteristics of cloud computing. the term “cloud computing” means that ■■ customers do not own network resources, such as hardware, software, systems, or services; ■■ network resources are provided through remote data centers on a subscription basis; and ■■ network resources are delivered as services over the web. this article discusses using cloud computing on an it-infrastructure level, including building virtual server nodes and running a library’s essential computer systems in remote data centers by paying a fee instead of running them on-site. the article reviews current cloud computing services, presents the author’s experience, and discusses advantages and disadvantages of using the new approach. all kinds of clouds major it companies have spent billions of dollars since the 1990s to on the clouds: a new way of computing this article introduces cloud computing and discusses the author’s experience “on the clouds.” the author reviews cloud computing services and providers, then presents his experience of running multiple systems (e.g., integrated library systems, content management systems, and repository software). he evaluates costs, discusses advantages, and addresses some issues about cloud computing. cloud computing fundamentally changes the ways institutions and companies manage their computing needs. libraries can take advantage of cloud computing to start an it project with low cost, to manage computing resources cost-effectively, and to explore new computing possibilities. s cholarly communication and new ways of teaching provide an opportunity for academic institutions to collaborate on providing access to scholarly materials and research data. there is a growing need to handle large amounts of data using computer algorithms that presents challenges to libraries with limited experience in handling nontextual materials. 
because of the current economic crisis, academic institutions need to find ways to acquire and manage computing resources in a cost-effective manner. one of the hottest topics in it is cloud computing. cloud computing is not new to many of us because we have been using some of its services, such as google docs, for years. in his latest book, the big switch: rewiring the world, from edison to google, carr argues that computing will go the way of electricity: purchase when needed, which he calls “utility computing.” his examples include amazon’s ec2 (elastic computing cloud), and s3 (simple storage) services.1 amazon’s chief technology officer proposed the following yan hantutorial yan han (hany@u.library.arizona.edu) is associate librarian, university of arizona libraries, tucson. 88 information technology and libraries | june 201088 information technology and libraries | june 2010 company-owner hardware and software to per-use service-based models.”6 for example, the u.s. government website (http://www.usa .gov/) will soon begin using cloud computing.7 the new york times used amazon’s ec2 and s3 services as well as a hadoop application to provide open access to public domain articles from 1851 to 1922. the times loaded 4 tb of raw tiff images and their derivative 11 million pdfs into amazon’s s3 in twenty-four hours at very reasonable cost.8 this project is very similar to digital library projects run by academic libraries. oclc announced its movement of library management services to the web.9 it is clear that oclc is going to deliver a web-based integrated library system (ils) to provide a new way of running an ils. duraspace, a joint organization by fedora commons and dspace foundation, announced that they would be taking advantage of cloud storage and cloud computing.10 on the clouds computing needs in academic libraries can be placed into two categories: user computing needs and library goals. user computing needs academic libraries usually run hundreds of pcs for students and staff to fulfill their individual needs (e.g., microsoft office, browsers, and image-, audio-, and video-processing applications). library goals a variety of library systems are used to achieve libraries’ goals to support research, learning, and teaching. these systems include the following: ■■ library website: the website may be built on simple html webpages or a content management system such as drupal, joomla, or any home-grown php, perl, asp, or jsp system. ■■ ils: this system provides traditional core library work such as cataloging, acquisition, reporting, accounting, and user management. typical systems include innovative interfaces, sirsidynix, voyager, and opensource software such as koha. ■■ repository system: this system provides submission and access to the institution’s digital collections and scholarship. typical systems include dspace, fedora, eprints, contentdm, and greenstone. ■■ other systems: for example, federated search systems, learning object management systems, interlibrary loan (ill) systems, and reference tracking systems. ■■ public and private storage: staff file-sharing, digitization, and backup. due to differences in end users and functionality, most systems do not use computing resources equally. for example, the ils is input and output intensive and database query intensive, while repository systems require storage ranging from a few gigabytes to dozens of terabytes and substantial network bandwidth. cloud computing brings a fundamental shift in computing. 
it changes the way organizations acquire, configure, manage, and maintain computing resources to achieve their business goals. the availability of cloud computing providers allows organizations to focus on their business and leave general computing maintenance to the major it companies. in the fall of 2008, the author started to research cloud computing providers and how he could implement cloud computing for some library systems to save staff and equipment costs. in january 2009, the author started his plan to build library systems “on the clouds.” the university of arizona libraries (ual) has been a key player in the process of rebuilding higher education in afghanistan since 2001. ual librarian atifa rawan and the author have received multiple grant contracts to build technical infrastructures for afghanistan’s academic libraries. the technical infrastructure includes the following: ■■ afghanistan ils: a bilingual ils based on the open-source system koha.11 ■■ afghanistan digital libraries website (http://www.afghan digitallibraries.org/): originally built on simple html pages, later rebuilt in 2008 using the content management system joomla. ■■ a digitization management system. the author has also developed a japanese ill system (http://gif project.libraryfinder.org) for the north american coordinating council on japanese library resources. these systems had been running on ual’s internal technical infrastructure. these systems run in a complex computing environment, require different modules, and do not use computing resources equally. for example, the afghan ils runs on linux, apache, mysql, and perl. its opac and staff interface run on two different ports. the afghanistan digital libraries website requires linux, apache, mysql, and php. the japanese ill system was written in java and runs on tomcat. there are several reasons why the author moved these systems to the new cloud computing infrastructure: ■■ these systems need to be accessed in a system mode by people who are not ual employees. ■■ system rebooting time can be substantial in this infrastructure because of server setup and it policy. ■■ the current on-site server has web services and widgets for library information systems | han 89on the clouds: a new way of computing | han 89 reached its life expectancy and requires a replacement. by analyzing the complex needs of different systems and considering how to use resources more effectively, the author decided to run all the systems through one cloud computing provider. by comparing the features and the costs, linode (http://www.linode.com/) was chosen because it provides full ssh and root access using virtualization, four data centers in geographically diverse areas, high availability and clustering support, and an option for month-to-month contracts. in addition, other customers have provided positive reviews. in january 2009, the author purchased one node located in fremont, california, for $19.95 per month. an implementation plan (see appendix) was drafted to complete the project in phases. the author owns a virtual server and has access to everything that a physical server provides. in addition, the provider and the user community provided timely help and technical support. 
the migration of systems was straightforward: a linux kernel (debian 4.0) was installed within an hour, domain registration was complete and the domains went active in twenty-four hours, the afghanistan digital libraries’ website (based on joomla) migration was complete within a week, and all supporting tools and libraries (e.g., mysql, tomcat, and java sdk) were installed and configured within a few days. a month later, the afghanistan ils (based on koha) migration was completed. the ill system was also migrated without problem. tests have been performed in all these systems to verify their usability. in summary, the migration of systems was very successful and did not encounter any barriers. it addresses the issues facing us: after the migration, ssh log-ins for users who are not university employees were set up quickly; systems maintenance is managed by the author’s team, and rebooting now only takes about one minute; and there is no need to buy a new server and put it in a temperature and security controlled environment. the hardware is maintained by the provider. the administrative gui for the linux nodes is shown in figure 1. since migration, no downtime because of hardware or other failures caused by the provider has been observed. after migrating all the systems successfully and running them in a reliable mode for a few months, the second phase was implemented (see appendix). another linux node (located in atalanta, georgia) was purchased for backup and monitoring (see figure 2). nagios, an open-source monitoring system, was tested and configured to identify and report problems for the above library systems. nagios provides the following functions: (1) monitoring critical computing components, such as the network, systems, services, and servers; (2) timely alerts delivered via e-mail or cell phone; and (3) report and record logs of outages, events, and alerts. a backup script is also run as a prescheduled job to back up the systems on a regular basis. figure 1. linux node administration web interface figure 2. two linux nodes located in two remote data centers node 1: 64.62.xxx.xxx (fremont, ca) node 2: 74.207.xxx.xxx (atlanta, ga) nagios backup afghan digital libraries website afghan ils interlibrary loan system dspace 90 information technology and libraries | june 201090 information technology and libraries | june 2010 findings and discussions since january 2009, all the systems have been migrated and have been running without any issues caused by the provider. the author is very satisfied with the outcomes and cost. the annual cost of running two nodes is $480 per year, compared to at least $4,000 dollars if the hardware had been run in the library.12 from the author ’s experience, cloud computing provides the following advantages over the traditional way of computing in academic institutions: ■■ cost-effectiveness: from the above example and literature review, it is obvious that using cloud computing to run applications, systems, and it infrastructure saves staff and financial resources. uc berkeley’s report and zawodny’s blog provide a detailed analysis of costs for cpu hours and disk storage.13 ■■ flexibility: cloud computing allows organizations to start a project quickly without worrying about up-front costs. computing resources such as disk storage, cpu, and ram can be added when needed. in this case, the author started on a small scale by purchasing one node and added additional resources later. 
■■ data safety: organizations are able to purchase storage in data centers located thousands of miles away, increasing data safety in case of natural disasters or other factors. this strategy is very difficult to achieve in a traditional off-site backup. ■■ high availability: cloud computing providers such as microsoft, google, and amazon have better resources to provide more up-time than almost any other organizations and companies do. ■■ the ability to handle large amounts of data: cloud computing has a pay-for-use business model that allows academic institutions to analyze terabytes of data using distributed computing over hundreds of computers for a short-time cost. on-demand data storage, high availability and data safety are critical features for academic libraries.14 however, readers should be aware of some technical and business issues: ■■ availability of a service: in several widely reported cases, amazon’s s3 and google gmail were inaccessible for a duration of several hours in 2008. the author believes that the commercial providers have better technical and financial resources to keep more up-time than most academic institutions. for those wanting no single point of failure (e.g., a provider goes out of business), the author suggests storing duplicate data with a different provider or locally. ■■ data confidentiality: most academic libraries have open-access data. this issue can be solved by encrypting data before moving to the clouds. in addition, licensing terms can be negotiated with providers regarding data safety and confidentiality. ■■ data transfer bottlenecks: accessing the digital collections requires considerable network bandwidth, and digital collections are usually optimized for customer access. moving huge amounts of data (e.g., preservation digital images, audios, videos, and data sets) to data centers can be scheduled during off hours (e.g., 1–5 a.m.), or data can be shipped on hard disks to the data centers. ■■ legal jurisdiction: legal jurisdiction creates complex issues for both providers and end users. for example, canadian privacy laws regulate data privacy in public and private sectors. in 2008, the office of the privacy commissioner of canada released a finding that “outsourcing of canada .com email services to u.s.-based firm raises questions for subscribers,” and expressed concerns about public sector privacy protection.15 this brings concerns to both providers and end users, and it was suggested that privacy issues will be very challenging.16 summary the author introduces cloud computing services and providers, presents his experience of running multiple systems such as ils, content management systems, repository software, and the other system “on the clouds” since january 2009. using cloud computing brings significant cost savings and flexibility. however, readers should be aware of technical and business issues. the author is very satisfied with his experience of moving library systems to cloud computing. his experience demonstrates a new way of managing critical computing resources in an academic library setting. the next steps include using cloud computing to meet digital collections’ storage needs. cloud computing brings fundamental changes to organizations managing their computing needs. as major organizations in library fields, such as oclc, started to take advantage of cloud computing, the author believes that cloud computing will play an important role in library it. 
acknowledgments the author thanks usaid and washington state university for providing financial support. the author thanks matthew cleveland’s excellent work “on the clouds.” references 1. nicholars carr, the big switch: rewiring the world, from edison to google web services and widgets for library information systems | han 91on the clouds: a new way of computing | han 91 (london: norton, 2008). 2. werner vogels, “a head in the clouds—the power of infrastructure as a service” (paper presented at the cloud computing and in applications conference (cca ’08), chicago, oct. 22–23, 2008). 3. peter mell and tim grance, “draft nist working definition of cloud computing,” national institute of standards and technology (may 11, 2009), http:// csrc.nist.gov/groups/sns/cloud-computing/index.html (accessed july 22, 2009). 4. michael armbust et al., “above the clouds: a berkeley view of cloud computing,” technical report, university of california, berkeley, eecs department, feb. 10, 2009, http://www.eecs.berkeley .edu/pubs/techrpts/2009/eecs-200928.html (accessed july 1, 2009). 5. eric hand, “head in the clouds: ‘cloud computing’ is being pitched as a new nirvana for scientists drowning in data. but can it deliver?” nature 449, no. 7165 (2007): 963; geoffery fowler and ben worthen, “the internet industry is on a cloud—whatever that may mean,” wall street journal, mar. 26, 2009, http://online.wsj.com/article/ sb123802623665542725.html (accessed july 14, 2009); stephen baker, “google and the wisdom of the clouds,” business week (dec. 14, 2007), http://www.msnbc .msn.com/id/22261846/ (accessed july 8, 2009). 6. gartner, “gartner says worldwide it spending on pace to supass $3.4 trillion in 2008,” press release, aug. 18, 2008, http://www.gartner.com/it/page .jsp?id=742913 (accessed july 7, 2009). 7. wyatt kash, “usa.gov, gobierno usa.gov move into the internet cloud,” government computer news, feb. 23, 2009, http://gcn.com/articles/2009/02/23/ gsa-sites-to-move-to-the-cloud.aspx?s =gcndaily_240209 (accessed july 14, 2009). 8. derek gottfrid, “self-service, prorated super computing fun!” online posting, new york times open, nov. 1, 2007, http://open.blogs .nytimes.com/2007/11/01/self-service -prorated-super-computing-fun/?scp =1&sq=self%20service%20prorated&st =cse (accessed july 8, 2009). 9. oclc online computing library center, “oclc announces strategy to move library management services to web scale,” press release, apr. 23, 2009, http://www.oclc.org/us/en/news/ releases/200927.htm (accessed july 5, 2009). 10. duraspace, “fedora commons and dspace foundation join together to create duraspace organization,” press release, may 12, 2009, http:// duraspace.org/documents/pressrelease .pdf (accessed july 8, 2009). 11. yan han and atifa rawan, “afghanistan digital library initiative: revitalizing an integrated library system,” information technology & libraries 26, no. 4 (2007): 44–46. 12. fowler and worthen, “the internet industry is on a cloud.” 13. jeremy zawodney, “replacing my home backup server with amazon’s s3,” online posting, jeremy zawodny’s blog, oct. 3, 2006, http://jeremy .zawodny.com/blog/archives/007624 .html (accessed june 19, 2009). 14. yan han, “an integrated high availability computing platform,” the electronic library 23, no. 6 (2005): 632–40. 15. 
office of the privacy commissioner of canada, “tabling of privacy commissioner of canada’s 2005–06 annual report on the privacy act: commissioner expresses concerns about public sector privacy protection,” press release, june 20, 2006, http://www.priv.gc.ca/media/ nr-c/2006/nr-c_060620_e.cfm (accessed july 14, 2009); office of the privacy commissioner of canada, “findings under the personal information protection and electronic documents act (pipeda),” (sept. 19, 2008), http://www.priv.gc.ca/cf -dc/2008/394_20080807_e.cfm (accessed july 14, 2009). 16. stephen baker, “google and the wisdom of the clouds,” business week (dec. 14, 2007), http://www.msnbc.msn .com/id/22261846/ (accessed july 8, 2009). appendix. project plan: building ha linux platform using cloud computing project manager: project members: object statement: to build a high availability (ha) linux platform to support multiple systems using cloud computing in six months. scope: the project members should identify cloud computing providers, evaluate the costs, and build a linux platform for computer systems, including afghan ils, afghanistan digital libraries website, repository system, japanese interlibrary loan website, and digitization management system. resources: project deliverable: january 1, 2009—july 1, 2009 92 information technology and libraries | june 201092 information technology and libraries | june 2010 phase i ■■ to build a stable and reliable linux platform to support multiple web applications. the platform needs to consider reliability and high availability in a cost-effective manner ■■ to install needed libraries for the environment ■■ to migrate ils (koha) to this linux platform ■■ to migrate afghan digital libraries’ website (joomla) to this platform ■■ to migrate japanese interlibrary loan website ■■ to migrate digitization management system phase ii ■■ to research and implement a monitoring tool to monitor all web applications as well as os level tools (e.g. tomcat, mysql) ■■ to configure a cron job to run routine things (e.g., backup ) ■■ to research and implement storage (tb) for digitization and access phase iii ■■ to research and build linux clustering steps: 1. os installation: debian 4 2. platform environment: register dns 3. install java 6, tomcat 6, mysql 5, etc. 4. install source control env git 5. install statistics analysis tool (google analytics) 6. install monitoring tool: ganglia or nagios 7. web applications 8. joomla 9. koha 10. monitoring tool 11. digitization management system 12. repository system: dspace, fedora, etc. 13. ha tools/applications note calculation based on the following: ■■ leasing two nodes $20/month: $20 x 2 nodes x 12 months = $480/year ■■ a medium-priced server with backup with a life expectancy of 5 years ($5,000): $1,000/year ■■ 5 percent of system administrator time for managing the server ($60,000 annual salary): $3,000/year ■■ ignore telecommunication cost, utility cost, and space cost. ■■ ignore software developer’s time because it is equal for both options. appendix. project plan: building ha linux platform using cloud computing (cont.) reproduced with permission of the copyright owner. further reproduction prohibited without permission. harvesting information from a library data warehouse su, siew-phek t;needamangala, ashwin information technology and libraries; mar 2000; 19, 1; proquest pg. 
17 harvesting information from a library data warehouse data warehousing technology has been defined by john ladley as "a set of methods, techniques, and tools that are leveraged together and used to produce a vehicle that delivers data to end users on an integrated platform. "1 this concept has been applied increasingly by industries worldwide to develop data warehouses for decision support and knowledge discovery. in the academic sector, several universities have developed data warehouses containing the universities'ftnancial, payroll, personnel, budget, and student data.2 these data warehouses across all industries and academia have met with varying degrees of success. data warehousing technology and its related issues have been widely discussed and published. 3 little has been done, however, on the application of this cutting edge technology in the library environment using library data. i motivation of project daniel boorstin, the former librarian of congress, mentions that "for most of western history, interpretation has far outrun data." 4 however, he points out "that modem tendency is quite the contrary, as we see data outrun meaning." his insights tie directly to many large organizations that long have been rich in data but poor in information and knowledge. library managers are increasingly finding the importance of obtaining a comprehensive and integrated view of the library operations and the services it provides. this view is helpful for the purpose of making decisions on the current operations and for their improvement. due to financial and human constraints for library support, library managers increasingly encounter the need to justify everything they dofor example, the library's operation budget. the most frustrating problem they face is knowing that the information needed is available somewhere in the ocean of data but there is no easy way to obtain it. for example, it is not easy to ascertain whether the materials of a certain subject area, which consumed a lot of financial resources for their acquisition and processing, are either frequently used (i.e., high rate of circulation), seldom used, or not used at all. or, whether they satisfy users' needs. another example, an analysis of the methods of acquisition (firm order vs. approval plan) together with the circulation rate could be used as a factor in deciding the best method of acquiring certain types of material. such information can play a pivotal role in performing collection development and library management more efficiently and effectively. unfortunately, the data needed to make these types of decisions are often scattered in different files maintained siew-phek t. su and ashwin needamangala by a large centralized system, such as notis, that does not provide a general querying facility or by different file/ data management or application systems. this situation makes it very difficult and time-consuming to extract useful information. this is precisely where data warehousing technology comes in. the goal of this research and development work is to apply data warehousing and data mining technologies in the development of a library decision support system (loss) to aid the library management's decision making. the first phase of this work is to establish a data warehouse by importing selected data from separately maintained files presently used in the george a. smathers libraries of the university of florida into a relational database system (microsoft access). 
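to give a flavor of the kind of cross-file question such a warehouse is meant to answer (for instance, whether expensively acquired subject areas actually circulate), the python sketch below runs one such query against sqlite as a stand-in for the access database. the table and column names (acquisitions, circulation, fund, charges) are hypothetical and do not reflect the actual warehouse schema; the predefined and ad hoc queries mentioned below would issue joins of this kind through the gui.

import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the access warehouse
con.executescript("""
create table acquisitions (bib_id text, fund text, price real);
create table circulation  (bib_id text, charges integer);
insert into acquisitions values ('akr9234', 'chemistry', 85.0), ('akr9235', 'history', 30.0);
insert into circulation  values ('akr9234', 0), ('akr9235', 12);
""")

# spending versus use by fund: which subject areas cost a lot but rarely circulate?
rows = con.execute("""
    select a.fund, sum(a.price) as spent, sum(c.charges) as total_charges
    from acquisitions a join circulation c on a.bib_id = c.bib_id
    group by a.fund order by spent desc
""").fetchall()
for fund, spent, charges in rows:
    print(fund, spent, charges)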
data stored in the existing files were extracted, cleansed, aggregated, and transformed into the relational representation suitable for processing by the relational database management system. a graphical user interface (gui) is developed to allow decision makers to query for the data warehouse's contents using either some predefined queries or ad hoc queries. the second phase is to apply data mining techniques on the library data warehouse for knowledge discovery. this paper covers the first phase of this research and development work. our goal is to develop a general methodology and inexpensive software tools, which can be used by different functional units of a library to import data from different data sources and to tailor different warehouses to meet their local decision needs. for meeting this objective, we do not have to use a very large centralized database management system to establish a single very large data warehouse to support different uses. i local environment the university of florida libraries has a collection of more than two million titles, comprising over three million volumes. it shares a notis-based integrated system with nine other state university system (sus) libraries for acquiring, processing, circulating, and accessing its collection. all ten sus libraries are under the consortium umbrella of the florida center for library automation (fcla). siew-phekt. su (pheksu@mail.uflib.ufl.edu) is associate chair of the central bibliographic services section, resource services department, university of florida libraries, and ashwin needamangala (nsashwin@grove.ufl.edu) is a graduate student at the electrical and computer engineering department, university of florida. harvesting information from a library data warehouse i su and needamangala 17 reproduced with permission of the copyright owner. further reproduction prohibited without permission. i library data sources the university of florida libraries' online database, luis, stores a wealth of data, such as bibliographic data (author, title, subject, publisher information), acquisitions data (price, order information, fund assignment), circulation data (charge out and browse information, withdrawn and inventory information), and owning location data (where item is shelved). these voluminous data are stored in separate files. the notis system as used by the university of florida does not provide a general querying facility for accessing data across different files. extracting any information needed by a decision maker has to be done by writing an application program to access and manipulate these files. this is a tedious task since many application programs would have to be written to meet the different information needs. the challenge of this project is to develop a general methodology and tools for extracting useful data and metadata from these disjointed files, and to bring them into a warehouse that is maintained by a database management system such as microsoft access. the selection of access and pc hardware for this project is motivated by cost consideration. we envision that multiple special purpose warehouses be established on multiple pc systems to provide decision support to different library units. the library decision support system (loss) is developed with the capability of handling and analyzing an established data warehouse. for testing our methodology and software system, we established a warehouse based on twenty thousand monograph titles acquired from our major monograph vendor. 
these titles were published by domestic u.s. publishers and have a high percentage of dlc/dlc records (titles cataloged by the library of congress). they were acquired by firm order and approval plan, the publication coverage is the calendar year 1996-1997. analysis is only on the first item record (future project will include all copy holdings). although the size of the test data used is small, it is sufficient to test our general methodology and the functionality of our software system. fcla d82 tables and key list most of the data from the twenty-thousand-title domain that go into the loss warehouse are obtained from the db2 tables maintained by fcla. fcla developed and maintains the database of a system called ad hoc report request over the web (arrow) to facilitate querying and generating reports on acquisitions activities . the data are stored in 0b2 tables. 5 for our research and development purpose, we needed db2 tables for only the twenty-thousand titles that we identified as our initial project domain. these titles all have an identifiable 035 field in the bibliographic records (zybp1996, zybcip1996, zybp1997 or zybpcip1997). we used the batchbam program developed by gary strawn of northwestern university library to extract and list the unique bibliographic record numbers in separate files for fcla to pick up. 6 using the unique bibliographic record numbers, fcla extracted the 0b2 tables from the arrow database and exported the data to text files. these text files then were transferred to our system using the file transfer protocol (frp) and inserted as tables into the loss warehouse. bibliographic and item records extraction fcla collects and stores complete acquisitions data from the order records as db2 tables. however, only brief bibliographic data and no item record data are available . bibliographic and item record data are essential for inclusion in the loss warehouse in order to create a viable integrated system capable of performing cross-file analysis and querying for the relationships among different types of data. because these required data do not exist in any computer readable form, we designed a method to obtain them. using the identical notis key lists to extract the targeted twenty-thousand bibliographic and item records, we applied a screen scraping technique to scrape the data from the screen and saved them in a flat file. we then wrote a program in microsoft visual basic to clean the scraped data and saved them as text-delimited files that are suitable for importing into the loss warehouse. screen scraping concept screen scraping is a process used to capture data from a host application. it is conventionally a three-part process: • displaying the host screen or data to be scraped. • finding the data to be captured. • capturing the data to a pc or host file, or using it in another windows application. in other words, we can capture particular data on the screen by providing the corresponding screen coordinates to the screen scraping program. numerous commercial applications for screen scraping are available on the market. however, we used an approach slightly different from the conventional one. although we had to capture only certain fields from the notis screen, there were other factors that we had to take into consideration. they are: • the location of the various fields with respect to the screen coordinates changes from record to record . this makes it impossible for us to lock a particular field with a corresponding screen coordinate. 
• the data present on the screen are dynamic because we are working on a "live" database where data are frequently modified. for accurate query results, all the data, especially the item record data where the circulation transactions are housed, need to be captured within a specified time interval so that the data are uniform. this makes the time taken for capturing the data extremely important.
• most of the fields present on the screen needed to be captured.
taking the above factors into consideration, it was decided to capture the entire screen instead of scraping only certain parts of it. this made the process both simpler and faster. the unnecessary fields were filtered out during the cleanup process.

system architecture
the architecture of the ldss system is shown in figure 1 and is followed by a discussion of its components' functions.

notis
notis (northwestern online totally integrated system) was developed at the northwestern university library and introduced in 1970. since its inception, notis has undergone many versions. the university of florida libraries is one of the earliest users of notis. fcla has made many local modifications to the notis system since the uf libraries started using it. as a result, the uf notis differs from the rest of the notis world in many respects. notis can be broken down into four subsystems:
• acquisitions
• cataloging
• circulation
• online public access catalog (opac)
at the university of florida libraries, the notis system runs on an ibm 370 mainframe computer under the os/390 operating system.

host explorer
host explorer is a software program that provides a tcp/ip link to the mainframe computer. it is a terminal emulation program supporting ibm mainframe, as/400, and vax hosts. host explorer delivers an enhanced user environment for all windows nt platforms, windows 95, and windows 3.x desktops. exact tn3270e, tn5250, vt420/320/220/101/100/52, wyse 50/60, and ansi-bbs display is extended to leverage the wealth of the windows desktop. it also supports all tcp/ip-based tn3270 and tn3270e gateways.

figure 1. ldss architecture and its components (notis, host explorer, fcla db2 tables, data cleansing and extraction, warehouse, graphical user interface)

the host explorer program is used as the terminal emulation program in ldss. it also provides vba-compatible basic scripting tools for complete desktop macro development. users can run these macros directly or attach them to keyboard keys, toolbar buttons, and screen hotspots for additional productivity. the function of host explorer in the ldss is very simple. it has to "visit" all screens in the notis system corresponding to each notis number present in the batchbam file and capture all the data on the screens. in order to do this, we wrote a macro that read the notis numbers one at a time from the batchbam file and input each number into the command string of host explorer. the macro essentially performed the following functions:
• read the notis numbers from the batchbam file.
• inserted the notis number into the command string of host explorer.
• toggled the screen capture option in host explorer so that data are scraped from the screen only at necessary times.
• saved all the scraped data into a flat file.
after the macro has been executed, all the data scraped from the notis screens reside in a flat file.
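the macro itself was written in host explorer's basic scripting language; the fragment below is only a minimal sketch of the same driver loop in python, in which send_command() and capture_screen() are hypothetical placeholders for the emulator interaction (host explorer's actual macro api is not reproduced here), and the file names are illustrative.

```python
# a minimal sketch of the scraping driver loop described above; the production macro
# used host explorer's basic scripting. send_command() and capture_screen() are
# hypothetical placeholders for the terminal-emulator calls.

def scrape_notis_screens(batchbam_path, flatfile_path, send_command, capture_screen):
    """visit the notis screen for each record number and append the whole screen to a flat file."""
    with open(batchbam_path) as keys, open(flatfile_path, "w") as out:
        for line in keys:
            notis_number = line.strip()
            if not notis_number:
                continue
            send_command(notis_number)      # put the record number on the emulator command line
            screen_text = capture_screen()  # capture the entire screen, not selected fields
            out.write(screen_text + "\n")
```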
the data present in this file have to be cleansed in order to make them suitable for insertion into the library warehouse. a visual basic program was written to perform this function. the details of this program are given in the next section.

data cleansing and extraction
this component of the ldss is written in the visual basic programming language. its main function is to cleanse the data that have been scraped from the notis screens. the visual basic code saves the cleansed data in a text-delimited format that is recognized by microsoft access. this file is then imported into the library warehouse maintained by microsoft access. the detailed working of the code that performs the cleansing operation is discussed below. the notis screen that comes up for each notis number has several parts that are critical to the working of the code:
• the notis number, present in the top right of the screen (in this case, akr9234).
• the field numbers that have to be extracted, for example 010:: and 035::.
• the delimiters. the "|" symbol is used as the delimiter throughout this code. for example, in the 260 field of a bibliographic record, "|a" delimits the place of publication, "|b" the name of the publisher, and "|c" the date of publication.
we shall now go step by step through the cleansing process. initially we have the flat file containing all the data that have been scraped from the notis screens.
• the entire list of notis numbers from the batchbam file is read into an array called bam_number$.
• the file containing the scraped data is read into a single string called bibrecord$.
• this string is then parsed using the notis numbers from the bam_number$ array.
• we now have a string that contains a single notis record. this string is called single_record$.
• the program runs in a loop until all the records have been read.
• each string is now broken down into several smaller strings based on the field numbers. each of these smaller strings contains data pertaining to the corresponding field number.
• a considerable amount of the data present on the notis screen is unnecessary from the point of view of our project. we need only certain fields from the notis screen, and even from these fields we need the data only from certain delimiters. therefore, we scan each of these smaller strings for a set of delimiters that was predefined for each individual field. the data present in the other delimiters are discarded.
• the data collected from the various fields and their corresponding delimiters are assigned to corresponding variables. some variables contain data from more than one delimiter concatenated together. the reason is as follows: certain fields are present in the database only for informational purposes and will not be used as a criterion field in any query. since these fields will never be queried upon, they do not need to be cleansed as rigorously as the other fields, and we can afford to leave their data as concatenated strings. for example, the catalog_source field, which has data from "|a" and "|c", is of the form "|a dlc |c dlc", while the lang_code field, which has data from "|a" and "|h", is of the form "|a eng |h rus"; the latter we split into two fields, lang_code_1 containing "eng" and lang_code_2 containing "rus".
• finally, the data collected from the various fields are saved in a flat file in the text-delimited format, which microsoft access recognizes.
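the production cleansing program was written in visual basic; the following is only a minimal sketch of the same parsing flow in python, under the assumption that each field sits on its own line of the scraped screen. the field tags, subfield codes, and file handling are illustrative rather than the full production logic.

```python
import csv

# a minimal sketch of the cleansing steps described above (the production code was
# visual basic). the fields and subfield codes kept per field are illustrative only,
# and each field is assumed to sit on its own line of the scraped screen.
FIELDS_TO_KEEP = {"035": ["a"], "040": ["a", "c"], "260": ["a", "b", "c"]}

def parse_record(single_record):
    """break one scraped notis record into {field_tag: {subfield_code: value}}."""
    parsed = {}
    for tag, wanted in FIELDS_TO_KEEP.items():
        start = single_record.find(tag + "::")
        if start == -1:
            continue                                  # field not present on this screen
        field_line = single_record[start:].split("\n", 1)[0]
        subfields = {}
        for piece in field_line.split("|")[1:]:       # "|a dlc |c dlc" -> ["a dlc ", "c dlc"]
            code, _, value = piece.partition(" ")
            if code in wanted:
                subfields[code] = value.strip()
        parsed[tag] = subfields
    return parsed

def cleanse(flatfile_path, bam_numbers, out_path):
    """split the flat file on notis numbers and write one text-delimited row per record."""
    text = open(flatfile_path).read()
    columns = [(tag, code) for tag, codes in FIELDS_TO_KEEP.items() for code in codes]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for i, key in enumerate(bam_numbers):
            start = text.find(key)
            if start == -1:
                continue                              # record was not scraped
            end = text.find(bam_numbers[i + 1]) if i + 1 < len(bam_numbers) else len(text)
            record = parse_record(text[start:end])
            writer.writerow([key] + [record.get(tag, {}).get(code, "null")
                                     for tag, code in columns])
```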
a screen dump of the text-delimited file, which is the end result of the cleansing operation, is shown in figure 2. the flat file that we now have can be imported into the library warehouse.

graphical user interface
in order to ease the tasks of the user (i.e., the decision maker) in creating the library warehouse and in querying and analyzing its contents, a graphical user interface tool has been developed. through the gui, the user can enact the following processes or operations through a main menu:
• connection to notis
• screen scraping
• data cleansing and extracting
• importing data
• viewing collected data
• querying
• report generating
the first option opens host explorer and provides a connection to notis. it provides a shortcut to closing or minimizing ldss and opening host explorer. the screen scraping option activates the data scraping process. the data cleansing and extracting option filters out the unnecessary data fields and saves the cleansed data in a text-delimited format. the importing data option imports the data in the text-delimited format into the warehouse. the viewing collected data option allows the user to view the contents of a selected relational table stored in the warehouse.

figure 2. a text-delimited file (sample rows of the cleansed output)

the querying option activates ldss's querying facility, which provides wizards to guide the formulation of different types of queries, as discussed later in this article. the last option, report generating, is for the user to specify the report to be generated.

data mining tool
a very important component of ldss is the data mining tool for discovering association rules that specify the interrelationships of data stored in the warehouse. many data mining tools are now available in the commercial world. for our project, we are investigating the use of a neural-network-based data mining tool developed by li-min fu of the university of florida.7 the tool allows the discovery of association rules based on a set of training data provided to the tool. this part of our research and development work is still in progress. the existing gui and report generation facilities will be expanded to include the use of this mining tool.

library warehouse
fcla exports the data existing in the db2 tables into text files. as a first step toward creating the database, these text files are transferred using ftp and form separate relational tables in the library warehouse. the data that are scraped from the bibliographic and item record screens result in the formation of two more tables.

characteristics
data in the warehouse are snapshots of the original data files. only a subset of the data contents in these files is extracted for querying and analysis, since not all the data are useful for a particular decision-making situation. data are filtered as they pass from the operational environment to the data warehouse environment. this filtering process is particularly necessary when a pc system, which has limited secondary storage and main memory space, is used. once extracted and stored in the warehouse, data are not updateable; they form a read-only database. however, different snapshots of the original files can be imported into the warehouse for querying and analysis.
the results of the analyses of different snapshots can then be compared.

structure
data warehouses have a distinct structure. there are summarization and detail structures that demarcate a data warehouse. the structure of the library data warehouse is shown in figure 3. the different components of the library data warehouse as shown in figure 3 are:
• notis and db2 tables. bibliographic and circulation data are obtained from notis through the screen scraping process and imported into the warehouse. fcla maintains acquisitions data in the form of db2 tables. these are also imported into the warehouse after conversion to a suitable format.
• warehouse. the warehouse consists of several relational tables that are connected by means of relationships. the universal relation approach could have been used to implement the warehouse by using a single table. the argument for using the universal relation approach would be that all the collected data fall under the same domain. but let us examine why this approach would not have been suitable. the different data collected for import into the warehouse were bibliographic data, circulation data, order data, and pay data. if all these data were incorporated into one single table with many attributes, it would not be of any exceptional use, since each set of attributes has its own unique meaning when grouped together as a bibliographic table, circulation table, and so on. for example, if we grouped the circulation data and the pay data together in a single table, it would not make sense. however, the pay data and the circulation data are related through the bib_key. hence, our use of the conventional approach of having several tables connected by means of relationships is more appropriate.
• views. a view in sql terminology is a single table that is derived from other tables. these other tables could be base tables or previously defined views. a view does not necessarily exist in physical form; it is considered a virtual table, in contrast to base tables, which are actually stored in the database. in the context of the ldss, views can be implemented by means of the ad hoc query wizard. the user can define a query/view using the wizard and save it for future use. the user can then define a query on this query/view.
• summarization. the process of implementing views falls under the process of summarization. summarization provides the user with views, which make it easier for users to query the data of their interest.

figure 3. structure of the library data warehouse (notis and the fcla db2 tables feed, through screen scraping and import, the warehouse tables ufbib, ufpay, ufinv, ufcirc, and uford, over which bibliographic, circulation, and pay data views are defined for the user)

as explained above, the specific warehouse we established consists of five tables. table names ending in "_wh" indicate that the table contains current detailed data of the warehouse. current detailed data represent the most recent snapshot of data taken from the notis system. the summarized views are derived from the current detailed data of the warehouse. since the current detailed data of the warehouse are the basic data of the application, only the current detailed data tables are shown in appendix a.
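to make the view idea concrete, here is a minimal sketch that defines a summarized view over the detailed tables, using python's sqlite3 module as a stand-in for the microsoft access warehouse; the table and column names (bibscreen, itemscreen, bib_key, biblio_key, charges) follow the sql examples later in this article, the database file name is illustrative, and the detailed tables are assumed to have been imported already.

```python
import sqlite3

# a minimal sketch using sqlite3 as a stand-in for the access warehouse. table and
# column names follow the sql examples in this article; the file name is illustrative,
# and the detailed tables are assumed to have been imported already.
conn = sqlite3.connect("ldss_warehouse.db")

# a summarized "circulation data view": one row per title with its circulation count,
# derived from the current detailed data rather than stored as a separate table.
conn.execute("""
    CREATE VIEW IF NOT EXISTS circ_summary AS
    SELECT b.bib_key, i.charges
    FROM bibscreen  AS b
    JOIN itemscreen AS i ON i.biblio_key = b.bib_key
""")

# further queries (or further views) can then be defined on top of the saved view.
noncirculated = conn.execute(
    "SELECT COUNT(*) FROM circ_summary WHERE charges = 0").fetchone()[0]
print(noncirculated, "titles have not circulated in this snapshot")
```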
decision support by querying the warehouse
the warehouse contains a set of integrated relational tables whose contents are linked by the common primary key, the bib_key (biblio_key). the data stored across these tables can be traversed by matching the key values associated with their tuples or records. decision makers can issue all sorts of sql-type queries to retrieve useful information from the warehouse. two general types of queries can be distinguished: predefined queries and ad hoc queries. the former type refers to queries that are frequently used by decision makers for accessing information from different snapshots of data imported into the warehouse. the latter type refers to queries that are exploratory in nature: a decision maker suspects that there is some relationship between different types of data and issues a query to verify the existence of such a relationship. alternatively, data mining tools can be applied to analyze the data contents of the warehouse and discover rules of their relationships (or associations).

predefined queries
below are some sample queries posed in english. their corresponding sql queries can be processed using ldss.
1. number and percentage of approval titles circulated and noncirculated.
2. number and percentage of firm order titles circulated and noncirculated.
3. amount of financial resources spent on acquiring noncirculated titles.
4. number and percentage of dlc/dlc cataloging records in circulated and noncirculated titles.
5. number and percentage of "shared" cataloging records in circulated and noncirculated titles.
6. numbers of original and "shared" cataloging records of noncirculated titles.
7. identify the broad subject areas of circulated and noncirculated titles.
8. identify titles that have been circulated "n" number of times and by subjects.
9. number of circulated titles without the 505 field.
each of the above english queries can be realized by a number of sql queries. we shall use the first two english queries and their corresponding sql queries to explain how the data warehouse contents and the querying facility of microsoft access can be used to support decision making. the results of the sql queries are also given. the first english query can be divided into two parts (see figure 4), each realized by a number of sql queries as shown below.

sample query outputs
query 1: number and percentage of approval titles circulated and noncirculated.
result: total approval titles 1172; circulated 980 (83.76%); noncirculated 192 (16.24%).
similar to the above sql queries, we can translate the second english query into a number of sql queries; the result is given below.
query 2: number and percentage of firm order titles circulated and noncirculated.
result: total firm order titles 1829; circulated 1302 (71.18%); noncirculated 527 (28.82%).

report generation
the results of the two predefined english queries can be presented to users in the form of a report:
total titles 3001
approval 1172 (39%): circulated 980 (83.76%), noncirculated 192 (16.24%)
firm order 1829 (61%): circulated 1302 (71.18%), noncirculated 527 (28.82%)
from the above report, we can ascertain that, though 39 percent of the titles were purchased through the approval plan and 61 percent through firm orders, the approval titles have a higher rate of circulation, 83.76 percent, compared to 71.18 percent for firm order titles. it is important to note that the result of the above queries is taken from only one snapshot of the circulation data.
analysis from several snapshots is needed in order to compare the results and arrive at reliable information. we now present a report on the financial resources spent on acquiring and processing noncirculated titles. in order to generate this report, we need the output of queries four and five listed earlier in this article. the corresponding outputs are shown below.
query 4: number and percentage of dlc/dlc cataloging records in circulated and noncirculated titles.
result: total dlc/dlc records 2852; circulated 2179 (76.40%); noncirculated 673 (23.60%).
query 5: number and percentage of "shared" cataloging records in circulated and noncirculated titles.
result: total "shared" records 149; circulated 100 (67.11%); noncirculated 49 (32.89%).
in order to come up with the financial resources, we need to consider several factors that contribute to the amount of financial resources spent. for the sake of simplicity, we consider only the following factors:
1. the cost of cataloging each item with a dlc/dlc record
2. the cost of cataloging each item with a shared record
3. the average price of noncirculated books
4. the average pages of noncirculated books
5. the value of shelf space per centimeter
because the value of the above factors differs from institution to institution and might change according to more efficient workflow and better equipment used, users are required to fill in the values for factors 1, 2, and 5. ldss can compute factors 3 and 4. the financial report, taking into consideration the values of the above factors, could be as shown below:
processing cost of each dlc title = $10.00; 673 x $10.00 = $6,730.00
processing cost of each shared title = $20.00; 49 x $20.00 = $980.00
average price paid per noncirculated item = $48.00; 722 x $48.00 = $34,656.00
average size of book = 288 pages = 3 cm; average cost of 1 cm of shelf space = $0.10; 722 x $0.30 = $216.60
grand total = $42,582.60
again it is important to point out that several snapshots of the circulation data have to be taken to track and compare the different analyses before deriving reliable information.

approval titles circulated
sql query to retrieve the distinct bibliographic keys of all the approval titles:
select distinct bibscreen.bib_key from bibscreen right join payl on bibscreen.bib_key = payl.bib_num where (((payl.fund_key) like "*07*"));
sql query to count the number of approval titles that have been circulated:
select count(appr_title.bib_key) as countofbib_key from (bibscreen inner join appr_title on bibscreen.bib_key = appr_title.bib_key) inner join itemscreen on bibscreen.bib_key = itemscreen.biblio_key where (((itemscreen.charges)>0)) order by count(appr_title.bib_key);
sql query to calculate the percentage:
select cnt_appr_title_circ.countofbib_key, int(([cnt_appr_title_circ]![countofbib_key])*100/count([bibscreen]![bib_key])) as percent_apprcirc from bibscreen, cnt_appr_title_circ group by cnt_appr_title_circ.countofbib_key;

approval titles noncirculated
sql query for counting the number of approval titles that have not been circulated:
select distinct count(appr_title.bib_key) as countofbib_key from (appr_title inner join bibscreen on appr_title.bib_key = bibscreen.bib_key) inner join itemscreen on bibscreen.bib_key = itemscreen.biblio_key where (((itemscreen.charges)=0));
sql query to calculate the percentage:
select cnt_appr_title_noncirc.countofbib_key, int(([cnt_appr_title_noncirc]![countofbib_key])*100/count([bibscreen]![bib_key])) as percent_appr_noncirc from bibscreen, cnt_appr_title_noncirc group by cnt_appr_title_noncirc.countofbib_key;

figure 4. example of an english query divided into two parts

ad hoc queries
alternately, if the user wishes to issue a query that has not been predefined, the ad hoc query wizard can be used. the following example illustrates the use of the ad hoc query wizard. assume the sample query is: how many circulated titles in the english subject area cost more than $35? we now take you on a walk-through of the ad hoc query wizard, starting from the first step until the output is obtained. figure 4 depicts step 1 of the ad hoc query wizard. the sample query mentioned above requires the following fields:
• biblio_key, for a count of all the titles that satisfy the given condition.
• charges, to specify the criterion of "circulated title."
• fund_key, to specify all titles under the "english" subject area.
• paid_amt, to specify all titles that cost more than $35.
step 2 of the ad hoc query wizard (figure 5) allows the user to specify criteria and thereby narrow the search domain. step 3 (figure 6) allows the user to specify any mathematical operations or aggregation functions to be performed. step 4 (figure 7) displays the user-defined query in sql form and allows the user to save the query for future reuse. the output of the query is shown in figure 8; it shows the number of circulated titles in the english subject area that cost more than $35. alternatively, the user might wish to obtain a listing of these 33 titles; figure 9 shows the listing.

figure 4. step 1: ad hoc query wizard
figure 5. step 2: ad hoc query wizard
figure 6. step 3: ad hoc query wizard
figure 7. step 4: ad hoc query wizard
figure 8. query output
figure 9. listing of query output
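for readers who prefer to see the wizard's end product directly, the sketch below shows roughly the sql that step 4 would display for this sample question, run through python's sqlite3 module as a stand-in for access; the fund_key pattern used to identify english-subject funds is illustrative only, since the actual fund codes are not listed in this article.

```python
import sqlite3

# a minimal sketch of the query the ad hoc wizard builds for the sample question above.
# sqlite3 stands in for access; the '%eng%' fund pattern is illustrative only.
conn = sqlite3.connect("ldss_warehouse.db")

sql = """
    SELECT COUNT(DISTINCT b.bib_key)
    FROM bibscreen  AS b
    JOIN itemscreen AS i ON i.biblio_key = b.bib_key
    JOIN payl       AS p ON p.bib_num    = b.bib_key
    WHERE i.charges  > 0            -- "circulated title"
      AND p.fund_key LIKE '%eng%'   -- titles charged to an english-subject fund (illustrative)
      AND p.paid_amt > 35           -- cost more than $35
"""
count = conn.execute(sql).fetchone()[0]
print(count, "circulated english-subject titles cost more than $35")
```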
conclusion
in this article, we presented the design and development of a library decision support system based on data warehousing and data mining concepts and techniques. we described the functions of the components of ldss; the screen scraping and the data cleansing and extraction processes were described in detail. the process of importing data stored in luis as separate data files into the library data warehouse was also described. the data contents of the warehouse can provide a very rich information source to aid the library management in decision making. using the implemented system, a decision maker can use the gui to establish the warehouse and activate the querying facility provided by microsoft access to explore the warehouse contents. many types of queries can be formulated and issued against the database. experimental results indicate that the system is effective and can provide pertinent information for aiding the library management in making decisions. we have fully tested the implemented system using a small sample database. our ongoing work includes the expansion of the database size and the inclusion of a data mining component for association rule discovery. extensions of the existing gui and report generation facilities to accommodate data mining needs are expected.

acknowledgments
we would like to thank professor stanley su for his support and advice on the technical aspect of this project. we would also like to thank donna alsbury for providing us with the db2 data, daniel cromwell for loading the db2 files, and nancy williams and tim hartigan for their helpful comments and valuable discussions on this project.

references and notes
1. john ladley, "operational data stores: building an effective strategy," data warehouse: practical advice from the experts (englewood cliffs, n.j.: prentice hall, 1997).
2. information on harvard university's adapt project, accessed march 8, 2000, www.adapt.harvard.edu/; information on the arizona state university data administration and institutional analysis warehouse, accessed march 8, 2000, www.asu.edu/data_admin/wh-1.html; information on the university of minnesota clarity project, accessed march 8, 2000, www.clarity.umn.edu/; information on the uc san diego darwin project, accessed march 8, 2000, www.act.ucsd.edu/dw/darwin.html; information on university of wisconsin-madison infoaccess, accessed march 8, 2000, http://wiscinfo.doit.wisc.edu/infoaccess/; information on the university of nebraska data warehouse-nulook, accessed march 8, 2000, www.nulook.uneb.edu/.
3. ramon barquin and herbert edelstein, eds., building, using, and managing the data warehouse (englewood cliffs, n.j.: prentice hall, 1997); ramon barquin and herbert edelstein, eds., planning and designing the data warehouse (upper saddle river, n.j.: prentice hall, 1996); joyce bischoff and ted alexander, data warehouse: practical advice from the experts (englewood cliffs, n.j.: prentice hall, 1997); jeff byard and donovan schneider, "the ins and outs (and everything in between) of data warehousing," acm sigmod 1996 tutorial notes, may 1996, accessed march 8, 2000, www.redbrick.com/products/white/pdf/sigmod96.pdf; surajit chaudhuri and umesh dayal, "an overview of data warehousing and olap technology," acm sigmod record 26(1), march 1997, accessed march 8, 2000, www.acm.org/sigmod/record/issues/9703/chaudhuri.ps; b. devlin, data warehouse: from architecture to implementation (reading, mass.: addison-wesley, 1997); u. fayyad and others, eds., advances in knowledge discovery and data mining (cambridge, mass.: the mit pr., 1996); joachim hammer, "data warehousing overview, terminology, and research issues," accessed march 8, 2000, www.cise.ufl.edu/~jhammer/classes/wh-seminar/overview/index.htm; w. h. inmon, building the data warehouse (new york, n.y.: john wiley, 1996); ralph kimball, "dangerous preconceptions," accessed march 8, 2000, www.dbmsmag.com/9608d05.html; ralph kimball, the data warehouse toolkit (new york, n.y.: john wiley, 1996); ralph kimball, "mastering data extraction," dbms magazine, june 1996 (provides an overview of the process of extracting, cleaning, and loading data), accessed march 8, 2000, www.dbmsmag.com/9606d05.html; alberto mendelzon, "bibliography on data warehousing and olap," accessed march 8, 2000, www.cs.toronto.edu/~mendel/dwbib.html.
4. daniel j.
boorstin, "the age of negative discovery," cleopatra's nose: essays on the unexpected (new york: random house, 1994).
5. information on the arrow system, accessed march 8, 2000, www.fcla.edu/system/intro_arrow.html.
6. gary strawn, "batchbaming," accessed march 8, 2000, http://web.uflib.ufl.edu/rs/rsd/batchbam.html.
7. li-min fu, "domrul: learning the domain rules," accessed march 8, 2000, www.cise.ufl.edu/~fu/domrul.html.

appendix a. warehouse data tables
ufcirc_wh (attribute: domain) — bib_key: text(50); status: text(20); enum/chron: text(20); midspine: text(20); temp_locatn: text(20); pieces: number; charges: number; last_use: date/time; browses: number; value: text(20); invnt_date: date/time; created: date/time.
uford_wh (attribute: domain) — id: autonumber; ord_num: text(20); ord_div: number; process_unit: text(20); bib_num: text(20); order_date: date/time; mod_date: date/time; vendor_code: text(20); vndadr_order: text(20); vndadr_claim: text(20); vndadr_return: text(20); vend_title_num: text(20); ord_unit: text(20); rcv_unit: text(20); ord_scope: text(20); pur_ord_prod: text(20); action_int: number; libspec1: text(20); libspec2: text(20); vend_note: text(20); ord_note: text(20); source: text(20); ref: text(20); copyctl_num: number; medium: text(20); piece_cnt: number; div_note: text(20); acr_stat: text(20); rel_stat: text(20); lst_date: date/time; action_date: text(20); libspec3: text(20); libspec4: text(20); encumb_units: number; currency: text(20); est_price: number; encumb_outs: number; fund_key: text(20); fiscal_year: text(20); copies: number; xpay_method: text(20); vol_isu_date: text(20); title_author: text(20); db2_timestamp: date/time.
ufpay_wh (attribute: domain) — inv_key: text(20); ord_num: text(20); ord_div: number; process_unit: text(20); bib_key: text(20); ord_seq_num: number; inv_seq_num: number; status: text(20); create_date: date/time; lst_update: date/time; currency: text(20); paid_amt: number; usd_amt: number; fund_key: text(20); exp_class: text(20); fiscal_year: text(20); copies: number; type_pay: text(10); text: text(20); db2_timestamp: date/time.
ufinv_wh (attribute: domain) — inv_key: text(20); create_date: date/time; mod_date: date/time; approv_stat: text(20); vend_adr_code: text(20); vend_code: text(20); action_date: text(20); vend_inv_date: date/time; approval_date: date/time; approver_id: text(20); vend_inv_num: text(20); inv_tot: number; calc_tot_pymts: number; calc_net_tot_pymts: number; currency: text(20); discount_percent: number; vouch_note: text(20); official_vend: text(20); process_unit: text(20); internal_note: text(20); db2_timestamp: text(20).
ufbib_wh (attribute: domain) — bib_key: text(20); system_control_num: text(50); catalog_source: text(20); lang_code_1: text(20); lang_code_2: text(20); geo_code: text(20); dewey_num: text(20); edition: text(20); pagination: text(20); size: text(20); series_440: text(20); series_490: text(20); content: text(20); subject_1: text(20); subject_2: text(20); subject_3: text(20); authors_1: text(20); authors_2: text(20); authors_3: text(20); series: text(20).

library discovery products: discovering user expectations through failure analysis
irina trapido
information technology and libraries, september 2016

abstract
as the new generation of discovery systems evolve and gain maturity, it is important to continually focus on how users interact with these tools and what areas they find problematic.
this study looks at user interactions within searchworks, a discovery system developed by stanford university libraries, with an emphasis on identifying and analyzing problematic and failed searches. our findings indicate that users still experience difficulties conducting author and subject searches, could benefit from enhanced support for browsing, and expect their overall search experience to be more closely aligned with that on popular web destinations. the article also offers practical recommendations pertaining to metadata, functionality, and scope of the search system that could help address some of the most common problems encountered by the users.

(irina trapido, itrapido@stanford.edu, is electronic resources librarian at stanford university libraries, stanford, california.)

introduction
in recent years, rapid modernization of online catalogs has brought library discovery to the forefront of research efforts in the library community, giving libraries an opportunity to take a fresh look at such important issues as the scope of the library catalog, metadata creation practices, and the future of library discovery in general. while there is an abundance of studies looking at various aspects of planning, implementation, use, and acceptance of these new discovery environments, surprisingly little research focuses specifically on user failure. the present study aims to address this gap by identifying and analyzing potentially problematic or failed searches. it is hoped that focusing on common error patterns will help us gain a better understanding of users' mental models, needs, and expectations that should be considered when designing discovery systems, creating metadata, and interacting with library patrons.

terminology
in this paper, we adopt a broad definition of discovery products as "tools and interfaces that a library implements to provide patrons the ability to search its collections and gain access to materials."1 these products can be further subdivided into the following categories:
• online catalogs (opacs)—patron-facing modules of an integrated library system.
• discovery layers (also referred to as "discovery interfaces" or "next-generation library catalogs")—new catalog interfaces, decoupled from the integrated library system and offering enhanced functionality, such as faceted navigation and relevance-ranked results, as well as the ability to incorporate content from institutional repositories and digital libraries.
• web-scale discovery tools, which, in addition to providing all the interface features and functionality of next-generation catalogs, broaden the scope of discovery by systematically aggregating content from library catalogs, subscription databases, and institutional digital repositories into a central index.

literature review
to identify and investigate problems that end users experience in the course of their regular searching activities, we analyzed digital traces of user interactions with the system recorded in the system's log files. this method, commonly referred to as transaction log analysis, has been a popular way of studying information-seeking in a digital environment since the first online search systems came into existence, allowing researchers to monitor system use and gain insight into the users' search process.
server logs have been used extensively to examine user interactions with web search engines, consistently showing that web searchers tend to engage in short search sessions, enter brief search statements, do not browse the results beyond the first page, and rarely resort to advanced searching.2 a similar picture has emerged from transaction log studies of library catalogs. researchers have found that library users employ the same surface strategies: queries within library discovery tools are equally short and simply constructed;3 the majority of search sessions consist of only one or two actions.4 patrons commonly accept the system's default search settings and rarely take advantage of a rich set of search features traditionally offered by online catalogs, such as boolean searching, index browsing, term truncation, and fielded searching.5 although advanced searching in library discovery layers is uncommon, faceted navigation, a new feature introduced into library catalogs in the mid-2000s, quickly became an integral part of the users' search process. research has shown that facets in library discovery interfaces are used both in conjunction with text searching, as a search refinement tool, and as a way to browse the collection with no search term entered.6 a recent study that analyzed interaction patterns in a faceted library interface at north carolina state university using log data and user experiments demonstrated that users of faceted interfaces tend to issue shorter queries, go through fewer iterations of query reformulation, and scan deeper along the result list than those who use nonfaceted search systems. the authors also concluded that facets increase search accuracy, especially for complex and open-ended tasks, and improve user satisfaction.7 another traditional use of transaction logs has been to gauge the performance of library catalogs, mostly through measuring success and failure rates. while the exact percentage of failed searches varied dramatically depending on the system's search capabilities, interface design, the size of the underlying database, and, most importantly, on the researchers' definition of an unsuccessful search, the conclusion was the same: the incidence of failure in library opacs was extremely high.8 in addition to reporting error rates, these studies also looked at the distribution of errors by search type (title, author, or subject search) and categorized sources of searching failure.
most researchers agreed that typing errors and misspellings accounted for a significant portion of failed searches and were common across all search types.9 subject searching, which remained the most problematic area, often failed because of a mismatch between the search terms chosen by the user and the controlled vocabulary contained in the library records, suggesting that users experienced considerable difficulties in formulating subject queries with library of congress subject headings.10 other errors reported by researchers, such as the selection of the wrong search index or the inclusion of the initial article for title searches, were also caused by users' lack of conceptual understanding of the search process and the system's functions.11 these research findings were reinforced by multiple observational studies and user interviews, which showed that patrons found library catalogs "illogical," "counter-intuitive," and "intimidating,"12 and that patrons were unwilling to learn the intricacies of catalog searching.13 instead, users expected simple, fast, and easy searching across the entire range of library collections, relevance-ranked results that exactly matched what users expected to find, and convenient and seamless transition from discovery to access.14 today's library discovery systems have come a long way: they offer one-stop search for a wide array of library resources, intuitive interfaces that require minimal training to be searched effectively, facets to help users narrow down the result set, and much more.15 but are today's patrons always successful in their searches? usability studies of next-generation catalogs and, more recently, of web-scale discovery systems have pointed to patron difficulties associated with the use of certain facets, mostly because of terminological issues and inconsistencies in the underlying metadata.16 researchers also reported that users had trouble interpreting and evaluating the results of their search;17 users also were confused as to what resources were covered by the search tool.18 our study builds on this line of research by systematically analyzing real-life problematic searches as reported by library users and recorded in transaction logs.

background
stanford university is a private, four-year or above research university offering undergraduate and graduate degrees in a wide range of disciplines to about sixteen thousand students. the study analyzed the use of searchworks, a discovery platform developed by stanford university libraries. searchworks features a single search box with a link to advanced search on every page, relevance-ranked results, faceted navigation, enhanced textual and visual content (summaries, tables of contents, book cover images, etc.), as well as "browse shelf" functionality. searchworks offers searching and browsing of catalog records and digital repository objects in a single interface; however, it does not allow article-level searching. searchworks was developed on the basis of blacklight (projectblacklight.org), an open-source application for searching and interacting with collections of digital objects.19 thanks to blacklight's flexibility and extensibility, searchworks enables discovery across an increasingly diverse range of collections (marc catalog records, archival materials, sound recordings, images, geospatial data, etc.)
and allows to continuously add new features and improvements (e.g., https://library.stanford.edu/blogs/stanford-libraries-blog/2014/09/searchworks-30-released). study objectives the goal of the present study was two-fold. first, we sought to determine how patrons interact with the discovery systems, which features they use and with what frequency. second, this study aimed to identify and analyze problems that users encounter in their search process. method this study used data comprising four years of searchworks use, which was recorded in apache solr logs. the analysis was performed at the aggregate level; no attempts were made to identify individual searchers from the logs. at the preprocessing stage, we created and used a series of perl scripts to clean and parse the data and extract only those transactions where the user entered a search query and/or selected at least one facet value. page views of individual records were excluded from the analysis. the resulting output file contained the following parameters for each transaction: a time stamp, search mode used (basic or advanced), query terms, search index (“all fields,” “author,” “title,” “subject,” etc.), facets selected, and the number of results returned. the query stream was subsequently partitioned into task-based search sessions using a combination of syntactic features (word cooccurrence across multiple transactions) and temporal features (session time-outs: we used fifteen minutes of inactivity as a boundary between search sessions). the analysis was conducted over the following datasets: dataset 1. aggregate data of approximately 6 million search transactions conducted between february 13, 2011, and december 31, 2014. we performed quantitative analysis of this set to identify general patterns of system use. dataset 2. a sample of 5,101 search sessions containing 11,478 failed or potentially problematic interactions performed in the basic search mode and 2,719 sessions containing 3,600 advanced searches, annotated with query intent and potential cause of the problem. the searches were performed during eleven twenty-four-hour periods, representing different years, academic http://projectblacklight.org/ https://library.stanford.edu/blogs/stanford-libraries-blog/2014/09/searchworks-30-released information technology and libraries | september 2016 13 quarters, times of the school year (beginning of the quarter, midterms, finals, breaks), and days of the week. this dataset was analyzed to identify common sources of user failure. dataset 3. user feedback messages submitted to searchworks between january 2011 and december 2014 through the “feedback” link, which appears on every searchworks page. while the majority of feedback messages were error and bug reports, this dataset also contained valuable information about how users employed various features of the discovery layer, what problems they encountered, and what features they felt would improve their search experience. for the manual analysis of dataset 2, all searches within a search session were reconstructed in searchworks and, in some cases, also in external sources such as worldcat, google scholar, and google. 
they were subsequently assigned to one of the following categories: known-item searches (searches for a specific resource by title, combination of title and author, a standard number such as issn or isbn, or a call number), author searches (queries for a specific person or organization responsible for or contributing to a resource), topical searches, browse searches (searches for a subset of the library collection, e.g., “rock operas,” “graphic novels,” “dvds,” etc.), invalid queries, and queries where the search intent could not be established. to identify potentially problematic transactions, the following heuristic was employed: we selected all search sessions where at least one transaction failed to retrieve any records, as well as sessions consisting predominantly of known-item or author searches, where the user repeated or reformulated the query three or more times within a five-minute time frame. we hypothesized that this search pattern could be part of the normal query formulation process for topical searches, but it could serve as an indicator of the user’s dissatisfaction with the results of the initial query for known-item and author searches. we identified seventeen distinct types of problems, which we further aggregated into the following five groups: input errors, absence of the resource from the collection, queries at the wrong level of granularity, erroneous or too restrictive use of limiters, and mismatch between the search terms entered and the library metadata. each search transaction in dataset 2 was manually reviewed and assigned to one or more of these error categories. findings usage patterns our analysis of the aggregate data suggests that keyword searching remains the primary interaction paradigm with the library discovery system, accounting for 76 percent of all searches. however, users also increasingly take advantage of facets both for browsing and refining their searches: the use of facets grew from 25 percent in 2011 to 41 percent in 2014. library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 14 although both the basic and the advanced search modes allow for “fielded” searches, where the user can specify which element of the record to search (author, title, subject, etc.), searchers rarely made use of this feature, relying mostly on the system’s defaults (the “all fields” search option in the basic search mode): users selected a specific search index in less than 25 percent of all basic searches. advanced searching was infrequent and declining (from 11 percent in 2011 to 4 percent in 2014). typically, users engaged in short sessions with a mean session length of 1.5 queries. search queries were brief: 2.9 terms per query on average. single terms made up 23 percent of queries; 26 percent had two terms, and 19 percent had three terms. error patterns the breakdown of errors by category and search mode is shown in figure 1. in the following sections, we describe and analyze different types of errors. figure 1. breakdown of errors by category and search mode input errors input errors accounted for the largest proportion of problematic searches in the basic search mode (29 percent) and for 5 percent of problems in the advanced search. 
while the majority of such errors occurred at the level of individual words (misspellings or typographical errors), entire search statements were also imprecise and erroneous (e.g., “diary of an economic hit man” instead of “confessions of an economic hit man” and “dostoevsky war and peace” instead of “tolstoy war and peace”). it is noteworthy that in 46 percent of all search sessions containing information technology and libraries | september 2016 15 problems of this type, users subsequently entered a corrected query. however, if such errors occurred in a personal name, they were almost half as likely to be corrected. absence of the item sought from the collection queries for materials that were not in the library’s collection accounted for about a quarter of all potentially problematic searches. in the advanced search modality, where the query is matched against a specific search field, such queries typically resulted in zero hits and can hardly be considered failures per se. however, in the default cross-field search, users were often faced with the problem of false hits and had to issue multiple progressively more specific queries to ascertain that the desired resource was absent from the collection. queries at the wrong level of granularity a substantial number of user queries failed because they were posed at the level of specificity not supported by the catalog. such queries accounted for the largest percentage of problematic advanced searches (63 percent), where they consisted almost exclusively of article-level searching: users either tried to locate a specific article (often by copying the entire citation or its part from external sources) or conducted highly specific topical searches more suitable for a fulltext database. in the basic search mode, the proportion of searches at the wrong granularity level was much lower, but still substantial (20 percent). in addition to searches for articles and narrowly defined subject searches, users also attempted to search for other types of more granular content, such as book chapters, individual papers in conference proceedings, poems, songs, etc. erroneous or too restrictive use of limiters another common source of failure was the selection of the wrong search index or a facet that was too restrictive to yield any results. the majority of these errors were purely mechanical: users failed to clear out search refinements from their previous search or entered query terms into the wrong search field. however, our analysis also revealed several conceptual errors, typically stemming from a misunderstanding of the meaning and purpose of certain limiters. for example, “online,” “database,” and “journal/periodical” facets were often perceived by the user as a possible route to article-level content. even seemingly straightforward limiters such as “date” caused confusion, especially when applied to serial publications: users attempted to employ this facet to drill down to the desired journal issue or article, most likely acting on the assumption that the system included article-level metadata. lack of correspondence between the users’ search terms and the library metadata a significant number of problems in this group involved searches for non-english materials. 
when performed in their english transliteration, such queries often failed because of users’ lack of library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 16 familiarity with the transliteration rules established by the library community, whereas searches in the vernacular scripts tended to produce incomplete or no results because not all bibliographic records in the database contained parallel non-roman script fields. author and title searches often failed because of the users’ tendency to enter abbreviated queries. for example, personal name searches where the user truncated the author’s first or middle name to an initial while the bibliographic records only contained this name in its full form were extremely likely to fail. abbreviations were also used in searches for journals, conference proceedings, and occasionally even for book titles (e.g., “ai: a modern approach” instead of “artificial intelligence: a modern approach”). such queries were successful only if the abbreviation used by the searcher was included in the bibliographic records as a variant title. a somewhat related problem occurred when the title of a resource contained a numeral in its spelled out form but was entered as a digit by the user. because these title variations are not always recorded as additional access points in the bibliographic records, the desired item either did not appear in the result set or was buried too deep to be discovered. topical searches within the subject index were also prone to failure, mostly because patrons were unaware that such searches require the use of precise terms from controlled vocabularies and resorted to natural language searching instead. user feedback our analysis of user feedback revealed substantial differences in how various user groups approach the search system and which areas of it they find problematic. students were often frustrated by the absence of spelling suggestions, which, as one user put it, “left the users wander [to?] in the dark” as to the cause of searching failure. this user group also found certain social features desirable: for example, one user suggested that having ratings for books would be helpful in his choice of a good programming book. by contrast, faculty and researchers were more concerned about the lack of the more advanced features, such as cross-reference searching and left-anchored browsing of the title, subject, and author indexes. however, there were several areas that both groups found problematic: students and faculty alike saw the system’s inability to assist in the selection of the correct form of the author’s name as a major barrier to effective author searching and also converged on the need for more granular access to formats of audiovisual materials. discussion scope of the discovery system the results of our analysis point to users’ lack of understanding of what is covered by the discovery layer. users are often unaware of the existence of separate specialized search interfaces for different categories of materials and assume that the library discovery layer offers google-like information technology and libraries | september 2016 17 searching across the entire range of library resource types. 
moreover, they are confused by the multiple search modalities offered by the discovery layer: one of the common misconceptions in searchworks is that the advanced search will allow the user to access additional content rather than offer a different way of searching the same catalog data. in addition to the expanded scope of the discovery tools, there is also a growing expectation of greater depth of coverage. according to our data, searching in a discovery layer occurs at several levels: the entire resource (book, journal title, music recording), its smaller integral units (book chapters, journal articles, individual musical compositions, etc.), and full text. user search strategies the search strategies employed by searchworks users are heavily influenced by their experiences with web search engines. users tend to engage in brief search sessions and use short queries, which is consistent with the general patterns of web searching. they rely on relevance ranking and are often reluctant to examine search results in any depth: if the desired item does not appear within the first few hits, users tend to rework their initial search statement (often with only a minimal change to the search terms) rather than scrolling down to the bottom of the results screen or looking beyond the first page of results. given these search patterns, it is crucial to fine-tune relevance-ranking algorithms to the extent that the most relevant results are displayed not just on the first page but are included in the first few hits. while this is typically the case for unique and specific queries, more general searches could benefit from a relevance-ranking algorithm that would leverage the popularity of a resource as measured by its circulation statistics. adding this dimension to relevance determination would help users make sense of large result sets generated by broad topical queries (e.g., “quantum mechanics,” “linear algebra,” “microeconomics”) by ranking more popular or introductory materials higher than more specialized ones. it could also provide some guidance to the user trying to choose between different editions of the same resource and improve the quality of results of author searches by ranking works created by the author before critical and biographical materials. users’ query formulation strategies are also modeled by google, where making search terms as specific as possible is often the only way to increase the precision of a search. faceted search systems, however, require a different approach: the user is expected to conduct a broad search and subsequently focus it by superimposing facets on the results. qualifying the search upfront through keywords rather than facets is not only ineffective, but may actually lead to failure. for example, a common search pattern is to add the format of a resource as a search term (e.g., “fortune magazine,” “science journal,” “gre e-book,” “nicole lopez dissertation,” “woody allen movies”), and because the format information is coded rather than spelled out in the bibliographic records, such queries either result in zero hits or produce irrelevant results. 
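as a rough illustration of the circulation-informed ranking suggested above, the following python sketch blends a text-relevance score with a logarithmic popularity boost. it is a minimal sketch under assumed field names, sample values, and weighting, not a description of how searchworks or any vendor product actually ranks results.

```python
import math

def boosted_score(text_score, checkouts, weight=0.3):
    """Blend a keyword-relevance score with a circulation-based popularity boost.

    text_score: whatever matching score the search engine already returns.
    checkouts:  the item's historical circulation count (an assumed field).
    """
    popularity = math.log1p(checkouts)      # zero checkouts -> no boost
    return text_score * (1.0 + weight * popularity)

# toy records for the broad query "linear algebra"
candidates = [
    {"title": "linear algebra and its applications", "text_score": 8.2, "checkouts": 310},
    {"title": "topics in abstract linear algebra", "text_score": 8.4, "checkouts": 12},
]
candidates.sort(key=lambda r: boosted_score(r["text_score"], r["checkouts"]), reverse=True)
for r in candidates:
    print(r["title"], round(boosted_score(r["text_score"], r["checkouts"]), 2))
```

the logarithm keeps a handful of heavily circulated titles from drowning out textual relevance, which matters most for the broad topical queries mentioned above.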
in a similar vein, making the query overly restrictive by including the year of publication, publisher, or edition library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 18 information often causes empty retrievals because the library might not have the edition specified by the user or because the query does not match the data in the bibliographic record. thus our study lends further weight to claims that even in today’s reality of sophisticated discovery environments and unmediated searching, library users can still benefit from learning the best search techniques that are specifically tailored to faceted interfaces.20 error tolerance input errors remain one of the major sources of failure in library discovery layers. users have become increasingly reliant on error recovery features that they find elsewhere on the web, such as “did you mean . . . ” suggestions, automatic spelling corrections, and helpful suggestions on how to proceed in situations where the initial search resulted in no hits. but perhaps even more crucial are error-prevention mechanisms, such as query autocomplete, which helps users avoid spelling and typographical errors and provides interactive search assistance and instant feedback during the query formulation process. our visual analysis of the logs from the most recent years revealed an interesting search pattern, where the user enters only the beginning of the search query and then increments it by one or two letters: pr pro proq proque proques proquest such search patterns indicate that users expect the system to offer query expansion options and show the extent to which the query autocomplete feature (currently missing from searchworks) has become an organic part of the users’ search process. topical searching while next-generation discovery systems represent a significant step toward enabling more sophisticated topical discovery, a number of challenges still remain. apart from mechanical errors, such as misspellings and wrong search index selections, the majority of zero-hit topical searches were caused by a mismatch between the user’s query and the vocabulary in the system’s index. in many cases such queries were formulated too narrowly, reflecting the users’ underlying belief that the discovery layer offers full-text searching across all of the library’s resources. in addition to keyword searching, libraries have traditionally offered a more sophisticated and precise way of accessing subject information in the form of library of congress subject headings (lcsh). however, our results indicate that these tools remain largely underused: users took advantage of this feature in only 21 percent of all subject searches in our sample. we also found information technology and libraries | september 2016 19 that 95 percent of lcsh usage came from clicks on subject heading links within individual bibliographic records rather than from “subject” facets, corroborating the results of earlier studies.21 there is a whole range of measures that could help patrons leverage the power of controlled vocabulary searching. they include raising the level of patron familiarity with the lcshs, integrating cross-references for authorized subject terms, enabling more sophisticated facetbased access to subject information by allowing users to manipulate facets independently, and exposing hierarchical and associative relationships among lcshs. 
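the hierarchy-aware assistance described here can be sketched with a tiny hand-coded vocabulary; in a real system the broader (bt), narrower (nt), and related (rt) terms would come from lcsh authority data rather than the illustrative dictionary below.

```python
# Toy subject-authority data: each heading lists its broader (BT), narrower (NT),
# and related (RT) terms, the way an LCSH authority record would. Real data would
# be loaded from authority records, not hard-coded.
VOCAB = {
    "cookery": {"BT": ["home economics"],
                "NT": ["cookery, french", "vegetarian cookery"],
                "RT": ["food habits"]},
    "vegetarian cookery": {"BT": ["cookery"], "NT": [], "RT": ["vegetarianism"]},
}

def expand_subject(term, relations=("BT", "NT", "RT")):
    """Return the heading itself plus its broader/narrower/related headings."""
    entry = VOCAB.get(term.lower(), {})
    expanded = [term.lower()]
    for rel in relations:
        expanded.extend(entry.get(rel, []))
    return expanded

print(expand_subject("Vegetarian cookery"))
# -> ['vegetarian cookery', 'cookery', 'vegetarianism']
```

a discovery layer could present such expansions as clickable suggestions beside the result list, leaving the user in control of whether to broaden or narrow the search.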
ideally, once the user has identified a helpful controlled vocabulary term, it should be possible to expand, refine, or change the focus of a search through broader, narrower, and related terms in the lcsh’s hierarchy as well as to discover various aspects of a topic through browse lists of topical subdivisions or via facets. known-item searching important as it is for the discovery layer to facilitate topical exploration, our data suggests that searchworks remains, first and foremost, a known-item lookup tool. while a typical searchworks user rarely has problems with known-work searches, our analysis of clusters of closely related searches has revealed several situations where users’ known-item search experience could be improved. for example, when the desired resource is not in the library’s collection, the user is rarely left with empty result sets because of automatic word-stemming and cross-field searching. while this is a boon for exploratory searching, it becomes a problem when the user needs to ensure that the item sought is not included in the library’s collection. another common scenario arises when the query is too generic, imprecise, or simply erroneous, or when the search string entered by the user does not match the metadata in the bibliographic record, causing the most relevant resources to be pushed too far down the results list to be discoverable. providing helpful “did you mean . . . ” suggestions could potentially help the user distinguish between these two scenarios. another feature that would substantially benefit the user struggling with the problem of noisy retrievals is highlighting the user’s search terms in retrieved records. displaying search matches could alleviate some of the concerns over lack of transparency as to why seemingly irrelevant results are retrieved, repeatedly expressed in user feedback, as well as expedite the process of relevance assessment. author searching author searching remains problematic because of a convergence of factors: a. misspellings. according to our data, typographical errors and misspellings are by far the most common problem in author searching. when such errors occur in personal names, they are much more difficult to identify than errors in the title, and in the absence of library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 20 index-based spell-checking mechanisms, often require the use of external sources to be corrected. b. mismatch between the form and fullness of the name entered by the user and the form of the name in the bibliographic record. for example, a user’s search for “d. reynolds” will retrieve records where “d” and “reynolds” appear anywhere in the record (or anywhere in the author fields, if the user opts for a more focused “author” search), but will not bring up records where the author’s name is recorded as “reynolds, david.” c. lack of cross-reference searching of the lc name authority file. if the user searches for a variant name represented by a cross-reference on an authority record, she might not be directed to the authorized form of the name. d. lack of name disambiguation, which is especially problematic when the search is for a common name. while the process of name authority control ensures the uniqueness of name headings, it does not necessarily provide information that would help users distinguish between authors. 
for instance, the user often has to know the author’s middle name or date of birth to choose the correct entry, as exemplified by the following choices in the “author” facet resulting from the query “david kelly”:
kelly, david
kelly, david (david d.)
kelly, david (david francis)
kelly, david f.
kelly, david h.
kelly, david patrick
kelly, david st. leger
kelly, david t.
kelly, david, 1929 july 11–
kelly, david, 1929–
kelly, david, 1929–2012
kelly, david, 1938–
kelly, david, 1948–
kelly, david, 1950–
kelly, david, 1959–
e. errors and inaccuracies in the bibliographic records. given the past practice of creating undifferentiated personal-name authority records, it is not uncommon to have one name heading for different authors or contributors. conversely, situations where a single person is identified by multiple headings (largely because some records still contain obsolete or variant forms of a personal name) are also prevalent and may become a significant barrier to effective retrieval as they create multiple facet values for the same author or contributor. f. inability to perform an exhaustive search on the author’s name. a fielded “author” search will miss the records where the name does not appear in the “author” fields but appears elsewhere in the bibliographic record. g. relevance ranking. because search terms occurring in the title have more weight than search terms in the “author” fields, works about an author are ranked higher than works of the author. browsing like many other next-generation discovery systems, searchworks features faceted navigation, which facilitates both general-purpose browsing and more targeted search. in searchworks, facets are displayed from the outset, providing a high-level overview of the collection and jumping-off points for further exploration. rather than having to guess the entry vocabulary, the searcher may just choose from the available facets and explore the entire collection along a specific dimension. however, findings from our manual analysis of the query stream suggest that facets as a browsing tool might not be used to their fullest potential: users often resort to keyword searching when faceted browsing would have been a more optimal strategy. there are at least two factors that contribute to this trend. the first is users’ lack of awareness of this interface feature: it is common for searchworks users to issue queries such as “dissertations,” “theses,” and “newspapers” instead of selecting the appropriate value of the “format” facet. second, many of the facets that could be useful in the discovery process are not available as top-level browsing categories. for example, users expect more granular faceting of audiovisual resources, which would include the ability to browse by content type (“computer games,” “video games”) and genre (“feature films,” “documentaries,” “tv series,” “romantic comedies”). another category of resources commonly accessed by browsing is theses and dissertations. users frequently try to browse dissertations by field or discipline (issuing searches such as “linguistics thesis,” “dissertations aeronautics,” “phd thesis economics,” “biophysics thesis”), by program or department and by the level of study (undergraduate, master’s, doctoral), and could benefit from a set of facets dedicated to these categories.
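one possible, purely illustrative way of softening the keyword-versus-facet mismatch noted above is to recognize common format words in a query and offer the corresponding facet refinement; the mapping and facet values below are invented for the sketch and do not reflect searchworks configuration.

```python
# Hypothetical mapping from format words users type to an indexed facet value.
# Both the facet name and its values are invented for this sketch.
FORMAT_TERMS = {
    "dissertations": ("format", "Thesis/Dissertation"),
    "theses": ("format", "Thesis/Dissertation"),
    "newspapers": ("format", "Newspaper"),
    "movies": ("format", "Video/Film"),
}

def suggest_facets(query):
    """Split a raw query into remaining keywords and suggested facet refinements."""
    keywords, facets = [], []
    for token in query.lower().split():
        if token in FORMAT_TERMS:
            facets.append(FORMAT_TERMS[token])
        else:
            keywords.append(token)
    return " ".join(keywords), facets

print(suggest_facets("woody allen movies"))
# -> ('woody allen', [('format', 'Video/Film')])
```

whether such a rewrite should be applied automatically or only suggested to the user is a design question that log data alone cannot settle.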
browsing for books could be enhanced by additional faceting related to intellectual content, such as genre and literary form (e.g., “fantasy,” “graphic novels,” “autobiography,” “poetry”) and audience (e.g., “children’s books”). users also want to be able to browse for specific subsets of materials on the basis of their location (e.g., permanent reserves at the engineering library). browsing for new acquisitions with the option of limiting to a specific topic is also a highly desirable feature. library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 22 while some browsing categories are common across all types of resources, others only apply to specific types of materials (e.g., music, cartographic/geospatial materials, audiovisual resources, etc.). for example, there is a strong demand among music searchers for systematic browsing by specific musical instruments and their combinations. ideally, the system should offer both an optimal set of initial browse options and intuitive context-specific ways to progressively limit or expand the search. offering such browsing tools may require improvements in system design as well as significant data remediation and enhancement because much of the metadata that could be used to create these browsing categories is often scattered across multiple fixed and variable fields in the bibliographic records, inconsistently recorded, or not present at all. one of the hallmarks of modern discovery tools has been their increased focus on developing tools that would facilitate serendipitous browsing. searchworks was one of the pioneers to offer virtual “browse shelf” feature, which is aimed at emulating browsing the shelves in a physical library. however, because this functionality relies on the classification number, it does not allow browsing of many other important groups of materials, such as multimedia resources, rare books, or archival resources. call-number proximity is only one of the many dimensions that could be leveraged to create more opportunities for serendipitous discoveries. other methods of associating related content might include recommendations based on subject similarity, authorship, keyword associations, forward and backward citations, and use. implications for practice addressing the issues that we identified would involve improvements in several areas: • scope. our findings indicate that library users increasingly perceive the discovery interface as a portal to all of the library’s resources. meeting this need goes far beyond offering the ability to search multiple content sources from a single search box: it is just as important to help users make sense of the results of their search and to provide easy and convenient ways to access the resources that they have discovered. and whatever the scope of the library discovery layer is, it needs to be communicated to the user with maximum clarity. • functionality. users expect a robust and fault-tolerant search system with a rich suite of search-assistance features, such as index-based alternative spelling suggestions, result screens displaying keywords in context, and query auto-completion mechanisms. 
these features, many of which have become deeply embedded into user search processes elsewhere on the web, could prevent or alleviate a substantial number of issues related to problematic user queries (misspellings, typographical errors, imprecise queries, etc.), enable more efficient recovery from errors by guiding users to improved results, and facilitate discovery of foreign-language materials. equally important is the continued focus on relevance ranking algorithms, which ideally should move beyond simple keyword information technology and libraries | september 2016 23 matching techniques toward incorporating social data as well as leveraging the semantics of the query itself and offering more intelligent and possibly more personalized results depending on the context of the search. • metadata. the quality of the user experience in the discovery environments depends as much on the metadata as it does on the functionality of the discovery layer. thus it remains extremely important to ensure consistency, granularity, and uniformity of metadata, especially as libraries are increasingly faced with the problem of integrating heterogeneous pools of metadata into a single discovery tool. conclusions and future directions the analysis of the transaction log data and user feedback has helped us identify several common patterns of search failure, which in turn can reveal important assumptions and expectations that users bring to the library discovery. these expectations pertain primarily to the system’s functionality: in addition to simple, intuitive, and visually appealing interfaces and relevanceranked results, users expect a sophisticated search system that would consistently produce relevant results even for incomplete, inaccurate, or erroneous queries. users also expect a more centralized, comprehensive, and inclusive search environment that would enable more in-depth discovery by offering article-level, chapter-level, and full-text searching. finally, the results of this study have underscored the continued need for a more flexible and adaptive system that would be easy to use for novices while offering advanced functionality and more control over the search process for the “power” users, a system that would provide targeted support for the different types of information behavior (known-item look-up, author searching, topical exploration, browsing) and would facilitate both general inquiry and very specialized searches (e.g., searches for music, cartographic and geospatial materials, digital collections of images, etc.). just like discovery itself, building discovery tools is a dynamic, complex, iterative process that requires intimate knowledge of ever-changing and evolving user needs and expectations. it is hoped that ongoing focus on user problems and frustrations in the new discovery environments can complement other assessment methods by identifying unmet user needs, thus helping create a more holistic and nuanced picture of users’ search and discovery behaviors. references 1. marshall breeding, “library resource discovery products: context, library perspectives, and vendor positions,” library technology reports 50, no. 1 (2014): 5–58. 2. craig silverstein et al., “analysis of a very large web search engine query log,” sigir forum 33, no. 1 (1999): 6–12; bernard j. 
jansen, amanda spink, and tefko saracevic, “real life, real users, and real needs: a study and analysis of user queries on the web,” information library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 24 processing & management 36, no. 2 (2000): 207–27, http://dx.doi.org/10.1016/s03064573(99)00056-4; amanda spink, bernard j. jansen, and h. cenk ozmultu, “use of query reformulation and relevance feedback by excite users,” internet research 10, no. 4 (2000): 317–28; amanda spink et al., “searching the web: the public and their queries,” journal of the american society for information science & technology 52, no. 3 (2001): 226–34; bernard j. jansen and amanda spink, “an analysis of web searching by european allteweb.com users,” information processing & management 41, no. 2 (2005): 361–81, http://dx.doi.org/10.1016/s0306-4573(03)00067-0. 3. cory lown and bradley hemminger, “extracting user interaction information from the transaction logs of a faceted navigation opac,” code4lib 7, june 26, 2009, http://journal.code4lib.org/articles/1633; eng pwey lau and dion ho-lian goh, “in search of query patterns: a case study of a university opac,” information processing & management 42, no. 5 (2006): 1316–29, http://dx.doi.org/10.1016/j.ipm.2006.02.003; heather moulaison, “opac queries at a medium-sized academic library: a transaction log analysis,” library resources & technical services 52, no. 4 (2008): 230–37. 4. william h. mischo et al., “user search activities within an academic library gateway: implications for web-scale discovery systems,” in planning and implementing resource discovery tools in academic libraries, edited by mary pagliero popp and diane dallis, 153–73 (hershey, : information science reference, 2012); xi niu, tao zhang, and hsin-liang chen, “study of user search activities with two discovery tools at an academic library,” international journal of human-computer interaction 30, no. 5 (2014): 422–33, http://dx.doi.org/10.1080/10447318.2013.873281. 5. eng pwey lau and dion ho-lian goh, “in search of query patterns”; niu, zhang, and chen, “study of user search activities with two discovery tools at an academic library.”. 6. lown and hemminger, “extracting user interaction; kristin antelman, emily lynema, and andrew k. pace, “toward a twenty-first century library catalog,” information technology & libraries 25, no. 3 (2006): 128–39; niu, zhang, and chen, “study of user search activities with two discovery tools at an academic library.” 7. xi niu and bradley hemminger, “analyzing the interaction patterns in a faceted search interface,” journal of the association for information science & technology 66, no. 5 (2015): 1030–47, http://dx.doi.org/10.1002/asi.23227. 8. steven d. zink, “monitoring user search success through transaction log analysis: the wolfpac example,” reference services review 19, no. 1 (1991): 49–56; deborah d. blecic et al., “using transaction log analysis to improve opac retrieval results,” college & research libraries 59, no. 
1 (1998): 39–50; holly yu and margo young, “the impact of web search http://dx.doi.org/10.1016/s0306-4573(99)00056-4 http://dx.doi.org/10.1016/s0306-4573(99)00056-4 http://dx.doi.org/10.1016/s0306-4573(03)00067-0 http://journal.code4lib.org/articles/1633 http://dx.doi.org/10.1016/j.ipm.2006.02.003 http://dx.doi.org/10.1080/10447318.2013.873281 http://dx.doi.org/10.1080/10447318.2013.873281 information technology and libraries | september 2016 25 engines on subject searching in opac,” information technology & libraries 23, no. 4 (2004): 168–80; moulaison, “opac queries at a medium-sized academic library.” 9. thomas peters, “when smart people fail,” journal of academic librarianship 15, no. 5 (1989): 267–73; zink, “monitoring user search success through transaction log analysis”; rhonda h. hunter, “successes and failures of patrons searching the online catalog at a large academic library: a transaction log analysis,” reference quarterly (spring 1991): 395–402. 10. karen antell and jie huang, “subject searching success: transaction logs, patron perceptions, and implications for library instruction,” reference & user services quarterly 48, no. 1 (2008): 68–76; hunter, “successes and failures of patrons searching the online catalog at a large academic library”; peters, “when smart people fail.” 11. peters, “when smart people fail.”; moulaison, “opac queries at a medium-sized academic library”; blecic et al., “using transaction log analysis to improve opac retrieval results.” 12. lynn silipigni connaway, debra wilcox johnson, and susan e. searing, “online catalogs from the users’ perspective: the use of focus group interviews,” college & research libraries 58, no. 5 (1997): 403–20, http://dx.doi.org/10.5860/crl.58.5.403. 13. karl v. fast and d. grant campbell, “‘i still like google’: university student perceptions of searching opacs and the web,” asist proceedings 41 (2004): 138–46; eric novotny, “i don’t think i click: a protocol analysis study of use of a library online catalog in the internet age,” college & research libraries 65, no. 6 (2004): 525–37, http://dx.doi.org/10.5860/crl.65.6.525. 14. xi niu et al., “national study of information seeking behavior of academic researchers in the united states,” journal of the american society for information science & technology 61, no. 5 (2010): 869–90, http://dx.doi.org/10.1002/asi.21307; lynn sillipigni connaway, timothy j. dikey, and marie l. radford, “if it is too inconvenient i’m not going after it: convenience as a critical factor in information-seeking behaviors,” library & information science research 33, no. 3 (2011): 179–90; karen calhoun, joanne cantrell, peggy gallagher and janet hawk, online catalogs: what users and librarians want: an oclc report (dublin, oh: oclc online computer library center, 2009). 15. f. william chickering and sharon q. young, “evaluation and comparison of discovery tools: an update,” information technology & libraries 33, no.2 (2014): 5–30, http://dx.doi.org/10.6017/ital.v33i2.3471. 16. william denton and sarah j. coysh, “usability testing of vufind at an academic library,” library hi tech 29, no. 2 (2011): 301–19, http://dx.doi.org/10.1108/07378831111138189; jennifer emanuel, “usability of the vufind next-generation online catalog,” information technology & libraries 30, no. 
1 (2011): 44–52; erin dorris cassidy et al., “student searching http://dx.doi.org/10.5860/crl.58.5.403 http://dx.doi.org/10.5860/crl.65.6.525 http://dx.doi.org/10.1002/asi.21307 http://dx.doi.org/10.6017/ital.v33i2.3471 http://dx.doi.org/10.1108/07378831111138189 library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 26 with ebsco discovery: a usability study,” journal of electronic resources librarianship 26, no. 1 (2014): 17–35, http://dx.doi.org/10.1080/1941126x.2014.877331. 17. sarah c. williams and anita k. foster, “promise fulfilled? an ebsco discovery service usability study,” journal of web librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (2012): 186– 207; andrew d. asher, lynda m. duke, and suzanne wilson, “paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources,” college & research libraries 74, no. 5 (2013): 464–88. 18. jody condit fagan et al., “usability test results for a discovery tool in an academic library,” information technology & libraries 31, no. 1 (2012): 83–112; megan johnson, “usability test results for encore in an academic library,” information technology & libraries 32, no. 3 (2013): 59–85. 19. elizabeth (bess) sadler, “project blacklight: a next generation library catalog at a first generation university,” library hi tech 27, no. 1 (2009): 57–67, http://dx.doi.org/10.1108/07378830910942919; bess sadler, “stanford's searchworks: unified discovery for collections?” in more library mashups: exploring new ways to deliver library data, edited by nicole c. engard, 247–260 (london: facet, 2015). 20. andrew d. asher, lynda m. duke and suzanne wilson, “paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources,” college & research libraries 74, no. 5 (2013): 464–88; kelly meadow and james meadow, “search query quality and web-scale discovery: a qualitative and quantitative analysis,” college & undergraduate libraries 19, no. 2–4 (2012): 163–75, http://dx.doi.org/10.1080/10691316.2012.693434. 21. sarah c. williams and anita k. foster, “promise fulfilled? an ebsco discovery service usability study,” journal of web librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; kathleen bauer and alice peterson-hart, “does faceted display in a library catalog increase use of subject headings?” library hi tech 30, no. 2 (2012), 347–58, http://dx.doi.org/10.1108/07378831211240003. http://dx.doi.org/10.1080/1941126x.2014.877331 http://dx.doi.org/10.1080/19322909.2011.597590 http://dx.doi.org/10.1108/07378830910942919 http://dx.doi.org/10.1080/10691316.2012.693434 http://dx.doi.org/10.1080/19322909.2011.597590 http://dx.doi.org/10.1108/07378831211240003 abstract introduction references jeng it is our flagship: surveying the landscape of digital interactive displays in learning environments lydia zvyagintseva information technology and libraries | june 2018 50 lydia zvyagintseva (lzvyagintseva@epl.ca) is the digital exhibits librarian at the edmonton public library in edmonton, alberta. abstract this paper presents the findings of an environmental scan conducted as part of a digital exhibits intern librarian project at the edmonton public library in 2016. 
as part of the library’s 2016–2018 business plan objective to define the vision for a digital exhibits service, this research project aimed to understand the current landscape of digital displays in learning institutions globally. the resulting study consisted of 39 structured interviews with libraries, museums, galleries, schools, and creative design studios. the environmental scan explored the technical infrastructure of digital displays, their user groups, various uses for the technologies within organizational contexts, the content sources, scheduling models, and resourcing needs for this emergent service. additionally, broader themes surrounding challenges and successes were also included in the study. despite the variety of approaches taken among learning institutions in supporting digital displays, the majority of organizations have expressed a high degree of satisfaction with these technologies. introduction in 2020, the stanley a. milner library, the central branch of the edmonton (alberta) public library (epl) will reopen after extensive renovations to both the interior and exterior of the building. as part of the interior renovations, epl will have installed a large digital interactive display wall modeled after the cube at queensland university of technology (qut) in brisbane, australia. to prepare for the launch of this new technology service, epl hired a digital exhibits intern librarian in 2016, whose role consisted of conducting research to inform the library in defining the vision for a digital display wall serving as a shared community platform for all manner of digitally accessible and interactive exhibits. as a result, the author carried out an environmental scan and a literature review related to digital display, as well as their consequent service contexts. for the purposes of this paper, “digital displays” refers to the technology and hardware used to showcase information, whereas “digital exhibits” refers to content and software used on those displays. wherever the service of running, managing, or using this technology is discussed, it is framed as “digital display service” and concerns both technical and organizational aspects of using this technology in a learning institution. method the data were collected between may 30 and august 20, 2016. a series of structured interviews were conducted by skype, phone, and email. the study population was driven by searching google mailto:lzvyagintseva@epl.ca it is our flagship | zvyagintseva 51 https://doi.org/10.6017/ital.v37i2.9987 and google news for keywords such as “digital interactive and library,” “interactive display,” “public display,” or “visualization wall” to identify organizations that have installed digital displays. a list of the study population was expanded by reviewing websites of creative studios specializing in interactive experiences and through a snowball effect once the interviews had begun. a small number of vendors, consisting primarily of creative agencies specializing in digital interactive services, were also included in the study population. participants were then recruited by email. the goal of this project was to gain a broad understanding of the emergent technology, content, and service model landscape related to digital displays. as a result, structured interviews were deemed to be the most appropriate method of data collection because of their capacity to generate a large amount of qualitative and quantitative data. in total, 39 interviews were conducted. 
a list of interview questions prepared for the interviews is included in appendix a. additionally, a complete list of the study population can also be found in appendix b. predominantly, organizations from canada, the united states, australia, and new zealand are represented in this study. literature review definitions • public displays, a term used in the literature to refer to a particular type of digital display, can refer to “small or large sized screens that are placed indoor . . . or outdoor for public viewing and usage” and which may be interactive to support information browsing and searching activities.”1 in public displays, a large proportion of users are passers-by and thus first-time users.2 in academic environments, these technologies may be referred to as “video walls” and have been characterized as display technologies with little interactivity and input from users, often located in high-traffic, public areas with content prepared ahead of time and scheduled for display according to particular priorities.3 • semi-public displays, on the other hand, can be understood as systems intended to be used by “members of a small, co-located group within a confined physical space, and not general passers-by.”4 in academic environments, they have been referred to as “visualization spaces” or “visualization studios,” and can be defined as workspaces with real-time content displayed for analysis or interpretation, often placed in in libraries or research department units.5 for the purposes of this paper, “digital displays” refers to both public and semi-public displays, as organizations interviewed as part of this study had both types of displays, occasionally simultaneously. • honeypot effect describes how people interacting with an information system, such as a public display, stimulate other users to observe, approach, and engage in interaction with that system.6 this phenomenon extends beyond digital displays to tourism, art, or retail environments, where a site of interest attracts attention of passers-by and draws them to participate in that site. interactivity the area of interactivity with public displays has been studied by many researchers, with three commonly used modes of interaction clearly identified: touch, gesture, and remote modes. information technology and libraries | june 2018 52 • touch (or multi-touch): this is the most common way users interact with personal mobile devices such as smartphones and tablets. multi-touch interaction on public displays should support many individuals interacting with the digital screen simultaneously, since many users expect immediate access and will not take turns. for example, some technologies studied in this report support up to 30 touch points at any given time, while others, like qut’s the cube, allow for a near infinite number of touch points. though studies show that this technique is fast and natural, it also requires additional physical effort from the user.7 while touch interaction using infrared sensors has a high touch recognition rate, its shortcomings have been identified as being expensive and being influenced by light interference, such as light around the touch screen.8 • gesture: this is interaction is through movement of the user’s hands, arms, or entire body, recognized by sensors such as the microsoft kinect or leap motion systems. 
although studies show that this type of interaction is quick and intuitive, it also brings “a cognitive load to the users together with the increased concern of performing gestures in public spaces.”9 specifically, body gestures were found not to be well suited to passing-by interaction, unlike hand gestures, which can be performed while walking. hand gestures also have an acceptable mental, physical and temporal workload.10 research into gesturebased interaction shows that “more movement can negatively influence recall” and is therefore not suited for informational exhibits.11 similarly, people consider gestures to be too much work “when they require two hands and large movements” to execute.12 not surprisingly, research suggests that gestures deemed to be socially acceptable for public spaces are small, unobtrusive ones that mimic everyday actions. they are also more likely to be adopted by users. • remote: these are interactions using another device, such as mobile phones, tablets, virtual-reality headsets, game controllers, and other special devices. connection protocols may include bluetooth, sms messaging, near-field communication, radio-frequency identification, wireless-network connectivity, and other methods. mobile-based interaction with public displays has received a lot of attention in research, media, and commercial environments because this mode allows users to interact from variable distance with minimal physical effort. however, users often find mobile interaction with a public display “too technical and inconvenient” because it requires sophisticated levels of digital literacy in addition to having access to a suitable device.13 some suggest that using personal devices for input also helps “avoid occlusion and offers interaction at a distance” without requiring multi-touch or gesture-based interactions.14 as well, subjects in studies on mobile interaction often indicate their preference for this mode because of its low mental effort and low physical demand. however, it is possible that these studies focused on users with high degrees of digital literacies rather than the general public with varying degrees of access and comfort with mobile technologies. user engagement attracting user attention is not necessarily guaranteed by virtue of having a public display. according to research, the most significant factors that influence user engagement with public digital displays are age, display content, and social context. it is our flagship | zvyagintseva 53 https://doi.org/10.6017/ital.v37i2.9987 age hinrichs found that children were the first to engage in interaction with public displays and would often recruit adults accompanying them toward the installation.15 on the other hand, the hinrichs found adults to be more hesitant in approaching the installation: “they would often look at it from a distance before deciding to explore it further.”16 these findings suggest that designing for children first is an effective strategy for enticing interaction from users of all ages. display content studies on engagement in public digital display environments indicate that both passive and active types of engagement exist with digital displays. the role of emotion in the content displayed also cannot be overlooked. specifically, clinch et al. 
state that people typically pay attention to displays “only when they expected the content to be of interest to them” and that they are “more likely to expect interesting content in a university context rather than within commercial premises.”17 in other words, the context in which the display is situated affects user expectations and primes them for interaction. the dominant communication pattern in existing display and signage systems has been narrowcast, a model in which displays are essentially seen as distribution points for centrally created content without much consideration for users. this model of messaging exists in commercial spaces, such as malls, but also in public areas like transit centers, university campuses, and other spaces where crowds of people may gather or pass by. observational studies indicate that people tend to perceive this type of content as not relevant to them and ignore it.18 for public displays to be engaging to end users, in other words, “there needs to be some kind of reciprocal interaction.”19 in public spaces, interactive displays may be more successful than noninteractive displays in engaging viewers and making city centers livelier and more attractive.20 in terms of precise measures of attention to such displays, studies of average attention time correlate age with responsiveness to digital signage. children (1–14 years) are more receptive than adults and men spend more time observing digital signage than women.21 studies also indicate a significantly higher average attention times for observing dynamic content as compared to static content.22 scholars like buerger suggest that designers of applications for public digital displays should assume that viewers are not willing “to spend more than a few seconds to determine whether a display is of interest.”23 instead, they recommend presenting informational content with minimal text and in such a way that the most important information can be determined in two-to-three seconds. in a museum context, the average interaction time with the digital display was between two and five minutes, which was also the average time people spent exploring analog exhibits.24 dynamic, game-like exhibits at the cube incorporate all the above findings to make interaction interesting, short, and drawing the attention of children first. social context social context is another aspect that has been studied extensively in the field of human-computer interaction, and it provides many valuable lessons for applying evidence-based practices to technology service planning in libraries. many scholars have observed the honeypot effect as related to interaction with digital displays in public settings. this effect describes how users who are actively engaged with the display perform two important functions: they entice passers-by to become actively engaged users themselves, and they demonstrate how to interact with the technology without formal instruction. information technology and libraries | june 2018 54 many argue that a conductive social context can “overcome a poor physical space, but an inappropriate social context can inhibit interaction” even in physical spaces where engagement with the technology is encouraged.25 this finding relates to use of gestures on public displays. researchers also found that contextual social factors such as age and being around others in a public setting do, in fact, influence the choice of multi-touch gestures. 
hinrichs suggests enabling a variety of gestures for each action—accommodating different hand postures and a large number of touch points, for example—to support fluid gesture sequences and social interactions.26 a major deterrent to users’ interaction with large public displays has been identified as the potential for social embarrassment.27 as an implication, the authors suggest positioning the display along thoroughfares of traffic and improving how the interaction principles of the display are communicated implicitly to bystanders, thus continually instructing new users on techniques of interaction.28 findings technical and hardware landscape the average age of public displays was around three years, indicating an early stage of development of this type of service among learning institutions. such technologies first appeared in europe more than 10 years ago (for example, the most widely cited early example of a public display is the citywall in helsinki in 2007).29 however, adoption in north american did not start until around 2013.the median year for the installation of these technologies among organizations studied in this report is 2014. among public institutions represented in the study population, such as public libraries and museums, digital displays were most frequently installed in 2015. while most organizations have only one display space, it was not unusual to find several within a single organization. for example, for the purposes of this study, the researcher has counted the cube as three display spaces, as documentation and promotional literature on the technology cites “3 separate display zones.” as a result, the average number of display spaces in the population of this study is 1.75. the following modes of interaction beyond displaying video content with digital displays have been observed in the study population in descending order of frequency: • sound (79%). while research on human-computer interaction is inconclusive about best practices related to incorporating sound into digital interactive displays, it is clear, among the organizations interviewed in the environmental scan, that sound is a major component of digital exhibits and should not be overlooked. • touch or multi-touch (46%). this finding highlights that screens capable of supporting multi-user interaction is not consistent across the study population. • gesture (25%): these include tools such as microsoft kinect, leap motion, or other systems for detecting movement for interaction. • mobile (14%). while some researchers in the human-computer interaction field suggest mobile is the most effective way to bridge the divide between large public displays, personalization of content, and user engagement, mobile interactivity is not used frequently to engage with digital displays in the study population. one outlier is north carolina state university library, which takes a holistic, “massively responsive design” approach in which responsive web design principles are applied to content that can be it is our flagship | zvyagintseva 55 https://doi.org/10.6017/ital.v37i2.9987 displayed effectively at once online, on digital display walls, and on mobile devices while optimizing institutional resources dedicated to supporting visualization services. further, as in the broader personal computing environment, the microsoft windows operating system dominates display systems, with 61% of the organizations choosing a windows machine to power their digital display. 
a fifth (21%) of all organizations have some form of networked computing infrastructure, such as the cube with its capacity to process exhibit content using 30 servers. instead, the majority (79%) of organizations interviewed have a single computer powering the display. this finding is perhaps not surprising, given that few institutions have dedicated it teams to support a single technology service like the cube. users and use cases understanding primary audiences was also important for this study, as the organizational user base defines the context for digital exhibits. the breakdown of these audiences is summarized in figure 1. for example, the university of oregon ford alumni center’s digital interactive display focuses primarily on showcasing the success of its alumni, with a goal of recruiting new students to the university. however, the interactive exhibits also serve the general public through tours and events on the university of oregon campus. other organizations with digital displays, such as all saints anglican school and the philadelphia museum of art, also target specific audiences, so planning for exhibits may be easier in those contexts than in organizations like the university of waterloo stratford campus, with its display wall at the downtown campus that receives visitor traffic from students, faculty, and the public. 44% 33% 22% types of audience academic public both public and academic information technology and libraries | june 2018 56 figure 1. audience types for digital displays in the study population. digital displays serve various purposes, which depend on the context of the organization in which they exist, their technical functionality, their primary audience, their service design, and other factors. interview participants were asked about the various uses for these technologies at their institutions. a single display could have multiple functions within a single institution. the following list summarizes these multiple uses: 1. educational (67%), such as displaying digital collections, archives, historical maps, and other informational. these activities can be summarized in the words of one participant as “education via browse”—in other words, self-guided discovery rather than formal instruction. 2. fun or entertainment (56%), including art exhibitions, film screenings, games, playful exhibits, and other engaging content to entice users. 3. communication (47%), which can be considered a form of digital signage to promote library or institutional services and marketing content. displays can also deliver presentations and communicate scholarly work. 4. teaching (42%), including formal and semi-formal instruction, workshops, student presentations, and student course-work showcases. 5. events (31%), such as public tours, conferences, guest speakers, special events, galas, and other social activities near or using the display. 6. community engagement (28%), including participation from community members through content contribution, showing local content, using the display technology as an outreach tool, and other strategies to build relationships with user communities. 7. research (22%), where the display functions as a tool that facilitates scholarly activities like data collection, analysis, and peer review. many study participants acknowledged challenges in using digital displays for this purpose and have identified other services that might support this use more effectively. 
content types and management in the words of deakin university librarians, “content is critical, but the message is king,” so it was particularly important for the author to understand the current digital display landscape as it relates to content.30 specifically, the research project encompassed the variety of content used on digital displays as well as how it is created, managed, shared, and received by the audiences of various organizations interviewed in this study. as can be observed in figure 2, all organizations supported 2d content, such as images, video, audio, presentation slides, and other visual and textual material. however, dynamic forms of content, such as social media feeds, interactive maps, and websites were less prevalent. it is our flagship | zvyagintseva 57 https://doi.org/10.6017/ital.v37i2.9987 figure 2. types of content supported by digital displays in the study population. discussions around interest in emergent, immersive, and dynamic 3d content such as games and virtual and augmented reality also came up frequently in the study interviews, and the researcher found that these types of content were supported in only 16 (57%) of the 28 total cases. this number is lower than the total number of interviewees because not all organizations interviewed had content to manage or display. in addition, many organizations recognized that they would likely be exploring ways to present 3d games or immersive environments through their digital display in the near future. not surprisingly, the creative agencies included in this study revealed an awareness and active development of content of this nature, noting “rising demand and interest in 3d and game-like environments.” furthermore, projects involving motion detection, the internet of things, and other sensor-based interactions are also seeing rise in demand, according to study participants. 100 % 61 % 57 % 0 10 20 30 40 50 60 70 80 90 100 content types supported content types static 2d dynamic web dynamic 3d information technology and libraries | june 2018 58 figure 3. content management systems for digital displays. in terms of managing various types of content, 20 (71%) of the organizations interviewed had used some form of content management system (cms), while the rest did not use any tool to manage or organize content. of those organizations that used a cms, 15 (75%) relied on a vendorsupplied system, such as tools by fourwinds interactive, visix, or nec live. the remaining 5 (18%) cms users created a custom solution without going to a vendor. this finding suggests that since the majority of content supported by organizations with digital displays is 2d, current vendor solutions for managing that content are sufficient for the study population at this point. it is unclear how the rise in demand for dynamic, game-like content will be supported by vendors in the coming years. table 1 reflects the distribution of approaches to managing content observed in the study population. 18% 11% 53% 18% 71% content management no system unknown vendor-supplied system in-house created system it is our flagship | zvyagintseva 59 https://doi.org/10.6017/ital.v37i2.9987 table 1. 
content management in study population

content management          responses    %
vendor supplied system      15           54
in-house created system     5            18
no system                   5            18
unknown                     3            10

middleware, automation, and exhibit management middleware can be described as the layer of software between the operating system and applications running on the display, especially in a networked computing environment. for example, most organizations studied in the environmental scan supported a windows environment with a range of exhibit applications, like slideshows, web browsers, and executable files, such as games. middleware can simplify and automate the process of starting up, switching between, and shutting off display applications on a set schedule. as figure 4 demonstrates, the majority of the organizations in the study population (17, or 61%) did not have a middleware solution. however, this group was heterogeneous: 14 organizations (50%) did not require a middleware solution because they ran content semi-permanently or relied on user-supplied content, in which case the display functioned as a teaching tool. the remaining three organizations (11%) manually managed scheduling and switching between exhibit content. in such cases, a middleware solution would be valuable to management of content, especially as the number of applications grows, but it was not present in these organizations. comparatively, 10 organizations (36%) used a custom solution, such as a combination of windows or linux scripts to manage automation and scheduling of content on the display. one organization (3%) did not specify their approach to managing content. these findings suggest that no formalized solution to automating and managing software currently exists among the study population. in addition to organizing content, digital-exhibits services involve scheduling or automating content to meet user needs according to the time of day, special events, or seasonal relevance. as a result, the middleware technology solution supports sustainable management of displays and predictable sharing of content for end users. this environmental scan revealed that digital exhibits and interactive experiences are still in the early days of development. it is possible that new solutions for managing content both at the application and the middleware level may emerge in the coming years, but they are currently limited.

figure 4. middleware solutions in the study population.

sources of content when finding sources of content to be displayed on digital displays, organizations interviewed used multiple strategies simultaneously. table 2 below brings together the findings related to this theme.

table 2. content sources for digital exhibits

content source                %
external/commissioned         64
user-supplied                 64
internal/in-house             50
collaborative with partner    43

for example, many organizations rely on their users to generate and submit material (18, or 64%); others commission vendors to create exhibits for them (18, or 64%). in 50% of all cases, organizations also produce content for exhibits in-house. in other words, most organizations used a combination of all sources to generate content for their digital displays. only a few use a single source of content, such as the semi-permanent historical exhibit at henrico county public library.
others, like the duke media wall, rely entirely on their users to supply content, which employs a “for students by students” model of content creation. additionally, only 12 (43%) of the organizations interviewed had explored or established some form of partnership for creating exhibits. primarily, these partnerships existed with departments, centers, institutes, campus units, and/or students in academic settings, such as the computer science department, faculty of graduate studies, and international studies. other examples of partnerships were with similar civic, educational, cultural, and heritage organizations, such as municipal libraries, historical societies, art galleries, museums, and nonprofits. examples included study participants working with ars electronica, local symphony orchestras, harvard space science, and nasa on digital exhibits. clearly, a variety of approaches were taken in the study population to come up with digital exhibits content. content creation guidelines seven organizations (19%) in the study population shared publicly the content guidelines aimed to simplify the process of engaging users in creating exhibits. these guidelines were analyzed, and key elements were identified that are necessary for users to know in order to contribute in a meaningful way, thereby lowering the barrier to participation. these elements include resolution of the display screen(s), touch capability, ambient light around the display space, required file formats, and maximum file size. a complete list of organizations with such guidelines, along with websites where these guidelines can be found, is included in appendix c. based on the analysis of this limited sample, the bare minimum for community participation guidelines would include clearly outlining • the scope, purpose, audience, and curatorial policy of the digital exhibits service; • the technical specifications, such as the resolution, aspect ratio, and file formats supported by the display; • the design guidelines, such as colors, templates and other visual elements; • the contact information of the digital exhibits coordinator; and • the online or email submission form. it should be noted, however, that such specifications are primarily useful when a cms exists and the content solicited from users is at least somewhat standardized. for example, images, slides, or webpages may be easier for community partners to contribute than video games or 3d interactive content. no examples of guidelines for the latter were observed in the study. content scheduling whereas the middleware section of this study examined the technical approaches to content management and automation, this section explores the frequency of exhibit rotation from a service design perspective. as can be observed in figure 5, no consistent or dominant model for exhibit scheduling has been identified in the study population. generally, approaches to scheduling digital exhibits reflect organizational contexts. for example, museums typically design an exhibit and display it on a permanent basis, while academic institutions change displays of student work or scholarly communication once per semester. the following scheduling models have emerged in the descending order of frequency in the study population. information technology and libraries | june 2018 62 figure 5. content scheduling distribution in the study population. 1. unstructured (29%): no formal approach, policy, or expectation is identified by the organization regarding displaying exhibits. 
this model is largely related to the early stage of service development in this domain, lack of staff capacity to support the service, and/or responsiveness to user needs. one study participant, for example, referred to this loose approach by noting that "no formalized approach and no official policy exists." for example, institutions may have frameworks for what types of content are acceptable but no specific requirements on the content subjects. institutions adopting a lab space model (see figure 6) for digital displays largely belong to this category. in other words, content is created on the fly through workshops, data analysis, and other situations as needed by users. in this case, no formal scheduling is required apart from space reservations.

2. seasonal (29%), which can be defined as a period from three to six months and includes semester-based scheduling in academic institutions. many organizations operate on a quarterly basis, so it would seem logical that content refresh cycles reflect the broader workflow of the organization.

3. permanent (21%): in the cases of museums, permanent exhibits may mean displaying content indefinitely or until the next hardware refresh, which might reconfigure the entire interactive display service. no specific date ranges were cited for this model.

4. monthly (10%): this pattern was observed among academic libraries, with production of "monthly playlists" featuring curated book lists or other monthly specials.

5. weekly (7%): north carolina state university and deakin university libraries aim to have fresh content up once per week; they achieve this in part by formalizing the roles needed to support their digital display and visualization services.

6. daily (4%): only griffith university ensures that new content is available every day on its #seemore display; it does this largely by relying on standardized external and internal inputs, such as weather updates and the university marketing department content.

staffing and skills

one key element of the digital exhibits research project included investigating the staffing models required to support a service of this nature. not surprisingly, the theme around resource needs for digital exhibits emerged in most interviews conducted. several participants have noted that one "can't just throw up content and leave it," while others advised to "have expertise on staff before tech is installed." data gathered shows that the average full-time equivalent (fte) needed to support digital display services in organizations interviewed was 2.97, or around three full-time staff members. in addition, 74% of the organizations studied had maintenance or support contracts with various vendors, including av integrators, cms specialists, creative studios that produced original content, or hardware suppliers. hardware and av integrators typically provided a 12-month contract for technical troubleshooting, while creative studios ensured a 3-month support contract for digital exhibits they designed. the average time to create an original, interactive exhibit was between 9 and 12 months according to the data provided by creative agencies, the cube teams, and learning organizations who have in-house teams creating exhibits regularly.
this length of time varies with the complexity of the interaction designed, the depth of the exhibit "narrative," and the modes of input supported by the exhibit application. additionally, it was important to understand the curatorial labor behind digital exhibits; the author did not necessarily speak with the curator of exhibits, and this work may be carried out by multiple individuals within organizations with digital displays or creative studios. in 20 (57%) of the cases, the person interviewed also curated some or all of the content for the digital display in their respective institutions. in five (14%) of the cases, the individual interviewed was not a curator for any of the content, because there was no need for curation in the first place. for example, displays in these cases were used for analysis or teaching and therefore did not require prepared content. in the rest of the cases (10, or 29%), a creative agency vendor, another member of the team, or a community partner was responsible for the curation of exhibit content. this finding suggests that, while a significant number of organizations outsource the design and curation of exhibits, the majority retain control over this process. therefore, dedicating resources to curation, organization, and management of exhibit content is deemed significant by the organizations represented in the study.

in terms of the capacity to carry out digital display services, skills that have been identified by study participants as being important to supporting work of this nature include the following:

1. technical skills (such as the ability to troubleshoot), general interest in technology, and flexibility and willingness to learn new things (74%)
2. design, visual, and creative sensibility (40%), as this type of work is primarily a visual experience
3. software-development or programming-language knowledge (31%)
4. communication, collaboration, and relationship-building (25%)
5. project management (20%)
6. audiovisual and media skills (14%), as digital exhibits are "as much an av experience as an it experience," according to one study participant
7. curatorial, organizational, and content-management skills (11%)

the most frequent dedicated roles mentioned in the interviews are shown in table 3.

table 3. types of roles significant to digital exhibits work

position                                     responses   %
developer/programmer                         11          31
project manager                              8           23
graphic designer                             6           17
user experience or user interface designer   4           11
it systems administrator                     4           11
av or media specialist                       4           11

the relatively low percentages represented in this table suggest the distribution of the skills mentioned above among various team members, or the combining of multiple skills in a single role, as may be the case in small institutions or those without formalized services with dedicated roles. nevertheless, the presence of specific job titles indicates understanding of the various skill sets needed to run a service that uses digital displays.

challenges and successes

many challenges were identified by study participants related to initiating and supporting a service that uses digital displays for learning. clearly, multiple challenges could be associated with the services related to digital displays within a single organization. however, many successes and lessons learned were also shared by interviewees, often overlapping with identified challenges.
this pattern suggests that some organizations can pursue strategies that address challenges faced by their library or museum colleagues while perhaps lacking resources or capacity in other areas related to this type of service. for example, some organizations have observed a lack of user engagement because of the limited interactivity of the technology solution they used. others have had successful user engagement largely by investing in technology solutions that provide a range of modes of interaction. it is important to learn from both these areas to anticipate possible pain points and to be able to capitalize on successes that lead to industry recognition and engagement from library customers. table 4 summarizes the range of challenges identified.

table 4. challenges related to digital display services

challenge identified     responses   %
technical                14          41
content                  11          33
costs                    11          33
user expectations        11          33
workflow                 10          29
service design           9           26
time                     8           24
organizational culture   8           24
user engagement          7           20

as reflected in table 4, several key challenges have been discussed:

1. technical, such as troubleshooting the technology, keeping up with new technologies or upgrades, and finding software solutions appropriate for the hardware selected.
2. content, such as coming up with original content or curating existing sources. in the words of one participant, "quality and refresh of content is key—it has to be meaningful, interesting, and new." this clearly presents a resource requirement.
3. costs, such as the financial commitment to the service, the unseen costs in putting exhibits together, software licensing, and hardware upgrades.
4. user expectations, such as keeping the service at its full potential and using the maximum functionality of the hardware and software solutions. according to study participants, users "may not want what they think or they say they want," and to some extent, "such technologies are almost an expectation now, and not as exciting for users."
5. workflow or project-management strategies specifically related to emergent multimedia experiences that require new cycles of development and testing.
6. time to plan, source, create, troubleshoot, launch, and improve exhibits.
7. service design, such as thinking holistically about the functions of the technology within the larger organizational structure. as one study participant stated, organizations "cannot disregard the reality of the service being tied to a physical space" in that these types of technologies are both a virtual and physical customer experience.
8. organizational culture and policy, in terms of adapting project-based approaches to planning and resourcing services, getting institutional support, and educating all staff about the purpose, function, and benefits of the service.
9. user engagement, particularly keeping users interested in the exhibits and continually finding new and exciting content. various participants have found that "linger time is between 30 seconds to few minutes" and content being displayed needs to be "something interesting, unique, and succinct, but not a destination in itself."

despite the clear challenges with delivering digital exhibits services, organizations that participated in this study have identified keys to success (see table 5).
table 5. successes and lessons learned in using digital displays

successful approach or lesson identified   responses   %
user engagement and interactivity          16          47
service design                             14          41
"wow" factor                               12          35
organizational leadership                  12          35
technology solution                        10          29
flexibility                                10          29
communication and collaboration            10          29
project management                         9           26
team and skill sets                        9           26

as reflected in table 5, several approaches have been discussed:

• user engagement and interactivity, particularly for those institutions that invested in highly interactive and immersive experiences; the rewards are seen in the interest and enthusiasm of their user groups.
• service design: organizations that have carefully planned the service have found that this technology was successfully serving the needs of their user communities.
• promotion and "wow factor" that has brought attention to the organization and the service. it is not surprising that digital displays are central points on tours of dignitaries, political figures, and external guests. further, many have commented that they "did not imagine a library could be involved in such an innovative experiment," and others have added that their digital displays have "created new conversations that did not exist before."
• leadership and vision at the organizational level, which secures support and resources as well as defines the scope of the service to ensure its sustainability and success: "money is not necessarily the only barrier to doing this service, but risk taking, culture."
• technology solution, where "everything works" and both the organization and users of the service are happy with the functionality, features, and performance of the chosen solution.
• flexibility and willingness to learn new things, including being open to agile project-management methods, taking risks, and continually learning new tools, technologies, and processes as the service matures.
• communication and collaboration, both internally among stakeholders and externally by building community partnerships, new audiences, and user participation in content creation. for example, one study participant noted that the technology "has contributed to giving the museum a new audience of primarily young people and families—a key objective held in 2010 at the commencement of the gallery refurbishments."
• workflow and project management for those embracing new approaches required to bring multiple skill sets together to create engaging new exhibits. as one participant has put it, "these types of approaches require testing, improvement, a new workflow and lifecycle for the projects."
• having the right team with appropriate skills to support the service, though this theme was rated as being less significant than designing services effectively and securing institutional support for the technology service. in other words, study participants noted that having in-house programming or design skills is not enough without a proper definition of success for digital exhibits services.

perceptions

institutional and user reception of digital displays as a service to pursue in learning organizations has been identified as overwhelmingly positive, with 87% of the organizations noting positive feedback. for example, one study participant noted the positive attention received by the wider community for the digital display, stating, "it is our flagship and people are in general impressed by both the potential and some of the existing content."
some participants have gone as far as to say that the reception among users has been "through the roof" and they have "never had a negative feedback comment" about their display. this finding indicates a high degree of satisfaction with such technologies by organizations that pursued a digital display. table 6 further explores the range of perceptions observed in the study.

table 6. perception of digital display services

perception                       responses   %
positive                         20          87
hesitation or uncertainty        7           30
concerns about purpose           4           17
concerns about user engagement   4           17
concerns about costs             3           13
negative                         3           13

a minority (13%) have noted some negative perceptions, largely related to concerns about the costs or functionality of the technology; 30% have observed uncertainty and hesitation on behalf of the staff and users in terms of engagement as well as interrogating its purpose in the organization. for example, one study participant summarizes this mixed sentiment by saying, "the perception is that it's really neat and worthwhile for exploring new ways of teaching, but that the same features and functions could be achieved with less (which we think is a good thing!)." it is helpful to note this trend in perception, as any new service will likely bring a mixture of excitement, hesitation, and occasional opposition. interestingly, these reactions have originated both from the staff of organizations interviewed and their communities of users.

discussion

the findings from this study indicate that the functions of the digital displays are highly dependent on the organizational context in which displays exist. this context, in turn, defines the nature of the services delivered through the digital display. for example, figure 6 can be useful in classifying the various ways digital displays appear in the study population, from research and teaching-oriented lab spaces to public spaces with passive messaging or active immersive game-like digital experiences.

figure 6. types of digital displays in the study population.

as such, visualization walls might belong in the "lab spaces" category that typically appears in academic libraries or research units and do not require content planning and scheduling. what we might call "digital interactive exhibits" tend to appear in museums and galleries with a primarily public audience and may have a permanent, seasonal, or monthly rotation schedule. however, despite a range of approaches taken to provide content and in terms of use of these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing the costs of interactive projects. despite these common concerns, the digital-exhibits services were perceived as being overwhelmingly satisfactory in all types of organizations included in this study because they brought new audiences to the organization and were often seen as "showpieces" in the broader community. the data gathered in the environmental scan demonstrates that there is currently little consistency among digital displays in learning environments. this lack of consistency is seen in content-development methods among study participants, their programming, content management, technology solutions, and even naming of the display (and, by extension, the display service).
for example, this study revealed that no evidently “open platform” for managing content at the application or the middleware level currently exists. a small number of software tools are used by organizations to support digital displays, but their use is in no way standardized, as compared to nearly every other area of library services. there is some indication that digitaldisplay services may become more standardized in the coming years, and more tools, solutions, vendors, and communities of practice will be available. for example, many signage cmss are currently on the market, and the number of game-like immersive experience companies is growing, suggesting extension of these services to libraries in the coming years. only a few software tools exist for creating exhibits, such as intuiface and touchdesigner, though no free, open-source versions of exhibit software are currently available. as well, the growing number of digital exhibits and interactive media companies currently focuses on turnkey—rather than software-as-a-service or platform—solutions. in contrast, some consistency exists in staffing needs and skills required to support the digitalexhibits service. a majority of organizations interviewed agreed that design, software development, systems administration, and project-management skills are needed to ensure digital-exhibits services run sustainably in a learning organization. in addition, lack of public library representation in this study makes it challenging to draw parallels to the library context. adapting museum practices is also not necessarily reliable, as there is rarely a mandate to engage communities and partner on content creation, as there is in libraries. for example, only the el paso (texas) museum of history engages the local community to source and organize content. these findings suggest that digital displays are a growing domain, and more solutions are likely to emerge in the coming years. the cube, compared to the rest of the study population, is a unique service model because it successfully brings together most elements examined in the environmental scan. for example, to ensure continual engagement with the digital display, the cube schedules exhibits on a regular basis and employs user interface designers, systems administrators, software engineers, and project managers. it also extends the content through community engagement, public tours, and stem programming. it has created an in-house middleware solution to simplify exhibit delivery and has chosen unity3d as its platform of choice for exhibit development. limitations only organizations from english-speaking countries were interviewed as part of the environmental scan. it is therefore unclear if access to organizations from non–english-speaking countries would have produced new themes and significantly different results. in addition, as with all environmental scans, the data is limited by the degree of understanding, knowledge, and willingness to share information of the individual being interviewed. particularly, individuals with whom the author spoke may or may not have been technology or service leads for the digital display at their respective institutions. thus, the study participants had a range of understanding of hardware specifications, functionality, and service-design components associated with digital displays. 
for example, having access to technology leads would have likely provided more nuanced responses around the middleware solutions and the underlying technical infrastructure required to support this service. information technology and libraries | june 2018 70 a small number of vendors were also interviewed as part of the environmental scan even though vendors did not necessarily have digital displays or service models parallel to libraries or museums. they are included in appendix b. nevertheless, gathering data from this group was deemed relevant to the study, as creative agencies have formalized staffing models and clearly identified skill sets necessary to support services of this nature. in addition, this group possesses knowledge of best practices, workflows, and project-management processes related to exhibit development. finally, this environmental scan also did not capture any interaction with direct users of digital displays, whose experiences and perceptions of these technologies may or may not support the findings gathered from the organizations interviewed. these limitations were addressed by increasing the sample size of the study within the time and resource constraints of the research project. conclusion the findings of this study show that the functions of digital-display technologies and their related services are highly dependent on the organizational context in which they exist. however, despite a range of approaches taken to provide content and in terms of use of these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing costs of interactive projects. despite these common concerns, digital displays were perceived as being overwhelmingly positive in all types of organizations interviewed in this study, as they brought new audiences to the organization and were often seen as “showpieces” in the broader community. the successes and lessons learned from the study population are meant to provide a broader perspective on this maturing domain as well as help inform planning processes for future digital exhibits in learning organizations. it is our flagship | zvyagintseva 71 https://doi.org/10.6017/ital.v37i2.9987 appendix a. environmental scan questions digital exhibits environmental scan interview questions—museums, libraries, public organizations 1. what are the technical specifications of the digital interactive technology at your institution? 2. who are the primary users of this technology (those interacting with the platform)? is there anyone you thought would use it and isn’t? 3. what are primary uses for the technology (events, presentations, analysis, workshops)? 4. what types content is supported by the technology (video, images, audio, maps, text, games, 3d, all of the above?) 5. where is content created and how is this content managed? 6. what is the schedule for the content and how is it prioritized? 7. can you estimate the fte (full-time equivalent) of staff members involved in supporting this technology/service, both directly and indirectly? what does indirect support for this technology entail? 8. in your experience, what kinds of skills are necessary in order to support this service? 9. have partnerships with other organizations producing content to be exhibited been established or explored? 10. what challenges have you encountered in providing this service? 11. what have been some keys to the successes in supporting this service? 12. 
what has been the biggest success of this service and what has been the biggest disappointment? 13. what is the perception of this technology in institution more broadly? 14. are there any other institutions you suggest we contact to learn more about similar technologies? information technology and libraries | june 2018 72 digital exhibits environmental scan interview questions: vendors 1. what is the relationship between creative studio and hardware/fabrication? do you do everything or work with av integrators instead to put together touch interactives? 2. who have been the primary users of the interactive exhibits and projects you have completed? 3. who writes the use cases when creating a digital interactive exhibit? 4. what types content is supported by the technology (video, images, audio, maps, text, games, 3d, all of the above?) do you see a rise in interest for 3d and game-like environments and do you have internal expertise to support it? 5. where is content created for the exhibits and how is this content managed? who curates? 6. what timespan or lifecycle do you design for? 7. how big is your team? how long to projects typically take to create? 8. what types of expertise do you have in house? what might a project team look like? 9. to what extent is there a goal of sharing knowledge back with the company from clients or users? 10. what challenges have you encountered in providing this service? 11. what have been some keys to the successes in supporting this service? it is our flagship | zvyagintseva 73 https://doi.org/10.6017/ital.v37i2.9987 appendix b: study population in environmental scan organization location date interviewed all saints anglican school merrimac, australia july 25, 2016 anode nashville, tn july 22, 2016 belle & wissell seattle, wa july 26, 2016 bradman museum bowral, australia july 10, 2016 brown university library providence, ri june 3, 2016 university of calgary library and cultural resources calgary, ab june 2, 2016 deakin university library geelong, australia june 14, 2016 university of colorado denver library denver, co june 24, 2016 duke university library durham, nc august 17, 2016 el paso museum of history el paso, tx june 24, 2016 georgia state university library atlanta, ga june 10, 2016 gibson group wellington, new zealand july 16, 2016 henrico county public library henrico, va august 9, 2016 ideum corrales, nm july 26, 2016 indiana university bloomington library bloomington, in may 31, 2016 interactive mechanics philadelphia, pa august 2, 2016 johns hopkins university library baltimore, md june 20, 2016 nashville public library nashville, tn july 22, 2016 north carolina state university library raleigh, nc june 8, 2016 university of north carolina atchapel hill library chapel hill, nc june 2, 2016 university of nebraska omaha omaha, ne june 16, 2016 omaha do space omaha, ne july 11, 2016 university of oregon alumni center eugene, or june 7, 2016 philadelphia museum of art philadelphia, pa august 10, 2016 queensland university of technology brisbane, australia june 30; july 29, 2016; august 16, 2016 société des arts technologiques montreal, qc august 8, 2016 second story portland, or july 28, 2016 st. louis university st. 
louis, mo july 4, 2016 stanford university library stanford, ca july 22, 2016 university of illinois at chicago chicago, il june 22, 2016 university of mary washington fredericksburg, va july 7, 2016 visibull waterloo, on august 12, 2016 university of waterloo stratford campus stratford, on june 22, 2016 yale university center for science and social science information new haven, ct july 13, 2016 information technology and libraries | june 2018 74 appendix c: digital content publishing guidelines organization name guidelines website deakin university library http://www.deakin.edu.au/library/projects/sparking-trueimagination duke university https://wiki.duke.edu/display/lmw/lmw+home griffith university https://intranet.secure.griffith.edu.au/work/digitalsignage/seemore north carolina state university library http://www.lib.ncsu.edu/videowalls university colorado denver http://library.auraria.edu/discoverywall university of calgary library and cultural resources http://lcr.ucalgary.ca/media-walls university of waterloo stratford campus https://uwaterloo.ca/stratford-campus/research/christiemicrotiles-wall http://www.deakin.edu.au/library/projects/sparking-true-imagination http://www.deakin.edu.au/library/projects/sparking-true-imagination https://wiki.duke.edu/display/lmw/lmw+home https://intranet.secure.griffith.edu.au/work/digital-signage/seemore https://intranet.secure.griffith.edu.au/work/digital-signage/seemore http://www.lib.ncsu.edu/videowalls http://library.auraria.edu/discoverywall http://lcr.ucalgary.ca/media-walls https://uwaterloo.ca/stratford-campus/research/christie-microtiles-wall https://uwaterloo.ca/stratford-campus/research/christie-microtiles-wall it is our flagship | zvyagintseva 75 https://doi.org/10.6017/ital.v37i2.9987 references 1 flora salim and usman haque, “urban computing in the wild: a survey on large scale participation and citizen engagement with ubiquitous computing, cyber physical systems, and internet of things,” international journal of human-computer studies 81 (september 2015): 31–48, https://doi.org/10.1016/j.ijhcs.2015.03.003. 2 peter peltonen et al., “it’s mine, don't touch! interactions at a large multi-touch display in a city center,” proceedings of the sigchi conference on human factors in computing systems, florence, italy, april 5–10, 2008, 1285–94, https://doi.org/10.1145/1357054.1357255. 3 shawna sadler, mike nutt, and renee reaume, “managing public video walls in academic library,” (presentation, cni spring 2015 meeting, seattle, washington, april 13-14, 2015), http://dro.deakin.edu.au/eserv/du:30073322/sadler-managing-2015.pdf. 4 peltonen et al., “it’s mine, don't touch!” 5 john brosz, e. patrick rashleigh, and josh boyer. “experiences with high resolution display walls in academic libraries” (presentation, cni fall 2015 meeting, washington, dc, december 13-14, 2015), https://www.cni.org/wp-content/uploads/2015/12/cni_experiences_brosz.pdf; bryan sinclair, jill sexton, and joseph hurley, “visualization on the big screen: hands-on immersive environments designed for student and faculty collaboration” (presentation, cni spring 2015 meeting, seattle, washington, april 13–14, 2015), https://scholarworks.gsu.edu/univ_lib_facpres/29/. 6 niels wouters et al., “uncovering the honeypot effect: how audiences engage with public interactive systems. conference on designing interactive systems,” dis ’16 proceedings of the 2016 acm conference on designing interactive systems, brisbane, australia, june 4–8, 2016, 516, https://doi.org/10.1145/2901790.2901796. 
7 gonzalo parra, joris klerkx, and erik duval, "understanding engagement with interactive public displays: an awareness campaign in the wild," proceedings of the international symposium on pervasive displays, copenhagen, denmark, june 3–4, 2014, 180–85, https://doi.org/10.1145/2611009.2611020; ekaterina kurdyukova, mohammad obaid, and elisabeth andre, "direct, bodily or mobile interaction?," proceedings of the 11th international conference on mobile and ubiquitous multimedia, ulm, germany, december 4–6, 2012, https://doi.org/10.1145/2406367.2406421; tongyan ning et al., "no need to stop: menu techniques for passing by public displays," proceedings of the 2011 annual conference on human factors in computing systems, vancouver, british columbia, https://www.gillesbailly.fr/publis/bailly_chi11.pdf. 8 jung soo lee et al., "a study on digital signage interaction using mobile device," international journal of information and electronics engineering 5 no. 5 (2015): 394–97, https://doi.org/10.7763/ijiee.2015.v5.566. 9 parra et al., "understanding engagement," 181. 10 parra et al., "understanding engagement," 181; robert walter, gilles bailly, and jorg müller, "strikeapose: revealing mid-air gestures on public displays," proceedings of the sigchi conference on human factors in computing systems, paris, france, april 27–may 2, 2013, 841–50, https://doi.org/10.1145/2470654.2470774. 11 philipp panhey et al., "what people really remember: understanding cognitive effects when interacting with large displays," proceedings of the 2015 international conference on interactive tabletops & surfaces, madeira, portugal, november 15–18, 2015, 103–6, https://doi.org/10.1145/2817721.2817732. 12 christopher ackad et al., "an in-the-wild study of learning mid-air gestures to browse hierarchical information at a large interactive public display," proceedings of the 2015 acm international joint conference on pervasive and ubiquitous computing, osaka, japan, september 7–11, 2015, 1227–38, https://doi.org/10.1145/2750858.2807532. 13 parra et al., "understanding engagement," 181; kurdyukova, obaid, and andre, 2012, n.p. 14 jouni vepsäläinen et al., "web-based public-screen gaming: insights from deployments," ieee pervasive computing 15 no. 3 (2016): 40–46, https://ieeexplore.ieee.org/document/7508836/. 15 uta hinrichs, holly schmidt, and sheelagh carpendale, "emdialog: bringing information visualization into the museum," ieee transactions on visualization and computer graphics 14 no. 6 (november 2008): 1181–88, https://doi.org/10.1109/tvcg.2008.127.
16 hinrichs, schmidt, and carpendale, “emdialog.” 17 sarah clinch et al., “reflections on the long-term use of an experimental digital signage system,” proceedings of the 13th international conference on ubiquitous computing, beijing, china, september 17-21, 2011, 133-142. https://doi.org/10.1145/2030112.2030132. 18 elaine m. huang, anna koster, and jan borchers. “overcoming assumptions and uncovering practices: when does the public really look at public displays?,” proceedings of the 6th international conference on pervasive computing, sydney, australia, may 19-22, 2008, 228-243. https://doi.org/10.1007/978-3-540-79576-6_14; jorg muller et al., “looking glass: a field study on noticing interactivity of a shop window,” proceedings of the sigchi conference on human factors in computing systems, austin, texas, may 5-10, 2012, 297-306. https://doi.org/10.1145/2207676.2207718. 19 salim & haque, “urban computing in the wild,” 35 20 mettina veenstra et al., “should public displays be interactive? evaluating the impact of interactivity on audience engagement,” proceedings of the 4th international symposium on pervasive displays, saarbruecken, germany, june 10–12, 2015, 15–21, https://doi.org/10.1145/2757710.2757727. 21 clinch et al., “reflections.” https://doi.org/10.1145/2470654.2470774 https://doi.org/10.1145/2817721.2817732 https://doi.org/10.1145/2750858.2807532 https://ieeexplore.ieee.org/document/7508836/ https://doi.org/10.1109/tvcg.2008.127 https://doi.org/10.1145/2030112.2030132 https://doi.org/10.1007/978-3-540-79576-6_14 https://doi.org/10.1145/2207676.2207718 https://doi.org/10.1145/2757710.2757727 it is our flagship | zvyagintseva 77 https://doi.org/10.6017/ital.v37i2.9987 22 robert ravnik and franc solina, “audience measurement of digital signage: qualitative study in real-world environment using computer vision,” interacting with computers 25, no. 3 (2013), https://doi.org/10.1093/iwc/iws023. 23 neal buerger, “types of public interactive display technologies and how to motivate users to interact,” media informatics advanced seminar on ubiquitous computing, 2011, hausen, doris, conradi, bettina, hang, alina, hennecke, fabiant, kratz, sven, lohmann, sebastian, richter, hendrik, butz, andreas and hussmann, heinrich (eds). university of munich, department of computer science, media informatics group, 2011. https://pdfs.semanticscholar.org/533a/4ef7780403e8072346d574cf288e89fc442d.pdf . 24 c. g. screven, “information design in informal settings: museums and other public spaces,” in information design, ed. robert e. jacobson (cambridge, ma: mit press, 2000), 131–192. 25 parra et al., “understanding engagement,” 181. 26 uta hinrichs and sheelagh carpendale, “gestures in the wild: studying multi-touch gesture sequences on interactive tabletop exhibits,” proceedings of the sigchi conference on human factors in computing systems, vancouver, british columbia, may 7–12, 2011, 3023–32, https://doi.org/10.1145/1978942.1979391. 27 harry brignull and yvonne rogers, “enticing people to interact with large public displays in public spaces,” interact ’03, proceedings of the international conference on human-computer interaction, zurich, switzerland, september 1-5, 2003, 17-24, matthias rauterberg, marino menozzi, and janet wesson (eds.), tokyo: ios press, 2003. http://www.idemployee.id.tue.nl/g.w.m.rauterberg/conferences/interact2003/interact200 3-p17.pdf. 
28 peltonen et al., "it's mine, don't touch!" 29 peltonen et al., "it's mine, don't touch!" 30 anne horn, bernadette lingham, and sue owen, "library learning spaces in the digital age," proceedings of the 35th annual international association of scientific and technological university libraries conference, espoo, finland, june 2–5, 2014, http://docs.lib.purdue.edu/iatul/2014/libraryspace/2.

journal of library automation vol. 2/3 september, 1969

book reviews

information retrieval systems; characteristics, testing, and evaluation, by f. wilfred lancaster. new york, john wiley & sons, 1968. 222 pp. $9.00.

despite the fact that users retrieve the majority of information that they obtain from collections such as libraries by employing author/title listings in catalogs, information scientists consider only subject listings in discussions of information retrieval. this book is no exception. lancaster defines an information retrieval system as informing a user "on the existence (or non-existence) and whereabouts of documents relating to his request." half of his book treats of characteristics and operation of information retrieval systems and half of testing and evaluating such systems. it is the latter half of the book that distinguishes it from other general introductions to the subject. for the testing and evaluation sections of his book, the author draws heavily on his experience gained while working on the cranfield project as well as at the national library of medicine. at the latter he examined a segment of the real world in a major investigation of the medlars system. an interesting finding of the medlars study that he reports in the book, but on which he does not elaborate, is that there was no relationship between recall ratio percentage and precision ratio percentage for 299 searches examined. in his preface the author expresses the hope that his book will be helpful to students and useful to practitioners. however, a principal function of such an introduction is to guide the reader in further pursuit, or retrieval, of information. in this function the book does not succeed, for seven chapters are barren of references, another eight average somewhat more than three, and the remaining chapter boasts fifty-three.
this book will not supplant other general introductions to information retrieval systems, but its discussion of testing and evaluation is a useful introduction. frederick g. kilgour book reviews 177 how to manage your information, by bart e. holm. new york, reinhold book corporation, 1968. 292 pp. $10.00 essential information exceeds the grasp of the keenest minds in all professions. a method of readily obtaining needed resource material can be a particularly knotty problem for those who have no background in appropriate methods of data storage and retrieval. successful operation for many professionals depends directly upon their ability to work out a practical personal system which does not require complex apparatus, excessive cost or time. the purpose of this volume is to help such individuals evaluate their particular needs and design a method of managing information which will be workable and practical. i found the book enjoyable and informative. it immediately recommends itself with its own efficient organization, attractive format, readable style, clever illustrations, and complete indexing. it not only deals with the broad principles necessary for development of a personal information system but also includes specific information of a practical nature on the approach to this problem for professionals in several different fields. the first chapter, which is titled "man the collector", is fascinating to an unsophisticated non-librarian. it outlines the enormous problem of the growth of world-wide information that appears to be proliferating in an almost malignant manner. this served to emphasize a repeatedly stressed cardinal principle: the need to be selective, so that only items of probable real value will be retained. a most valuable chapter for those not experienced in library work relates to the basic principles for retrieval on a single or multiple entry basis. this logically leads into a discussion of how to evaluate the individual's personal need. the operations of specific simple systems, such as optical coincidence, termatrex, keysort and term cards were adequately discussed. individual chapters are devoted to the unique problems that might be encountered by the engineer, the chemist, the physicist, the architect, the doctor, and the archivist, with emphasis on the specific vocabulary needed for proper organization and a brief review of information sources of the various disciplines. the remaining seven chapters deal with proper use of available sources of information, such as keeping current with the literature, use of the modern library, records management, microfilming, and data systems of the present and the future. this volume should be a real value to many who have limited background and are struggling in vain to keep up with the information they need. it can provide practical pointers for those who want to make a serious effort toward establishing and maintaining a system of storage and retrieval of information that does not rely on an all too often faulty memory. ellis a. fuller, m.d. 178 journal of library automation vol. 2/3 september, 1969 the institutes of education union list of periodicals processing system, by j. d. dews and j. m. smethurst. ( symplegades, no. 1). newcastleupon-tyne, oriel press ltd., 1969. 39 pp. sbn ( 69uk) 85362 060 1. 15s. the first half of this small manual is devoted to describing the file .maintenance .and text editing system developed by the university of newcastle-upon-tyne. 
the second half of the text is devoted to the technical specifications of the newcastle file handling system and refers specifically to the english electric-leo marconi kdf 9 computer. the system described is the application of a series of general purpose programs, that provide the capability of storing, adding, deleting, or changing variable length records, to a union list project for a group of libraries. unfortunately this otherwise well designed system has not been able to do away with the manual "typed slips" back-up file which plagues so many other computerized union list projects. also of interest in this processing system is the use of the work developed at the newcastle computer typesetting research project for computer controlled composition of the final output. section two of seminar on the organization and handling of bibliographic records by computer, newcastle-upon-tyne, 1967 edited by nigel s. m. cox and michael w. grose (archon books, hamden, connecticut, 1967) is the preferred description of all aspects of the system except for those who need the program specifications. alan d. hogan computer based information retrieval systems, edited by bernard houghton. camden, conn., archon books, 1969. 136 pp. $5.00. this book contains six papers that their authors presented at a special course in april 1968 at the liverpool school of librarianship. the objective of the course was to survey the major computer based informational retrieval systems operating in the united kingdom for an audience of prospective users and planners. the book is a successful elementary introduction to large information retrieval systems. in the 1940's and early 1950's, such pioneers as w. e. baten, g. cordonnier, calvin mooers and mortimer taube developed new techniques for information retrieval, a phrase which mooers coined. the major innovation in the new development was "coordinate indexing" or the coordination of index terms at the time of searching. coordination employed simpl~ boolean logic -"and," "or," and "but not." coordinate indexing increased flexibility of searching and number of accesses to documents in contrast to the inflexible, pre-coordinated traditional subject catalogs. book reviews 179 it was also characteristic of the early systems that they dealt with relatively small files of documents not under classical bibliographical control -patents, internal reports, and segments of external report literature. with the advent of the computer, it became feasible to apply the new information retrieval techniques to large files of traditional materials, but to date the major effort has been directed toward huge files of journal articles. it is, therefore, no surprise to find that the five chapters in computer based information retrieval systems that describe systems all depict retrieval from files of journal articles. these five systems are medlars, the science citation index (sci) and its peripherals, chemical titles ( ct) and chemical biological activities ( cbac), a burgeoning institution of electrical engineers (lee ) sponsored project in selective dissemination of electronics information, and a minor computer application to production of the british technology index; the three major, operational projects are of united states origin. selective dissemination is a gt·atifying feature of sci, ct, cbas, and the lee project, for sdi applications take advantage of the computer's potential for personalization by servicing individual users on the basis of their individual needs. 
the book is a successful primer that provides a useful introduction to computer based systems for retrieval of journal citations from large files. g. a. somerfield's last chapter, "state of the art of computer based information retrieval systems," is more than its title implies, for the last half of the chapter analyzes desirable improvements yet to be achieved. the first half could well serve as an introduction to the book. recently, several worthwhile primers on information retrieval and retrieval systems have appeared. computer based infotimation retrieval systems is still another to provide the brief, clear, elementary introduction that new students, new users, and new planners find most effective in providing an understanding of an unfamiliar field. frederick g. kilgour modern data processing, by robert r. arnold, harold c. hill and aylmer v. nichols. new york, john wiley and sons, inc., 1969. 370 pp. $8.95 this book is an updated version of the authors' previous book, introduction to data processing, john wiley and sons, 1966. the present volume is designed to be used as an introductory text to the concepts of all facets of data processing. it will not teach people to be programmers or systems analysts but it can be very useful to anyone who would like to learn about data processing without having to become a programmer or systems analyst. the book is well organized and explains, in non-technical terms, highly technical facets of data processing. this book can be used not only 180 journal of library automation vol. 2/3 september, 1969 at the high school level but also at the beginning college level. in it the authors strived and achieved to ·make . available all the latest advancements in the computer science field. in my opinion the authors have achieved then· goal of developing a very good elementary text in data processing. i highly recommend this book to librarians and all others as a basic primer in automation. it will be particularly useful to administrators, as it has an excellent glossary that assist them in their understanding of the data processing vocabulary and jargon. thomas k. burgess solving seo issues in dspace-based digital repositories: a case study and assessment of worldwide repositories article solving seo issues in dspace-based digital repositories a case study and assessment of worldwide repositories matúš formanek information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.12529 matúš formanek (matus.formanek@fhv.uniza.sk) is assistant professor in the department of mediamatics and cultural heritage, faculty of humanities, university of zilina, slovakia. © 2021. abstract this paper discusses the importance of search engine optimization (seo) for digital repositories. we first describe the importance of seo in the academic environment. online systems, such as institutional digital repositories, are established and used to disseminate scientific information. next, we present a case study of our own institution’s dspace repository, performing several seo tests and identifying the potential seo issues through a group of three independent audit tools. in this case study, we attempt to resolve most of the seo problems that appeared within our research and propose solutions to them. after making the necessary adjustments, we were able to improve the quality of seo variables by more than 59% compared to the non-optimized state (a fresh installation of dspace). 
finally, we apply the same software audit tools to a sample of global institutional repositories also based on dspace. in the discussion, we compare the seo sample results with the average score of the semi-optimized dspace repository (from the case study) and make conclusions. introduction and state of art search engine optimization (seo) is a crucial part of the academic electronic environment. all their users attempt to process too much information and need to retrieve information fast and effectively. making academic information findable is essential. digital institutional repository systems, used to disseminate scientific information, must present their content in ways that make it easy for researchers elsewhere to find. in this paper, we describe work conducted in the department of mediamatics and cultural heritage at faculty of humanities, university of zilina to improve the discoverability of materials contained within its dspace institutional repository. in the literature review, we examine definitions of website quality and discuss audit tools. then, beginning our case study, we describe the tools applied at our institution. we next describe the selection process of a suitable set of testing tools, focused on the optimization of seo variables of the selected institutional repository running with dspace software, that will be applied later in the case study. the remainder of the article focuses on the identification and resolution of potential seo issues using the three independent online tools we selected. we aim to resolve as many problems as possible and compare the level of achieved improvement with the default installation of dspace 6.3 software which our digital repository is based on. the primary goal is not only to improve the seo parameters of the discussed system but also to increase the searchability of scientific website content disseminated by dspace-based digital repositories. next, we offer insights into worldwide dspace-based repositories. we will show that dspace is currently one of the most widely used software packages to support and run digital repositories. unfortunately, there are many major seo issues that will be discussed later. the secondary objective of this paper is to use the same set of tools to evaluate the current state of the sample of worldwide digital repositories also based on dspace. we will provide the report based on our own findings. in the discussion, the seo score of the optimized dspace (from th e case study) will be mailto:matus.formanek@fhv.uniza.sk information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 2 compared with the results of the current state of seo parameters from the worldwide dspace repositories. finally, our work also carries out many relatively innovative approaches related to digital repositories that have not been extensively debated anywhere in the literature yet. literature review to achieve our goal, we started with a review of existing academic papers. drawing from those papers we describe the current state of academic institutions’ presentation through the internet and search engines. in this sense, we focus on website optimization. the internet, as a medium, is still rapidly expanding. 
a massive amount of data is communicated, shared, and available online, as noted by christos ziakos: as a result, billions of websites were created, which made it hard for the average (or even advanced) user to extract useful information from the web efficiently for a specific search. the need for an easier, more efficient way to search for information led to the development of search engines. gradually, search engines began to assess the relevance of every website on their indexes compared to the queries provided to them by the users. they took into consideration several website characteristics and metrics and calculated the value of each website using complex algorithms. the enormous number of websites being indexed from search engines, along with the increasing competition for the first search results, led to studying and implementing various techniques in order for websites to appear more valuable in search engines.1

that description applies equally to academic websites as well as commercial ones. a review of relevant literature suggests that it is very important for academic institutions to carefully consider and apply website optimization. there were around 28,000 universities worldwide in 2010, according to one study that monitored research in the field of worldwide academic webometrics.2 the actual number of universities seems to be very similar in 2020. baka and leyni affirm in their working paper that the success or failure of an academic institution depends on its website: “the work of each university exists only when it encounters and interacts with society. their popularity with the public is steadily growing,” which is directly connected with the institution's presence on the world wide web.3 many authors define the term search engine optimization (seo) as a series of processes that are conducted systematically to improve the volume and quality of traffic from search engines to a specific site by utilizing the working mechanism or algorithm of the search engine. it is a technique of optimizing a website's structure and content to achieve a higher position in search results. the aim is to increase the website's ranking in web search results.4 after an extensive review of the relevant literature, we can conclude that although seo is currently a widely discussed topic, there is very little accessible scientific literature related to seo applications in the field of digital repositories in general, and none at all in the particular subset of dspace-based repositories.

website quality
many authors generally affirm that there is a positive correlation between academic excellence and the complex web presence of an institution. it confirms that website quality is a factor that can give us a predictive or causal relationship with seo performance.5 numerous tools can be employed to measure the quality of websites, test them closely, and produce an seo performance ranking of websites' ability to properly promote their content through search engines. for example, the academic ranking of world universities (the shanghai ranking, http://www.shanghairanking.com) has been established for the top 1,000 universities in the world. website quality is considered by the authors to be the quality of an institution's online presence, its ability to properly promote digital content in search engines and, finally, in combination, its overall web presence.
according to the shanghai ranking list, this is a factor for some “prospective students to decide on whether they will enroll in a specific institute or not.”6 a number of recent studies have also attempted to examine the online presence of academic institutions from various points of view. one of the older studies mentioned that the quality of academic websites is very important for students in the process of enrollment.7 another key aspect is optimized website performance, as well as seo and website security.8

audit tools
if we want to perform any optimization, we need an appropriate software tool to check a website's current ranking. according to g2, the world's largest technology online marketplace, seo software is designed to improve the ranking of websites in search engine results pages without paying the search engine provider for placement. these tools provide seo insights to companies through a variety of different features, helping identify the best strategies to improve a website's search relevance.9 seo audit software could be used by seo specialists or system administrators as well. audit software performs one or more of the following functions in relation to seo: content optimization, keyword research, rank tracking, link building, or backlink monitoring. the software then provides reports on the optimization-related metrics.10 many authors stress the importance of a holistic approach to seo factors (24 factors were tested), while noting that results depend mostly on the most effective ones: for example, the quantity and quality of backlinks, the ssl certificate, and so on, which will be described later in this paper.11 the quality of academic websites is very important for researchers, too. they need to disseminate scientific information and communicate it in effective ways. according to some authors, the topic of academic seo (aseo) has been gaining attention in recent years.12 aseo applies seo principles to the search for academic documents in academic search engines such as google scholar and microsoft academic. in another scientific paper, aseo is considered very similar to traditional seo, where institutions want to make good use of seo to promote digital scientific content on the internet. beel, gipp, and wilde emphasize the importance for researchers to ensure that their publications will receive a high rank on academic search engines.13 by making good use of aseo, researchers will have a higher chance of improving the visibility of their publications and having their work read and cited by more researchers. in recent years, digital institutional repositories (as academic systems) have been used as modern ways of promoting and disseminating digital scientific objects through the internet. digital objects need to reach a wider audience: digital repositories have a website interface, interact with students, teachers, or researchers on a daily basis, and hold numbers of citations, articles, theses, and other research objects. institutional repositories are affected by search engines too, so some improvements to repositories' seo parameters are needed. these factors contribute to a system's rankings. seo on institutional repositories is not considered an absolutely new scientific topic. kelly stressed eight years ago that google is critical in driving traffic to repositories.
he analyzed results from a survey summarizing seo findings for 24 institutional repositories in the united kingdom. the survey results showed that referring platforms were primarily responsible for driving traffic to those institutional repositories, thanks to many hypertext links in referring domains.14 since then, seo analyses of digital repositories have not been a widely discussed topic in the literature. discussing seo on a specific type of digital repository software, dspace, the most widely used and popular software for running digital libraries and repositories, is a relatively novel topic.15 consequently, this paper focuses on that topic, since a dspace-based digital repository is a complex online computer system where some seo parameters can be adjusted. seo audit tools help to identify areas of potential adjustment of those website properties that could help produce higher rankings in search engines (and improve the whole system's visibility).

audit tools selection process
website variables that affect seo can be tested using specialized online software tools. this topic is discussed in detail on a semi-professional level on specialized websites that provide a number of recommendations regarding the use of specific tools as well as evaluations of the tools.16 these tools can keep track of changes in many seo variables. we want to use this approach in our study. however, first we need to choose an appropriate set of these tools. we have found that many seo audit tools mentioned in professional online sources are narrowly specialized.17 for example, they may be focused only on keyword analysis, backlink analysis (for example, ahrefs' free backlink checker), and so on. in our study, we intend to describe a greater number of seo parameters to monitor rather than emphasize only a few selected ones. we also need tools that are fully available online for free. based on these criteria, we immediately excluded several tools from the selection because they provide only austere, simple, or restricted information. many tools were excluded because they were limited to a single test with the requirement of registration or provision of an email address. a number of testing tools were also available only in paid versions. we wanted a set of tools that focus on several aspects of seo analyses and evaluate the quality of websites' seo variables comprehensively. it is important to add that the selected tools' results must be comparable, too. after careful consideration of all possibilities, we finally decided to choose three independent seo audit tools in order to make the approach more transparent. the selected tools met most of the criteria mentioned above. however, it is very important to note that many other software tools surely meet the criteria and could also be suitable for testing purposes. based on the scientific literature review, we were not able to identify specific recommendations in this regard; therefore, we have been inspired by the advice offered on the websites and blogs previously mentioned that are focused primarily on seo. our tool selection is as follows (listed in alphabetical order):

1. seo checker (https://suite.seotesteronline.com/seo-checker) is part of a complex audit software suite called seo tester online suite. seo checker provides tests in the following categories: base, content, speed, and connections to social media.
it tracks, among many other parameters, title coherence, text/code ratio, accessibility of microdata, opengraph metadata, social plugins, in-page and off-page links, quality of links, mobile friendliness of the page, and many other seo and technical website attributes. regarding restrictions, only two sites can be tested within a 24-hour period. the limit increases to four sites per day after free registration with a valid email address. moreover, there is a 14-day trial period during which all hidden functionalities work. in the free version that we used, a complete report can only be viewed, not downloaded or saved.

2. seo site checkup (https://seositecheckup.com/) was selected based on many positive recommendations from the technically oriented expert website traffic radius.18 seo site checkup is described as “a great seo tool that offers more than 40 checks in 6 different categories (common seo issues like missing metadata, keywords, issues related with absence of connections to social media, semantic web, etc.) to serve up a comprehensive report that you can use to improve results and the website's organic traffic. it also gives recommendations to fix critical issues in just a few minutes. as a tool, it is very fast and provides in-depth information about the various seo opportunities and accurate results.”19 seo site checkup is appreciated and recognized as number one among other audit tools ranked by the geekflare website.20 another reason we selected this tool for our testing scenario is the fact that the google search engine offers a link to this tool as the first result after entering the search query “seo testing tool” (excluding paid links). seo site checkup is also the fastest of the selected audit tools, which can be considered another advantage. its disadvantages include the ability to test only one website within 24 hours from one public ip address.

3. woorank (https://woorank.com) is recommended by traffic radius: “woorank offers an in-depth analysis that covers the performance of existing seo strategies, social media and more. the comprehensive report analysis is classified into eight sections for improved readability quotient, and you may also download the report as branded pdf.”21 woorank obtained the third position among the recommended software tools. trustradius gives it a score of 9.2 out of 10, and users rate it 4.67 out of 5 stars based on 51 reviews.22 on the one hand, some results are hidden in the free version, but the final score is shown. on the other hand, woorank has no limit to the number of websites tested per day, but it is the slowest of the selected testing tools.

we selected these three seo audit tools because they work independently, their results are comparable to each other, and they offer a quick way to get comprehensive seo analysis results for a tested site. it should be noted that the results of some performed tests are hidden, but there is general guidance on how to fix some issues. however, the solution always depends on the specific site and the technology used. using three different tools adds objectivity because we do not rely on just one tool and a one-sided view of the seo issue. the three selected testers all display results in the same way: test results are always shown as a summarized score in the range of 0 to 100 points (100 represents the best result).
a very large set of seo parameters and technical website properties is evaluated in all three cases. these tests are usually divided into several categories (for example, common seo issues, performance, security issues, and social media integration). although similar parameters are assessed in all three audit tools, there are still some differences between them. each of the testing tools is unique in a certain area because it also tests a parameter that the others do not deal with or evaluates a website by a different methodology. still, the fact remains that the evaluated seo parameters overlap between the tools. we will not overload this paper with detailed information and technical details of individual partial tests, because they can be easily found on the websites of the given test tools (seo site checkup, seo checker online, woorank). we will just mention the common core of main tests: css minification test, favicon test, google search results preview test, google analytics test, h1 heading tags test, html page size test, image alt test, javascript minification test, javascript error test, keywords usage test, meta description test, meta title test, seo friendly url test, sitemap test, social media test, robots.txt test, url canonicalization test, and url redirects test. another specific group consists of tests related to a particular audit tool. thanks to them we can get a more comprehensive view of the tested area of a website's seo characteristics. for example, seo checker features the following specific tests: title coherence test, unique key words test, h1 coherence test, h2 heading tags test, and facebook popularity test. woorank, as the second tool, extends the basic set of tests with the following: title tag length test, in-page links test, off-page links test, language test, twitter account test, instagram account test, traffic estimations, and traffic rank. of course, there is also a set of tests that are part of two audit tools but that the third one does not deal with, since it is specialized in another area. as we have mentioned, the tools offer a list of suggestions for potential improvement of seo characteristics. the user is informed about an issue, but concrete, site-specific instructions on how to resolve it are not provided. the main benefit of this paper lies in its objective to solve specific seo issues. this work may improve the visibility and searchability of dspace-based institutional repositories. the set of three audit tools described above will be used in the following section. we attempt to identify possible seo issues of the selected institutional repository in the form of a case study. then we aim to fix the identified seo issues, increase the quality of the repository's seo parameters, and demonstrate the potential impact of the performed repairs on website traffic. all traffic measurements will be based on google analytics data.

the institutional repository of the department of mediamatics and cultural heritage (seo case study)
background information
an older version of our digital repository (based on dspace v5.5) was launched by the department of cultural heritage and mediamatics in april 2017. now, in 2021, the repository makes available online over 180 digital objects, most of them open access under creative commons licenses.
the first attempts to create and establish a similar virtual space for digital objects started long ago. several software solutions had been tested for this purpose, for example invenio and eprints, along with dspace. according to opendoar's statistics, eprints and dspace have always been the most popular tools for running digital repositories.23 a few years ago, dspace was chosen as the primary software for running a digital repository. since then, the usage of open-source software has been rising. for example, ubuntu server lts (long term support) is used as the operating system, tomcat 8 is used as the web server, postgresql assumes the role of the database system, etc. all of those software components are part of a complex digital system and are orchestrated in a virtual environment built on an open-source virtualization solution called xcp-ng (in version 8.2). some software components have been swapped for others during the development period. based on our experience, the digital repository's regular visitors were mostly staff and students of the department. we initially did not feel a need to improve the visibility of this system to search engines, an oversight that turned out to be a mistake in the long run. we did not perform any search engine optimization on this repository until november 2019, when we coincidentally discovered several scientific articles dealing with seo in the academic environment. after studying the theoretical background, we initiated the practical application process. we applied theory and our experience with dspace software to an seo troubleshooting process within our local repository. most of the optimizing actions related to solving the major seo issues were performed before november 10, 2019. we will describe the seo adjustments we made and derive a list of recommendations for other institutions based on our own experience.

initial testing of a clean dspace 6 installation
in order to formulate any recommendations related to seo and the administration of dspace digital repositories, it is important to determine and test a starting point. for this purpose, we chose a clean instance of dspace v6.3 with an xml user interface (xmlui), the latest commonly available stable version. this is the same version that we use in this case study and in our production environment. (a newer version, dspace 7 beta 4, was released by atmire on october 13, 2020.)24 no other customization edits were made except a base configuration and necessary url settings. this installation of dspace v6.3 has been tested with the same set of tools mentioned previously. the tests we performed are summarized in table 1, where they are divided into four main seo sections in the first column: common seo issues, social, speed, and security. a test name is shown in the second column. the third column is marked “default installation,” where we display the test results for our clean dspace 6.3 installation. if the tested instance met the criteria of a given test, a green pictogram is shown; when a particular test fails, a red cross is used. the improved state is shown in the fourth column, marked semi-optimized. it reflects many important technical changes made during the seo issue-solving process. these will be discussed and described later in this paper; however, a short note about the considered issue is displayed in each row.
these notes are taken from the tools' result reports. we have used the prefix semi- in the last column because we were not able to resolve all detected seo issues, only most of them. the reasons will be described briefly in the discussion section. where an improving change between states was made, we changed the status pictogram (from a red cross to a green tick) and set the row color to yellow. the changes leading to improvement (i.e., the yellow rows) will also be discussed in detail later. recall that we do not need to overload the main text of this paper with detailed technical information about partial tests, because it can easily be found on the websites of the given test tools. table 1 shows the compared results between the non-optimized and semi-optimized states of the dspace repository. based on table 1, the default instance of dspace with basic http and other default settings received only 58 points out of 100 in seo site checkup, 50.1 points in seo checker, and 32 points in woorank. the average final score is 46.7 points out of 100. although this score could be considered low, the dspace default instance still meets certain basic criteria of seo. in addition, many repository administrators usually do not rely only on a default installation; they make at least some changes in configuration immediately after the initial installation. among other things, the first steps are usually implementing the https protocol, adding a connection to google analytics services, and so on. the improved state is shown in the last column of table 1. whenever we solved an issue, the overall score rose. the semi-optimized repository obtained a higher score compared to the previous column (default installation). the last column represents the final (though semi-optimized) state of technical and seo attributes that we were able to reach at this time. as shown, many seo issues have been solved. we highlighted them in yellow. on the one hand, some issues remain unsolved. on the other hand, the overall seo improvement is more than noticeable, although the final average score has not reached the maximum value (100 points).

table 1. comparison of results between the non-optimized and semi-optimized states of the dspace repository. each row lists the test name, the state of the default installation (before optimization), and the semi-optimized state (after a few optimization steps).

meta title test, title tag length | the title tag is set, but the meta title of the webpage (“dspace home”) has a length of 11 characters, which is too low. | the title tag has been set to “digitálny repozitár katedry mediamatiky a kultúrneho dedičstva” (note: in slovak).
title coherence test | the keywords in the title tag are included in the body of the page. | the title of the page seems optimized.
meta description test | no meta-description tag is set. | a meta-description tag has been set (121 characters).
google search results preview test | “dspace home” is too general. | the title of the page has been changed.
keywords usage test | the keywords are not included in the title and meta-description tags. | a set of appropriate keywords has been added.
unique key words test | the textual content of the page is not optimized. | there is an excellent concentration of keywords in the page; the page includes 382 words, of which 58 are unique.
h1 heading tags test | 8 h1 tags, 6 h2 tags | the h1 tags of the page seem not to be optimized; there are too many h1 tags.
h1 coherence test | the keywords present in the h1 tag are included in the body of the page. | some of the keywords of the h1 tag are not included in the body of the page.
h2 heading tags test | the keywords present in the tag are included in the body of the page.
language test | detected: slovak; declared: missing. | a missing language tag has been implemented.
robots.txt test | no “robots.txt” file has been found. | the “robots.txt” file has been enabled.
sitemap test | no sitemap has been found. | a sitemap has been enabled.
seo friendly url test | the webpage contains urls that are not seo friendly. | the webpage contains urls that are not seo friendly.
image alt test | the webpage does not use “img” tags; it is optimized.
inline css test | the webpage uses inline css styles. | the webpage uses inline css styles.
deprecated html tags test | the webpage does not use deprecated html tags.
google analytics (ga) test | ga is not in use. | ga has been implemented.
favicon test | the default dspace favicon is used. | the favicon has been customized.
js error test | no severe javascript errors were detected. | no severe javascript errors were detected.
social media test | no connection with social media has been detected. | the website is successfully connected with social media (using facebook).
facebook account test | information about the facebook page has been added by schema.org metadata.
facebook popularity (low) | the webpage is promoted enough on facebook.
twitter account test | no connection with twitter has been detected. | information about the twitter account has been added by schema.org metadata.
twittercard test | no twittercard is implemented. | meta-information about the twittercard has been added by opengraph metadata.
instagram account test | no connection with instagram has been detected. | information about the instagram account has been added by schema.org metadata.
microdata (opengraph, schema.org) test | there is no microdata or opengraph/schema.org metadata on the website. | some opengraph and schema.org metadata has been added.
html page size test | the size of the page is excellent (23.65 kb). | the size of the page is excellent (28.84 kb).
text/code ratio test | 10.71% (excellent) | 15.45% (excellent)
html compression/gzip test | no compression is enabled; the size of the html could be reduced by up to 79%. | the webpage is successfully compressed using gzip compression; the html is compressed with 78% size savings.
site loading speed test | loading time is around 1.86 s. | loading time is around 2.39 s.
page objects test | the webpage has fewer than 20 http requests. | the webpage has fewer than 20 http requests.
page cache test (server-side caching) | the pages are not cached. | the pages are not cached.
flash test | the website does not include flash objects.
cdn usage test | the webpage is not serving all resources (images, javascript, and css) from cdns. | the webpage is not serving all resources (images, javascript, and css) from cdns.
image, javascript, css caching tests | data are not cached. | data are not cached.
javascript minification test | javascripts are not minified. | javascript file minification has been enabled in the tomcat configuration.
css minification test | some of the webpage's css resources are not minified. | some of the webpage's css resources are not minified.
nested tables test | the webpage does not use nested tables.
frameset test | the webpage does not use frames.
doctype test | the website has a valid doctype declaration.
url redirects test | 1 url redirect has been detected; it is acceptable.
url canonicalization test | the webpage urls are not canonicalized: https://repozitar.kmkd.uniza.sk/xmlui and https://www.repozitar.kmkd.uniza.sk/xmlui should resolve to the same url, but currently do not.
canonical tag test | no canonical tag has been detected. | the webpage is using a canonical link tag.
https test | the website is not ssl secured. | https has been implemented.
safe browsing test | no malware or phishing activity found.
server signature test | server self-signature for https is off.
directory browsing test | the server has disabled directory browsing.
plaintext emails test | the webpage does not include email addresses in plain text.
mobile friendliness (includes tap targets, no plugin content, font size legibility, mobile viewport) | the webpage is optimized for mobile visitors.
seo site checkup final score | 58/100 | 81/100
seo checker online final score | 50.1/100 | 78.0/100
woorank final score | 32/100 | 65/100
average final score | 46.7/100 | 74.66/100

resolving major seo issues
this section looks at how we resolved the major seo issues that the tools detected. this is the key technical part, because most of the issues highlighted in table 1 were solved and are described here. the following technical and seo adjustments have been implemented and tested in order to improve the average final score by 59.87% (by 27.96 points, from 46.7 to 74.66 points), comparing the fresh installation of dspace against the semi-optimized one. all of the following solution procedures are based on our own experience, experiments, and research carried out in the area of digital repositories and their optimization as virtual spaces. during the solving process, we follow the order of issues stated in table 1 and describe them in more detail in the dspace v6.3 environment with the xml user interface (xmlui). the following procedures may differ slightly if you are using a different version of dspace or another graphic interface (for example, jspui). examples of code are given in monospaced font.

title, description, and keywords tags in a website header
this criterion requires filling in specific metadata (e.g., meta content) fields in the page's html code. search engines process them automatically to find out what the website is about.
to solve these seo issues, change the website title (by default “dspace home”) located in the language translation config file at the path /dspace/webapps/xmlui/i18n/messages_en.xml. find the appropriate key and change its value. all content in this file is fully customizable. next, edit dspace's page structure config file (at the path /themes/mirage/lib/xsl/core/page-structure.xsl) in order to add the following metadata content just below the main tag, each with carefully selected content and length (an illustrative sketch of these tags is given further below):
• a meta-description tag
• a keywords tag
• an author tag
note: do not forget the termination characters />. the keywords should be included in the title and meta-description tags. several other seo parameters are affected by performing these steps, for example the google search results preview test, keywords usage test, unique key words test, and keywords concentration test.

language declaration
the language declaration is very important for search engines to identify the primary language of the website content. if a declared language is missing from a website, you can define it by adding a single statement above the main tag in the page-structure.xsl file; the process is similar to adding the keywords and description tags as explained above, and can be done with vim or another text editor (one possible form is shown in the sketch further below). note: “sk” is the abbreviation for the slovak language as stated in the w3 namespaces. more information is available at https://www.w3.org/tr/xml/.

google analytics, robots.txt and sitemap implementation
the connection between a website and google analytics services enables google analytics to track users' behavior and understand how they interact with the site. it is the basis of web analysis. the “robots.txt” and “sitemap.xml” files are simple text files that are required for search engines to specify the website structure and additional information about it. to enable google analytics services, insert a ua code identifier (the id is a string), obtained from google analytics, into the dspace.cfg config file located in the dspace home folder. in that file, find the key/row named “xmlui.google.analytics.key=” and insert the corresponding ua identifier there. next, it is necessary to uncomment the row with the key “xmlui.controlpanel.activity.max = 250” in the same “dspace.cfg” file. finally, uncomment the corresponding row in the “xmlui.xconf” file located in the path /dspace/config/ and restart the tomcat service. the “robots.txt” file is commonly used and enabled in dspace, but many seo audit tools are not able to detect it successfully because the file is located in a path other than the expected default one. to enable robots.txt file detection, copy the file /dspace/webapps/xmlui/static/robots.txt to the root of the tomcat folder (usually located in the path /var/lib/tomcat8/webapps/root). finally, restart the tomcat web service. a sitemap for a currently running dspace instance is referenced in the “robots.txt” file mentioned above; edit this file and set an appropriate url for the sitemap location.
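the sketches below illustrate the edits described in this section. they are minimal examples, not the repository's actual configuration: every content string, identifier, and url in them is a placeholder.

header metadata added to page-structure.xsl (self-closing tags, as noted above):

    <meta name="description" content="placeholder description of the repository, roughly 120 characters long" />
    <meta name="keywords" content="digital repository, dspace, open access, placeholder keywords" />
    <meta name="author" content="placeholder author or institution name" />

a language declaration can take several equivalent forms; one plausible variant, assuming the html output produced by xmlui, is a content-language meta tag in the same file (a lang attribute on the root html element would serve the same purpose):

    <meta http-equiv="content-language" content="sk" />

the google analytics keys in dspace.cfg look roughly like this (the ua identifier is a placeholder):

    xmlui.google.analytics.key = UA-XXXXXXXX-X
    xmlui.controlpanel.activity.max = 250

the robots.txt relocation and the web server restart can be done with two shell commands (paths and the service name depend on the distribution and the tomcat package in use):

    cp /dspace/webapps/xmlui/static/robots.txt /var/lib/tomcat8/webapps/ROOT/
    sudo systemctl restart tomcat8

inside the copied robots.txt, the sitemap directive simply points to the sitemap url of the running instance, for example (illustrative url):

    Sitemap: https://repozitar.kmkd.uniza.sk/xmlui/sitemap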
enabling connections with social media
this criterion detects a hyperlink (or other metadata) connection between a website and popular social media, such as facebook, twitter, etc. the primary goal is to promote the digital content. this subsection deals with social media connections for a dspace-based repository. simply creating a profile or a site on a social network related to the repository is considered an essential example of good practice. however, an appropriate form of connection between the sites must be created, too. naturally, further endorsement of the system through social networks is another key step. social media-oriented tests are performed by each seo audit tool nowadays. a detected connection with social media can have a big impact on the site's popularity, as well as on the final seo score gained. there are many ways to establish these connections. one is a connection with facebook, instagram, and twitter via a direct link from the homepage: to add a link to a facebook site profile, edit the page-structure file (/dspace/webapps/xmlui/themes/mirage/lib/xsl/core/page-structure.xsl) just below the div tag with id “ds-footer-wrapper”, for example:
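as an illustration, a minimal sketch of a literal html block that could be added at that point in page-structure.xsl; this is a hypothetical example based on the step above, and the social media urls are placeholders rather than the repository's actual profiles:

    <div id="social-media-links">
        <!-- direct links from the repository homepage to its social media profiles -->
        <a href="https://www.facebook.com/your-repository-page">facebook</a>
        <a href="https://twitter.com/your-repository-account">twitter</a>
        <a href="https://www.instagram.com/your-repository-account">instagram</a>
    </div>

because page-structure.xsl is an xslt stylesheet, literal html elements such as these are copied into the generated page as they are; after saving the change, the tomcat service may need to be restarted (or the theme cache refreshed) before the links appear in the footer.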